Hive / HIVE-13164

Predicate pushdown may cause cross-product in left semi join

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None

    Description

      For some left semi join queries, such as the following:
      select count(1) from (select value from t1 where key = 0) t1 left semi join (select value from t2 where key = 0) t2 on t2.value = 'val_0';
      or
      select count(1) from (select value from t1 where key = 0) t1 left semi join (select value from t2 where key = 0) t2 on t1.value = 'val_0';
      the plans show that the join has been converted to a keyless cross product: predicate pushdown moves the constant predicate from the ON clause into the table filter, and the ON condition is then dropped, leaving the join with no keys.

      LOGICAL PLAN:
      t1:t1 
        TableScan (TS_0)
          alias: t1
          Statistics: Num rows: 1453 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
          Filter Operator (FIL_18)
            predicate: (key = 0) (type: boolean)
            Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
            Select Operator (SEL_2)
              Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator (RS_9)
                sort order: 
                Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
                Join Operator (JOIN_11)
                  condition map:
                       Left Semi Join 0 to 1
                  keys:
                    0 
                    1 
                  Statistics: Num rows: 798 Data size: 3194 Basic stats: COMPLETE Column stats: NONE
                  Group By Operator (GBY_13)
                    aggregations: count(1)
                    mode: hash
                    outputColumnNames: _col0
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator (RS_14)
                      sort order: 
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                      value expressions: _col0 (type: bigint)
                      Group By Operator (GBY_15)
                        aggregations: count(VALUE._col0)
                        mode: mergepartial
                        outputColumnNames: _col0
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                        File Output Operator (FS_17)
                          compressed: false
                          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                          table:
                              input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      t2:t2 
        TableScan (TS_3)
          alias: t2
          Statistics: Num rows: 645 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
          Filter Operator (FIL_19)
            predicate: ((key = 0) and (value = 'val_0')) (type: boolean)
            Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
            Select Operator (SEL_5)
              Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
              Group By Operator (GBY_8)
                keys: 'val_0' (type: string)
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator (RS_10)
                  sort order: 
                  Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
                  Join Operator (JOIN_11)
                    condition map:
                         Left Semi Join 0 to 1
                    keys:
                      0 
                      1 
                    Statistics: Num rows: 798 Data size: 3194 Basic stats: COMPLETE Column stats: NONE
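
To help judge whether the keyless plan above changes the result, here is a minimal sketch of left semi join semantics in plain Python. It is not Hive code: the `left_semi_join` helper and the sample rows are hypothetical stand-ins for the two subqueries after their `key = 0` filters. It compares each original ON condition with the pushed-down form (filter first, then a semi join whose condition is always true).

```python
def left_semi_join(left, right, on):
    """Keep each left row that has at least one right row satisfying `on`."""
    return [l for l in left if any(on(l, r) for r in right)]


# Hypothetical rows standing in for the subqueries after the key = 0 filter.
t1 = [{"value": "val_0"}, {"value": "val_0"}, {"value": "val_1"}]
t2 = [{"value": "val_0"}, {"value": "val_2"}]

# First query: ON t2.value = 'val_0' (references only the right side).
original_right = left_semi_join(t1, t2, lambda l, r: r["value"] == "val_0")

# Pushed-down form: filter t2 first, then a keyless semi join.
t2_filtered = [r for r in t2 if r["value"] == "val_0"]
pushed_right = left_semi_join(t1, t2_filtered, lambda l, r: True)

# Second query: ON t1.value = 'val_0' (references only the left side).
original_left = left_semi_join(t1, t2, lambda l, r: l["value"] == "val_0")

# Pushed-down form: filter t1 first, then a keyless semi join.
t1_filtered = [l for l in t1 if l["value"] == "val_0"]
pushed_left = left_semi_join(t1_filtered, t2, lambda l, r: True)

assert original_right == pushed_right
assert original_left == pushed_left
```

In this sketch a keyless semi join only asks whether the right side has any row at all, so it never multiplies left rows the way an inner cross product would. If the filtered right side is empty, both forms return no rows.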
      

      gopalv, do you think these plans are valid or not? Thanks

            People

              ctang Chaoyu Tang
