Hive / HIVE-9132 CBO: Calcite Operator To Hive Operator (Calcite Return Path) / HIVE-14442

CBO: Calcite Operator To Hive Operator (Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: CBO
    • Labels: None

    Description

      Reproducer

      set hive.cbo.returnpath.hiveop=true;
      set hive.map.aggr=false;
      
      create table abcd (a int, b int, c int, d int);
      LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
      
      explain select count(distinct a) from abcd group by b;
      STAGE PLANS:
        Stage: Stage-1
          Map Reduce
            Map Operator Tree:
                TableScan
                  alias: abcd
                  Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: a (type: int)
                    outputColumnNames: a
                    Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: a (type: int), a (type: int)
                      sort order: ++
                      Map-reduce partition columns: a (type: int)
                      Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(DISTINCT KEY._col1:0._col0)
                keys: KEY._col0 (type: int)
                mode: complete
                outputColumnNames: b, $f1
                Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: $f1 (type: bigint)
                  outputColumnNames: _o__c0
                  Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      
      explain select count(distinct a) from abcd group by c;
      STAGE PLANS:
        Stage: Stage-1
          Map Reduce
            Map Operator Tree:
                TableScan
                  alias: abcd
                  Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: a (type: int)
                    outputColumnNames: a
                    Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: a (type: int), a (type: int)
                      sort order: ++
                      Map-reduce partition columns: a (type: int)
                      Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(DISTINCT KEY._col1:0._col0)
                keys: KEY._col0 (type: int)
                mode: complete
                outputColumnNames: c, $f1
                Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: $f1 (type: bigint)
                  outputColumnNames: _o__c0
                  Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      
      

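      Both plans above have the wrong keys in the map-side Reduce Output Operator: both use a, a instead of b, a and c, a respectively. As a result, the reduce-side Group By Operator (keys: KEY._col0) groups on a rather than on b or c.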
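
      To illustrate why these keys produce a wrong result and not just an odd-looking plan, here is a minimal Python sketch that models the reduce-side count(DISTINCT ...) as "group by the first key column, count distinct values of the second". It is not Hive code. The rows are hypothetical (they are not the contents of in4.txt), and the helper name count_distinct_by_key is made up for this example.

```python
from collections import defaultdict

# Hypothetical rows (a, b, c, d); NOT the contents of in4.txt.
ROWS = [
    (1, 10, 100, 0),
    (2, 10, 100, 0),
    (2, 10, 200, 0),
    (3, 20, 200, 0),
    (3, 20, 200, 0),
]

A, B = 0, 1  # column positions of a and b in each row


def count_distinct_by_key(rows, group_idx, distinct_idx):
    """Model a reducer that groups rows on the first key column and counts
    the distinct values of the second key column within each group."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[group_idx]].add(row[distinct_idx])
    return {key: len(values) for key, values in sorted(groups.items())}


# Intended keys (b, a): one output row per value of b.
expected = count_distinct_by_key(ROWS, B, A)

# Keys as shown in the plans above (a, a): one output row per value of a,
# and every count is 1 because each group holds a single value of a.
buggy = count_distinct_by_key(ROWS, A, A)

print("keys (b, a):", expected)
print("keys (a, a):", buggy)
```

      With keys (a, a) the model returns one row per distinct value of a, each with a count of 1, so both the number of output rows and the counts can differ from the intended group-by-b result.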

      Attachments

        1. HIVE-14442.1.patch
          44 kB
          Vineet Garg
        2. HIVE-14442.2.patch
          26 kB
          Vineet Garg
        3. HIVE-14442.3.patch
          39 kB
          Vineet Garg

          People

            Assignee: Vineet Garg (vgarg)
            Reporter: Vineet Garg (vgarg)