Hive / HIVE-25170

Data error in constant propagation caused by wrong colExprMap generated in SemanticAnalyzer


Details

    Description

       

      
      SET hive.remove.orderby.in.subquery=false;
      
      EXPLAIN
      SELECT constant_col, key, max(value)
      FROM
      (
        SELECT 'constant' as constant_col, key, value
        FROM src
        DISTRIBUTE BY constant_col, key
        SORT BY constant_col, key, value
      ) a
      GROUP BY constant_col, key
      LIMIT 10;
      
      OK
      Vertex dependency in root stage
      Reducer 2 <- Map 1 (SIMPLE_EDGE)
      Reducer 3 <- Reducer 2 (SIMPLE_EDGE)

      Stage-0
        Fetch Operator
          limit:10
          Stage-1
            Reducer 3
            File Output Operator [FS_10]
              Limit [LIM_9] (rows=1 width=368)
                Number of rows:10
                Select Operator [SEL_8] (rows=1 width=368)
                  Output:["_col0","_col1","_col2"]
                  Group By Operator [GBY_7] (rows=1 width=368)
                    Output:["_col0","_col1","_col2"],aggregations:["max(VALUE._col0)"],keys:'constant', 'constant'
                  <-Reducer 2 [SIMPLE_EDGE]
                    SHUFFLE [RS_6]
                      PartitionCols:'constant', 'constant'
                      Group By Operator [GBY_5] (rows=1 width=368)
                        Output:["_col0","_col1","_col2"],aggregations:["max(_col2)"],keys:'constant', 'constant'
                        Select Operator [SEL_3] (rows=500 width=178)
                          Output:["_col2"]
                        <-Map 1 [SIMPLE_EDGE]
                          SHUFFLE [RS_2]
                            PartitionCols:'constant', _col1
                            Select Operator [SEL_1] (rows=500 width=178)
                              Output:["_col1","_col2"]
                              TableScan [TS_0] (rows=500 width=10)
                                src,src,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"]

      The PartitionCols of RS_6 in Reducer 2 are wrong: instead of 'constant', 'constant', they should be 'constant', _col1.

       

      This happens because, since HIVE-13808, SemanticAnalyzer builds the key part of colExprMap from sortCols, while the key columns themselves are generated from newSortCols. The two lists do not line up entry by entry, so columns are mapped to the wrong expressions whenever the constant column is not the trailing column among the key columns.
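
A minimal sketch of this mismatch, written as a toy Python model rather than Hive code. It assumes that newSortCols is sortCols with the constant expressions dropped, which the description implies but does not state. The tuple encoding and the `KEY.reducesinkkey<i>` names are hypothetical and only serve the illustration.

```python
# Toy model of the mismatch; none of these names are Hive APIs.
CONST, COL = "const", "col"

# Sort/distribute expressions of the subquery, in key order.
sort_cols = [(CONST, "'constant'"), (COL, "key"), (COL, "value")]

# Assumption: newSortCols is sortCols with constant expressions dropped.
new_sort_cols = [expr for expr in sort_cols if expr[0] != CONST]

# Key column names are generated from the filtered list (hypothetical naming).
key_names = [f"KEY.reducesinkkey{i}" for i in range(len(new_sort_cols))]

# Mismatched: names come from new_sort_cols, expressions from sort_cols.
buggy_map = dict(zip(key_names, sort_cols))

# Consistent: names and expressions come from the same list.
consistent_map = dict(zip(key_names, new_sort_cols))

for name in key_names:
    print(name, "| mismatched:", buggy_map[name][1],
          "| consistent:", consistent_map[name][1])
```

In the mismatched map the first key column is paired with the constant literal even though it actually carries `key`. If the constant were the last sort column, the leading entries would still line up, which fits the observation that the bug only appears when the constant is not the trailing key column.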

      The constant propagation optimizer reads this colExprMap, finds an extra constant expression in the mismatched map, and folds the wrong column into a constant, which produces the erroneous plan above.
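
The effect on a downstream operator can be sketched with a toy constant-folding pass in Python. This is not Hive's ConstantPropagate implementation. The function `propagate_constants`, the map layout, and the `KEY.reducesinkkey<i>` names are hypothetical, chosen only to show how a wrong map entry turns a real column into a literal.

```python
def propagate_constants(key_refs, col_expr_map):
    """Replace each column reference that the map claims is a constant."""
    folded = []
    for ref in key_refs:
        # References missing from the map are left untouched.
        kind, text = col_expr_map.get(ref, ("col", ref))
        folded.append(text if kind == "const" else ref)
    return folded


# Hypothetical map with the key columns paired with the wrong expressions.
mismatched_map = {
    "KEY.reducesinkkey0": ("const", "'constant'"),
    "KEY.reducesinkkey1": ("col", "key"),
}

# Hypothetical map with names and expressions aligned.
aligned_map = {
    "KEY.reducesinkkey0": ("col", "key"),
    "KEY.reducesinkkey1": ("col", "value"),
}

# Downstream GROUP BY keys: the known constant plus the real key column.
group_by_keys = ["'constant'", "KEY.reducesinkkey0"]

wrong = propagate_constants(group_by_keys, mismatched_map)
right = propagate_constants(group_by_keys, aligned_map)
print("mismatched:", wrong)
print("aligned:   ", right)
```

With the mismatched map both keys collapse to `'constant'`, mirroring the `keys:'constant', 'constant'` entries in the plan. With the aligned map the second key remains a column reference.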

       

      Because colExprMap is used by multiple optimizers, not only constant propagation, this is quite a serious problem.

            People

              zhangweilst Wei Zhang

                Time Tracking

                  Original Estimate - Not Specified
                  Remaining Estimate - 0h
                  Time Spent - 0.5h