Hive / HIVE-20757

Autogather stats doesn't work when SDPO (sort dynamic partition optimization) is ON


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 4.0.0
    • Fix Version/s: None
    • Component/s: Statistics
    • Labels: None

    Description

      Reproducer

      set hive.optimize.sort.dynamic.partition=true;
      set hive.exec.dynamic.partition.mode=nonstrict;
      set hive.stats.autogather=true;
      
      create table t11(i int, j int) partitioned by (s string);
      insert into t11 partition(s) values(3,4, 'p1'),(4,5, 'p2'),(6,9,'p3');
      
      hive> desc formatted t11 j;
      OK
      col_name            	j
      data_type           	int
      min
      max
      num_nulls
      distinct_count
      avg_col_len
      max_col_len
      num_trues
      num_falses
      bitVector
      comment             	from deserializer
      COLUMN_STATS_ACCURATE	{}
      
      hive> explain insert into t11 partition(s) values(3,4, 'p1'),(4,5, 'p2'),(6,9,'p3');
      
      STAGE PLANS:
        Stage: Stage-1
          Tez
            DagId: vgarg_20181016113701_f3aa9f8f-b38b-47a8-8149-b5521bf072f6:13
            Edges:
              Reducer 2 <- Map 1 (SIMPLE_EDGE)
            DagName: vgarg_20181016113701_f3aa9f8f-b38b-47a8-8149-b5521bf072f6:13
            Vertices:
              Map 1
                  Map Operator Tree:
                      TableScan
                        alias: _dummy_table
                        Row Limit Per Split: 1
                        Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: COMPLETE
                        Select Operator
                          expressions: array(const struct(3,4,'p1'),const struct(4,5,'p2'),const struct(6,9,'p3')) (type: array<struct<col1:int,col2:int,col3:string>>)
                          outputColumnNames: _col0
                          Statistics: Num rows: 1 Data size: 64 Basic stats: COMPLETE Column stats: COMPLETE
                          UDTF Operator
                            Statistics: Num rows: 1 Data size: 64 Basic stats: COMPLETE Column stats: COMPLETE
                            function name: inline
                            Select Operator
                              expressions: col1 (type: int), col2 (type: int), col3 (type: string)
                              outputColumnNames: _col0, _col1, _col2
                              Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                              Reduce Output Operator
                                key expressions: _col2 (type: string)
                                sort order: +
                                Map-reduce partition columns: _col2 (type: string)
                                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                                value expressions: _col0 (type: int), _col1 (type: int)
              Reducer 2
                  Execution mode: vectorized
                  Reduce Operator Tree:
                    Select Operator
                      expressions: VALUE._col0 (type: int), VALUE._col1 (type: int), KEY._col2 (type: string)
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                      File Output Operator
                        compressed: false
                        Dp Sort State: PARTITION_SORTED
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                        table:
                            input format: org.apache.hadoop.mapred.TextInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                            name: default.t11
      
        Stage: Stage-2
          Dependency Collection
      
        Stage: Stage-0
          Move Operator
            tables:
                partition:
                  s
                replace: false
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: default.t11
      
        Stage: Stage-3
          Stats Work
            Basic Stats Work:
            Column Stats Desc:
                Columns: i, j
                Column Types: int, int
                Table: default.t11
      

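      Notice that the explain plan is missing the autogather stats branch. In Reducer 2 the rows go only to the File Output Operator (Dp Sort State: PARTITION_SORTED), and no operator computes column statistics for i and j, even though Stage-3 (Stats Work) lists both columns in its Column Stats Desc. This matches the empty column stats returned by desc formatted t11 j above.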
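      To check that the missing branch is tied to SDPO, one could rerun the same statement with the optimization turned off and compare the two plans. This is a sketch, assuming a freshly created t11 and the same session settings as the reproducer. With SDPO disabled, the plan would be expected to contain an extra branch computing column statistics (typically shown as a Group By Operator with compute_stats aggregations) that feeds Stage-3; that expectation is an assumption, not output captured for this issue.

```sql
-- Same settings as the reproducer, but with SDPO disabled
set hive.optimize.sort.dynamic.partition=false;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.stats.autogather=true;

-- Compare this plan against the SDPO plan in the reproducer
explain insert into t11 partition(s) values(3,4, 'p1'),(4,5, 'p2'),(6,9,'p3');

-- Run the insert, then check whether column stats were gathered
insert into t11 partition(s) values(3,4, 'p1'),(4,5, 'p2'),(6,9,'p3');
desc formatted t11 j;
```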


            People

              Assignee: Unassigned
              Reporter: Vineet Garg (vgarg)
              Votes: 0
              Watchers: 1
