[HIVE-20757] Autogather stats doesn't work when SDPO (sort dynamic partition optimization) is ON - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 4.0.0
Fix Version/s: None
Component/s: Statistics
Labels: None

Description

Reproducer

set hive.optimize.sort.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.stats.autogather=true;

create table t11(i int, j int) partitioned by (s string);
insert into t11 partition(s) values(3,4, 'p1'),(4,5, 'p2'),(6,9,'p3');

hive> desc formatted t11 j;
OK
col_name            	j
data_type           	int
min
max
num_nulls
distinct_count
avg_col_len
max_col_len
num_trues
num_falses
bitVector
comment             	from deserializer
COLUMN_STATS_ACCURATE	{}

hive> explain insert into t11 partition(s) values(3,4, 'p1'),(4,5, 'p2'),(6,9,'p3');

STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: vgarg_20181016113701_f3aa9f8f-b38b-47a8-8149-b5521bf072f6:13
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: vgarg_20181016113701_f3aa9f8f-b38b-47a8-8149-b5521bf072f6:13
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: _dummy_table
                  Row Limit Per Split: 1
                  Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: array(const struct(3,4,'p1'),const struct(4,5,'p2'),const struct(6,9,'p3')) (type: array<struct<col1:int,col2:int,col3:string>>)
                    outputColumnNames: _col0
                    Statistics: Num rows: 1 Data size: 64 Basic stats: COMPLETE Column stats: COMPLETE
                    UDTF Operator
                      Statistics: Num rows: 1 Data size: 64 Basic stats: COMPLETE Column stats: COMPLETE
                      function name: inline
                      Select Operator
                        expressions: col1 (type: int), col2 (type: int), col3 (type: string)
                        outputColumnNames: _col0, _col1, _col2
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                        Reduce Output Operator
                          key expressions: _col2 (type: string)
                          sort order: +
                          Map-reduce partition columns: _col2 (type: string)
                          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                          value expressions: _col0 (type: int), _col1 (type: int)
        Reducer 2
            Execution mode: vectorized
            Reduce Operator Tree:
              Select Operator
                expressions: VALUE._col0 (type: int), VALUE._col1 (type: int), KEY._col2 (type: string)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                File Output Operator
                  compressed: false
                  Dp Sort State: PARTITION_SORTED
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      name: default.t11

  Stage: Stage-2
    Dependency Collection

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            s
          replace: false
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.t11

  Stage: Stage-3
    Stats Work
      Basic Stats Work:
      Column Stats Desc:
          Columns: i, j
          Column Types: int, int
          Table: default.t11

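Notice that the explain plan is missing the autogather stats branch: Reducer 2 contains only the Select Operator feeding the File Output Operator, with no sibling operator that computes column statistics, even though Stage-3 still lists a Column Stats Desc for columns i and j.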
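
A possible way to get column statistics populated until the underlying issue is fixed is sketched below. It is not part of the original report and has not been verified against this build. It assumes that the stats branch is dropped only when SDPO is enabled (as the title suggests), and that an explicit ANALYZE after the load is an acceptable substitute for autogather. The table name and values are taken from the reproducer.

```sql
-- Option 1 (assumption: the stats branch survives when SDPO is off):
-- disable SDPO for the session before the dynamic-partition insert.
set hive.optimize.sort.dynamic.partition=false;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.stats.autogather=true;

insert into t11 partition(s) values (3,4,'p1'),(4,5,'p2'),(6,9,'p3');

-- Option 2: leave SDPO on and compute column stats explicitly after the load.
analyze table t11 partition(s) compute statistics for columns;

-- Inspect the column stats of one partition to confirm they are populated.
desc formatted t11 partition(s='p1') j;
```

With either option, the min, max, num_nulls and distinct_count fields for column j would be expected to be filled in rather than blank.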

Issue Links

is duplicated by

HIVE-16100 Dynamic Sorted Partition optimizer loses sibling operators

Closed

People

Assignee: Unassigned

Reporter: Vineet Garg

Votes: 0

Watchers: 1

Dates

Created: 16/Oct/18 18:38

Updated: 03/Dec/18 22:01

Resolved: 03/Dec/18 22:01