Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 2.3.7, 3.1.2, 4.0.0
Description
create table test(id int);
explain extended select id,count(*) from test group by id limit 10;
An unexpected TopN appears in the map phase (see "TopN: 10" on the Reduce Output Operator of Map 1 in the plan below), which causes incorrect results.
STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: test
                  Statistics: Num rows: 1 Data size: 13500 Basic stats: COMPLETE Column stats: NONE
                  GatherStats: false
                  Select Operator
                    expressions: id (type: int)
                    outputColumnNames: id
                    Statistics: Num rows: 1 Data size: 13500 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: count()
                      keys: id (type: int)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 1 Data size: 13500 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        null sort order: a
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 1 Data size: 13500 Basic stats: COMPLETE Column stats: NONE
                        tag: -1
                        TopN: 10
                        TopN Hash Memory Usage: 0.1
                        value expressions: _col1 (type: bigint)
                        auto parallelism: true
            Execution mode: vectorized
            Path -> Alias:
              file:/user/hive/warehouse/test [test]
            Path -> Partition:
              file:/user/hive/warehouse/test
                Partition
                  base file name: test
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  properties:
                    COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
                    bucket_count -1
                    bucketing_version 2
                    column.name.delimiter ,
                    columns id
                    columns.comments
                    columns.types int
                    file.inputformat org.apache.hadoop.mapred.TextInputFormat
                    file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    location file:/user/hive/warehouse/test
                    name default.test
                    numFiles 0
                    numRows 0
                    rawDataSize 0
                    serialization.ddl struct test { i32 id}
                    serialization.format 1
                    serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    totalSize 0
                    transient_lastDdlTime 1609730190
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    properties:
                      COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
                      bucket_count -1
                      bucketing_version 2
                      column.name.delimiter ,
                      columns id
                      columns.comments
                      columns.types int
                      file.inputformat org.apache.hadoop.mapred.TextInputFormat
                      file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      location file:/user/hive/warehouse/test
                      name default.test
                      numFiles 0
                      numRows 0
                      rawDataSize 0
                      serialization.ddl struct test { i32 id}
                      serialization.format 1
                      serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      totalSize 0
                      transient_lastDdlTime 1609730190
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: default.test
                  name: default.test
            Truncated Path -> Alias:
              /test [test]
        Reducer 2
            Execution mode: vectorized
            Needs Tagging: false
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1 Data size: 13500 Basic stats: COMPLETE Column stats: NONE
                Limit
                  Number of rows: 10
                  Statistics: Num rows: 1 Data size: 13500 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    GlobalTableId: 0
                    directory: file:/tmp/root/7160ea24-52b9-47c3-aafc-c9200263a1c6/hive_2021-01-04_14-15-27_601_190083924675700904-1/-mr-10001/.hive-staging_hive_2021-01-04_14-15-27_601_190083924675700904-1/-ext-10002
                    NumFilesPerFileSink: 1
                    Statistics: Num rows: 1 Data size: 13500 Basic stats: COMPLETE Column stats: NONE
                    Stats Publishing Key Prefix: file:/tmp/root/7160ea24-52b9-47c3-aafc-c9200263a1c6/hive_2021-01-04_14-15-27_601_190083924675700904-1/-mr-10001/.hive-staging_hive_2021-01-04_14-15-27_601_190083924675700904-1/-ext-10002/
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        properties:
                          columns _col0,_col1
                          columns.types int:bigint
                          escape.delim \
                          hive.serialization.extend.additional.nesting.levels true
                          serialization.escape.crlf true
                          serialization.format 1
                          serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    TotalFiles: 1
                    GatherStats: false
                    MultiFileSpray: false
  Stage: Stage-0
    Fetch Operator
      limit: 10
      Processor Tree:
        ListSink
Time taken: 0.102 seconds, Fetched: 143 row(s)
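The report does not spell out how the map-side TopN leads to wrong results. The toy model below is one possible explanation, offered as an assumption rather than a description of Hive's actual code: each mapper keeps only its own N smallest group keys, partial counts are merged on reducers partitioned by key, and the final LIMIT (with no ORDER BY) takes whichever groups come first. The names `map_side` and `reduce_side`, the two-mapper data, and the modulo partitioning are all hypothetical.

```python
from collections import Counter

def map_side(rows, top_n=None):
    """Hash-aggregate one mapper's rows into partial counts.

    If top_n is set, keep only the top_n smallest keys seen by THIS mapper
    (an assumed stand-in for a map-side TopN on the shuffle key).
    """
    partial = Counter(rows)
    if top_n is not None:
        kept = sorted(partial)[:top_n]
        partial = Counter({k: partial[k] for k in kept})
    return partial

def reduce_side(partials, num_reducers, limit):
    """Merge partial counts on reducers partitioned by key, then apply LIMIT.

    There is no ORDER BY, so the limit simply takes the first groups in
    reducer order; any group that reaches a reducer may be returned.
    """
    reducers = [Counter() for _ in range(num_reducers)]
    for partial in partials:
        for key, count in partial.items():
            reducers[key % num_reducers][key] += count
    merged = [(key, r[key]) for r in reducers for key in sorted(r)]
    return merged[:limit]

# Two mappers whose inputs overlap on keys 3 and 4 (made-up data).
mapper_a = [1, 2, 3, 4]
mapper_b = [3, 4, 5, 6]

exact = reduce_side([map_side(mapper_a), map_side(mapper_b)],
                    num_reducers=2, limit=2)
pruned = reduce_side([map_side(mapper_a, top_n=2), map_side(mapper_b, top_n=2)],
                     num_reducers=2, limit=2)

print(exact)   # [(2, 1), (4, 2)]
print(pruned)  # [(2, 1), (4, 1)]  key 4 lost the row that mapper_a pruned
```

In this sketch, key 4 is outside mapper_a's two smallest keys, so its partial count is dropped there, yet the unordered LIMIT still returns key 4 using only mapper_b's contribution. Whether the original bug follows exactly this path is not confirmed by the report.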
Issue Links
- is related to HIVE-25856 Intermittent null ordering in plans of queries with GROUP BY and LIMIT (Closed)
- links to