[HIVE-19607] Pushing Aggregates on Top of Aggregates - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels:
None

Description

This plan shows an instance where the count aggregates can be pushed to Druid which will eliminate the last stage reducer.

+PREHOOK: query: EXPLAIN select count(DISTINCT cstring2), sum(cdouble) FROM druid_table
+PREHOOK: type: QUERY
+POSTHOOK: query: EXPLAIN select count(DISTINCT cstring2), sum(cdouble) FROM druid_table
+POSTHOOK: type: QUERY
+STAGE DEPENDENCIES:
+  Stage-1 is a root stage
+  Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+  Stage: Stage-1
+    Tez
+#### A masked pattern was here ####
+      Edges:
+        Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
+#### A masked pattern was here ####
+      Vertices:
+        Map 1
+            Map Operator Tree:
+                TableScan
+                  alias: druid_table
+                  properties:
+                    druid.fieldNames cstring2,$f1
+                    druid.fieldTypes string,double
+                    druid.query.json {"queryType":"groupBy","dataSource":"default.druid_table","granularity":"all","dimensions":[{"type":"default","dimension":"cstring2","outputName":"cstring2","outputType":"STRING"}],"limitSpec":{"type":"default"},"aggregations":[{"type":"doubleSum","name":"$f1","fieldName":"cdouble"}],"intervals":["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"]}
+                    druid.query.type groupBy
+                  Statistics: Num rows: 9173 Data size: 1673472 Basic stats: COMPLETE Column stats: NONE
+                  Select Operator
+                    expressions: cstring2 (type: string), $f1 (type: double)
+                    outputColumnNames: cstring2, $f1
+                    Statistics: Num rows: 9173 Data size: 1673472 Basic stats: COMPLETE Column stats: NONE
+                    Group By Operator
+                      aggregations: count(cstring2), sum($f1)
+                      mode: hash
+                      outputColumnNames: _col0, _col1
+                      Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE
+                      Reduce Output Operator
+                        sort order:
+                        Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE
+                        value expressions: _col0 (type: bigint), _col1 (type: double)
+        Reducer 2
+            Reduce Operator Tree:
+              Group By Operator
+                aggregations: count(VALUE._col0), sum(VALUE._col1)
+                mode: mergepartial
+                outputColumnNames: _col0, _col1
+                Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE
+                File Output Operator
+                  compressed: false
+                  Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE
+                  table:
+                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
+                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: Slim Bouguerra

Votes:: 0 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 18/May/18 19:32

Updated:: 21/Oct/22 07:20