Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

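      This plan shows a case where the count aggregate could be pushed down to Druid, which would eliminate the final-stage reducer (Reducer 2). Currently Druid only runs the groupBy on cstring2 with the doubleSum, and Hive computes count and sum over the result in Map 1 and Reducer 2.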

      +PREHOOK: query: EXPLAIN select count(DISTINCT cstring2), sum(cdouble) FROM druid_table
      +PREHOOK: type: QUERY
      +POSTHOOK: query: EXPLAIN select count(DISTINCT cstring2), sum(cdouble) FROM druid_table
      +POSTHOOK: type: QUERY
      +STAGE DEPENDENCIES:
      +  Stage-1 is a root stage
      +  Stage-0 depends on stages: Stage-1
      +
      +STAGE PLANS:
      +  Stage: Stage-1
      +    Tez
      +#### A masked pattern was here ####
      +      Edges:
      +        Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
      +#### A masked pattern was here ####
      +      Vertices:
      +        Map 1
      +            Map Operator Tree:
      +                TableScan
      +                  alias: druid_table
      +                  properties:
      +                    druid.fieldNames cstring2,$f1
      +                    druid.fieldTypes string,double
      +                    druid.query.json {"queryType":"groupBy","dataSource":"default.druid_table","granularity":"all","dimensions":[{"type":"default","dimension":"cstring2","outputName":"cstring2","outputType":"STRING"}],"limitSpec":{"type":"default"},"aggregations":[{"type":"doubleSum","name":"$f1","fieldName":"cdouble"}],"intervals":["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"]}
      +                    druid.query.type groupBy
      +                  Statistics: Num rows: 9173 Data size: 1673472 Basic stats: COMPLETE Column stats: NONE
      +                  Select Operator
      +                    expressions: cstring2 (type: string), $f1 (type: double)
      +                    outputColumnNames: cstring2, $f1
      +                    Statistics: Num rows: 9173 Data size: 1673472 Basic stats: COMPLETE Column stats: NONE
      +                    Group By Operator
      +                      aggregations: count(cstring2), sum($f1)
      +                      mode: hash
      +                      outputColumnNames: _col0, _col1
      +                      Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE
      +                      Reduce Output Operator
      +                        sort order:
      +                        Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE
      +                        value expressions: _col0 (type: bigint), _col1 (type: double)
      +        Reducer 2
      +            Reduce Operator Tree:
      +              Group By Operator
      +                aggregations: count(VALUE._col0), sum(VALUE._col1)
      +                mode: mergepartial
      +                outputColumnNames: _col0, _col1
      +                Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE
      +                File Output Operator
      +                  compressed: false
      +                  Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE
      +                  table:
      +                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
      +                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
      +                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      +
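As a rough illustration of what the plan above does, the Druid groupBy query acts like an inner GROUP BY on cstring2, and the two Hive Group By operators then apply count and sum on top of it. The sketch below checks that this two-level form returns the same result as the original query. It is only a sketch under assumptions: SQLite stands in for Hive and Druid, the sample rows are hypothetical, and the column alias f1 replaces the plan's $f1.

```python
import sqlite3

# Hypothetical sample rows standing in for druid_table (not taken from the issue).
rows = [("a", 1.0), ("a", 2.5), ("b", 4.0), (None, 0.5), ("c", 8.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE druid_table (cstring2 TEXT, cdouble REAL)")
conn.executemany("INSERT INTO druid_table VALUES (?, ?)", rows)

# The query from the issue, evaluated in one step.
original = conn.execute(
    "SELECT count(DISTINCT cstring2), sum(cdouble) FROM druid_table"
).fetchone()

# Two-level form mirroring the plan: the inner GROUP BY plays the role of
# the Druid groupBy query, the outer count/sum plays the role of the Hive
# Group By operators (hash in Map 1, mergepartial in Reducer 2).
two_level = conn.execute(
    "SELECT count(cstring2), sum(f1) FROM "
    "(SELECT cstring2, sum(cdouble) AS f1 FROM druid_table GROUP BY cstring2)"
).fetchone()

print(original, two_level)
```

The NULL group produced by the inner GROUP BY is skipped by count(cstring2), which matches the way count(DISTINCT ...) ignores NULLs. Pushing the outer aggregation into the Druid query as well is what would make Reducer 2 unnecessary.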
      

          People

            Assignee: Unassigned
            Reporter: Slim Bouguerra (bslim)