[HIVE-21338] Remove order by and limit for aggregates - ASF JIRA

Log work

Agile Board

Rank to Top

Rank to Bottom

Bulk Copy Attachments

Bulk Move Attachments

Voters

Watch issue

Watchers

Create sub-task

Convert to sub-task

Move

Link

Clone

Labels

Update Comment Author

Replace String in Comment

Update Comment Visibility

Delete Comments

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.0.0-alpha-1
Component/s: Query Planning
Labels:
- pull-request-available

Description

If a query is guaranteed to produce at most one row LIMIT and ORDER BY could be removed. This saves unnecessary vertex for LIMIT/ORDER BY.

explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 order by cs limit 100

STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
      Edges:
        Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
      DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  filterExpr: (ss_ext_sales_price > 100) (type: boolean)
                  Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: (ss_ext_sales_price > 100) (type: boolean)
                    Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
                      Group By Operator
                        aggregations: count()
                        mode: hash
                        outputColumnNames: _col0
                        Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                        Reduce Output Operator
                          sort order:
                          Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                          value expressions: _col0 (type: bigint)
            Execution mode: vectorized
        Reducer 2
            Execution mode: vectorized
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: bigint)
                  sort order: +
                  Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                  TopN Hash Memory Usage: 0.1
        Reducer 3
            Execution mode: vectorized
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: bigint)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                Limit
                  Number of rows: 100
                  Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: 100
      Processor Tree:
        ListSink