Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-21338

Remove order by and limit for aggregates

Log workAgile BoardRank to TopRank to BottomBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      If a query is guaranteed to produce at most one row LIMIT and ORDER BY could be removed. This saves unnecessary vertex for LIMIT/ORDER BY.

      explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 order by cs limit 100
      
      STAGE PLANS:
        Stage: Stage-1
          Tez
            DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
            Edges:
              Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
              Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
            DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
            Vertices:
              Map 1
                  Map Operator Tree:
                      TableScan
                        alias: store_sales
                        filterExpr: (ss_ext_sales_price > 100) (type: boolean)
                        Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
                        Filter Operator
                          predicate: (ss_ext_sales_price > 100) (type: boolean)
                          Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
                          Select Operator
                            Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
                            Group By Operator
                              aggregations: count()
                              mode: hash
                              outputColumnNames: _col0
                              Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                              Reduce Output Operator
                                sort order:
                                Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                                value expressions: _col0 (type: bigint)
                  Execution mode: vectorized
              Reducer 2
                  Execution mode: vectorized
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: count(VALUE._col0)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: bigint)
                        sort order: +
                        Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                        TopN Hash Memory Usage: 0.1
              Reducer 3
                  Execution mode: vectorized
                  Reduce Operator Tree:
                    Select Operator
                      expressions: KEY.reducesinkkey0 (type: bigint)
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                      Limit
                        Number of rows: 100
                        Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                          table:
                              input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      
        Stage: Stage-0
          Fetch Operator
            limit: 100
            Processor Tree:
              ListSink
      

      Attachments

        1. HIVE-21338.1.patch
          15 kB
          Vineet Garg
        2. HIVE-21338.2.patch
          912 kB
          Vineet Garg
        3. HIVE-21338.3.patch
          903 kB
          Vineet Garg
        4. HIVE-21338.4.patch
          902 kB
          Vineet Garg
        5. HIVE-21338.5.patch
          899 kB
          Vineet Garg
        6. HIVE-21338.6.patch
          1.08 MB
          Vineet Garg
        7. HIVE-21338.7.patch
          1.11 MB
          Vineet Garg
        8. HIVE-21338.8.patch
          1.11 MB
          Vineet Garg

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            vgarg Vineet Garg Assign to me
            vgarg Vineet Garg
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - Not Specified
              Not Specified
              Remaining:
              Remaining Estimate - 0h
              0h
              Logged:
              Time Spent - 1h 50m
              1h 50m

              Slack

                Issue deployment