Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-28080

Propagate statistics from a source table to the materialized CTE

Log workAgile BoardRank to TopRank to BottomBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      Hive doesn't fill in the statistics of materialized CTEs, and the size of those TableScans is underestimated. That causes Tez to run with fewer tasks or to fail with OOM because MapJoin could be wrongly applied.

       

      The following example shows Map 1 reading `src` generates 493 rows, but Map 3 reading `cte` is expected to scan only 1 row.

      0: jdbc:hive2://hive-hiveserver2:10000/defaul> EXPLAIN WITH cte AS (
      . . . . . . . . . . . . . . . . . . . . . . .>   SELECT * FROM src
      . . . . . . . . . . . . . . . . . . . . . . .> )
      . . . . . . . . . . . . . . . . . . . . . . .> SELECT *
      . . . . . . . . . . . . . . . . . . . . . . .> FROM cte a
      . . . . . . . . . . . . . . . . . . . . . . .> JOIN cte b ON (a.key = b.key)
      . . . . . . . . . . . . . . . . . . . . . . .> JOIN cte c ON (a.key = c.key);
      ...
      +----------------------------------------------------+
      |                      Explain                       |
      +----------------------------------------------------+
      | Plan optimized by CBO.                             |
      |                                                    |
      | Vertex dependency in Stage-4                       |
      | Map 2 <- Map 3 (BROADCAST_EDGE), Reducer 4 (BROADCAST_EDGE) |
      | Reducer 4 <- Map 3 (SIMPLE_EDGE)                   |
      |                                                    |
      | Stage-3                                            |
      |   Fetch Operator                                   |
      |     limit:-1                                       |
      |     Stage-4                                        |
      |       Map 2 vectorized                             |
      |       File Output Operator [FS_69]                 |
      |         Map Join Operator [MAPJOIN_68] (rows=1 width=444) |
      |           Conds:MAPJOIN_67._col0=RS_61._col0(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5"] |
      |         <-Map 3 [BROADCAST_EDGE] vectorized        |
      |           SHUFFLE [RS_61]                          |
      |             PartitionCols:_col0                    |
      |             Select Operator [SEL_60] (rows=1 width=368) |
      |               Output:["_col0","_col1"]             |
      |               Filter Operator [FIL_59] (rows=1 width=368) |
      |                 predicate:key is not null          |
      |                 TableScan [TS_11] (rows=1 width=368) |
      |                   default@cte,c,Tbl:COMPLETE,Col:NONE,Output:["key","value"] |
      |         <-Map Join Operator [MAPJOIN_67] (rows=1 width=404) |
      |             Conds:SEL_66._col0=RS_64._col0(Inner),Output:["_col0","_col1","_col2","_col3"] |
      |           <-Reducer 4 [BROADCAST_EDGE] vectorized  |
      |             BROADCAST [RS_64]                      |
      |               PartitionCols:_col0                  |
      |               Select Operator [SEL_63]             |
      |                 Output:["_col0","_col1"]           |
      |           <-Select Operator [SEL_66] (rows=1 width=368) |
      |               Output:["_col0","_col1"]             |
      |               Filter Operator [FIL_65] (rows=1 width=368) |
      |                 predicate:key is not null          |
      |                 TableScan [TS_5] (rows=1 width=368) |
      |                   default@cte,a,Tbl:COMPLETE,Col:NONE,Output:["key","value"] |
      |         Stage-2                                    |
      |           Dependency Collection{}                  |
      |             Stage-1                                |
      |               Map 1 vectorized                     |
      |               File Output Operator [FS_4]          |
      |                 table:{"name:":"default.cte"}      |
      |                 Select Operator [SEL_3] (rows=493 width=350) |
      |                   Output:["_col0","_col1"]         |
      |                   TableScan [TS_0] (rows=493 width=350) |
      |                     default@src,src,Tbl:COMPLETE,Col:NONE,Output:["key","value"] |
      |         Stage-0                                    |
      |           Move Operator                            |
      |              Please refer to the previous Stage-1  |
      |                                                    |
      +----------------------------------------------------+ 

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            okumin Shohei Okumiya Assign to me
            okumin Shohei Okumiya
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment