Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
-
None
Description
Following query generate plan where right side of HiveSemiJoin contains HiveProject->HiveFilter->HiveProject where bottom HiveProject contain extra columns which can be pruned.
explain cbo with frequent_ss_items as (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date solddate,count(*) cnt from store_sales ,date_dim ,item where ss_sold_date_sk = d_date_sk and ss_item_sk = i_item_sk and d_year in (1999,1999+1,1999+2,1999+3) group by substr(i_item_desc,1,30),i_item_sk,d_date having count(*) >4) select sum(sales) from ((select cs_quantity*cs_list_price sales from catalog_sales ,date_dim where d_year = 1999 and d_moy = 1 and cs_sold_date_sk = d_date_sk and cs_item_sk in (select item_sk from frequent_ss_items))) subq limit 100;
CBO Plan:
HiveSortLimit(fetch=[100]) HiveProject($f0=[$0]) HiveAggregate(group=[{}], agg#0=[sum($0)]) HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)]) HiveSemiJoin(condition=[=($1, $5)], joinType=[inner]) HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], cs_quantity=[$18], cs_list_price=[$20]) HiveFilter(condition=[IS NOT NULL($0)]) HiveTableScan(table=[[perf_constraints, catalog_sales]], table:alias=[catalog_sales]) HiveProject(d_date_sk=[$0]) HiveFilter(condition=[AND(=($6, 1999), =($8, 1))]) HiveTableScan(table=[[perf_constraints, date_dim]], table:alias=[date_dim]) HiveProject(i_item_sk=[$1]) HiveFilter(condition=[>($3, 4)]) HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3]) HiveAggregate(group=[{3, 4, 5}], agg#0=[count()]) HiveJoin(condition=[=($1, $4)], joinType=[inner], algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) HiveJoin(condition=[=($0, $2)], joinType=[inner], algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2]) HiveFilter(condition=[IS NOT NULL($0)]) HiveTableScan(table=[[perf_constraints, store_sales]], table:alias=[store_sales]) HiveProject(d_date_sk=[$0], d_date=[$2]) HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)]) HiveTableScan(table=[[perf_constraints, date_dim]], table:alias=[date_dim]) HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)]) HiveTableScan(table=[[perf_constraints, item]], table:alias=[item])
Only i_item_sk and $f3/count are used up in the plan therefore columns substr andn d_date can be removed.
Note that the above is generated with HIVE-21340 patch
Attachments
Issue Links
- is related to
-
HIVE-21340 CBO: Prune non-key columns feeding into a SemiJoin
- Closed