Hive / HIVE-21340

CBO: Prune non-key columns feeding into a SemiJoin

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.0-alpha-1
    • Component/s: CBO, Query Planning
    • Labels: None

    Description

      explain cbo 
      with ss as 
      (select count(1), ss_item_sk, ss_ticket_number from 
                  store_sales group by ss_item_sk, ss_ticket_number 
                  having count(1) > 1) 
      select count(1) from item where i_item_sk IN (select ss_item_sk from ss);
      

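      Notice the HiveProject(ss_item_sk=[$0], ss_ticket_number=[$1], $f2=[$2]) feeding the right side of the HiveSemiJoin in the plan below.

      Only ss_item_sk is referenced by the HiveSemiJoin condition, so ss_ticket_number and $f2 should be pruned from that projection.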

      CBO PLAN:
      HiveAggregate(group=[{}], agg#0=[count()])
        HiveSemiJoin(condition=[=($0, $1)], joinType=[inner])
          HiveProject(i_item_sk=[$0])
            HiveFilter(condition=[IS NOT NULL($0)])
              HiveTableScan(table=[[tpcds_copy_orc_partitioned_10000, item]], table:alias=[item])
          HiveProject(ss_item_sk=[$0], ss_ticket_number=[$1], $f2=[$2])
            HiveFilter(condition=[>($2, 1)])
              HiveAggregate(group=[{1, 8}], agg#0=[count()])
                HiveFilter(condition=[IS NOT NULL($1)])
                  HiveTableScan(table=[[tpcds_copy_orc_partitioned_10000, store_sales]], table:alias=[store_sales])
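
      The pruning requested here can be illustrated with a small sketch. This is a toy model in Python, not Hive or Calcite code: the function name prune_semijoin_right and the list-based representation of the condition and projection are assumptions made for illustration only. It relies on the fact that a semi-join emits only left-side fields, so a right-side column is needed only if the join condition references it.

```python
# Toy model of the pruning this issue asks for; not Hive or Calcite code.
# All names here (prune_semijoin_right, etc.) are hypothetical.

def prune_semijoin_right(left_field_count, condition_refs, right_project):
    """Drop right-input projections that the semi-join condition never reads.

    left_field_count: number of fields produced by the left input.
    condition_refs:   input refs used by the join condition, indexed over
                      the concatenated (left + right) row, e.g. [0, 1].
    right_project:    list of (name, expr) pairs output by the right input.

    Returns (new_condition_refs, new_right_project).
    """
    # A semi-join emits only left-side fields, so the right input's columns
    # are needed solely by the join condition.
    used_right = sorted({ref - left_field_count
                         for ref in condition_refs
                         if ref >= left_field_count})

    # Map old right-side positions to their positions after pruning.
    remap = {old: new for new, old in enumerate(used_right)}

    new_project = [right_project[i] for i in used_right]
    new_refs = [ref if ref < left_field_count
                else left_field_count + remap[ref - left_field_count]
                for ref in condition_refs]
    return new_refs, new_project


# The plan in this report: the left input has one field (i_item_sk) and the
# condition is =($0, $1), so only right-side position 0 is referenced.
refs, project = prune_semijoin_right(
    left_field_count=1,
    condition_refs=[0, 1],
    right_project=[("ss_item_sk", "$0"),
                   ("ss_ticket_number", "$1"),
                   ("$f2", "$2")],
)
print(refs)     # [0, 1]
print(project)  # [('ss_item_sk', '$0')]
```

      In this model the HiveFilter on the count stays below the projection and still reads $2; only the projection's output narrows to ss_item_sk.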
      

      Attachments

        1. HIVE-21340.1.patch (8 kB, Vineet Garg)
        2. HIVE-21340.2.patch (23 kB, Vineet Garg)
        3. HIVE-21340.3.patch (27 kB, Vineet Garg)

            People

              Assignee: Vineet Garg (vgarg)
              Reporter: Gopal Vijayaraghavan (gopalv)
              Votes: 0
              Watchers: 4