Spark / SPARK-39429

Convert Inner Join With Aggregate To Semi Join


Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.2.1
    • Fix Version/s: None
    • Component/s: Optimizer
    • Labels: None

    Description

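      An inner join with an aggregation on one of its children is equivalent to a semi join (with the aggregated child as the filtering side) with the aggregation eliminated, if the aggregation only does grouping (it computes no aggregate functions) and every grouping column is equated to a column of the other side in the join condition.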

      For example, the following inner join:

      select * from table1 inner join (select key2 from table2 group by key2) subquery on table1.key1 = subquery.key2

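      can be transformed into a semi join, eliminating the aggregate in the subquery (a semi join outputs only the columns of table1, so if subquery.key2 is needed it can be taken from table1.key1, which equals it under the join condition):
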
      select * from table1 semi join (select key2 from table2) subquery on table1.key1 = subquery.key2
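      A minimal Python sketch of the equivalence above, using hypothetical in-memory rows rather than Spark: the inner join against the grouped (distinct) key2 values and the semi join against the raw table2 rows yield the same multiset of rows once key2 is re-derived from key1. The sample data, the `payload` column, and the helper names are illustrative assumptions; `sql_eq` mimics SQL `=` in a join condition, where NULL matches nothing.

```python
from collections import Counter

# Hypothetical sample data: table1 rows are (key1, payload), table2 rows are (key2,).
# None stands in for SQL NULL. Note that key 1 appears twice in table2.
table1 = [(1, "a"), (2, "b"), (2, "c"), (3, "d"), (None, "e")]
table2 = [(1,), (1,), (2,), (4,), (None,)]


def sql_eq(a, b):
    """SQL '=' in a join condition: NULL never matches anything."""
    return a is not None and b is not None and a == b


def inner_join_with_group_by(t1, t2):
    """select * from t1 inner join (select key2 from t2 group by key2) s on t1.key1 = s.key2"""
    grouped = {key2 for (key2,) in t2}  # GROUP BY key2 with no aggregates == distinct key2
    return [
        (key1, payload, key2)
        for key1, payload in t1
        for key2 in grouped
        if sql_eq(key1, key2)
    ]


def semi_join(t1, t2):
    """select * from t1 semi join (select key2 from t2) s on t1.key1 = s.key2"""
    # Each t1 row is emitted at most once if ANY t2 row matches; t2 is not deduplicated.
    return [
        (key1, payload)
        for key1, payload in t1
        if any(sql_eq(key1, key2) for (key2,) in t2)
    ]


before = inner_join_with_group_by(table1, table2)
# The semi join drops key2, but key2 == key1 on every matching row, so re-derive it.
after = [(key1, payload, key1) for key1, payload in semi_join(table1, table2)]

# Compare as multisets, since row order is not significant in SQL.
print(Counter(before) == Counter(after))  # True
```

      Joining against the raw table2 with a plain inner join (no group by) would emit (1, "a", 1) twice, because key 1 appears twice in table2; the semi join's "at most once per left row" semantics is what makes the aggregate removable.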

      I think this could be added as an optimizer rule to avoid the unnecessary aggregation in the subquery.
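      One way such a rule could look, sketched in Python over a hypothetical, heavily simplified plan representation. `Scan`, `Aggregate`, `Project`, `Join`, `output`, the rule name, and the `payload` column are illustrative assumptions, not Spark Catalyst classes or APIs. The sketch only handles an aggregate on the right side and a join condition given as a conjunction of column equalities, and it ignores column-name collisions; a `Project` on top re-derives each grouping column from the left column it is equated to, so the join's output columns are unchanged.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical, simplified logical-plan nodes (NOT Spark's Catalyst classes).


@dataclass(frozen=True)
class Scan:
    table: str
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Aggregate:
    grouping: Tuple[str, ...]  # GROUP BY columns
    outputs: Tuple[str, ...]   # output columns; aggregate results would be listed here too
    child: "Plan"


@dataclass(frozen=True)
class Project:
    columns: Tuple[Tuple[str, str], ...]  # (output_name, source_column) pairs
    child: "Plan"


@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"
    join_type: str                           # "inner" or "left_semi"
    equalities: Tuple[Tuple[str, str], ...]  # condition: AND of left_col = right_col


Plan = Union[Scan, Aggregate, Project, Join]


def output(plan: Plan) -> Tuple[str, ...]:
    """Output column names of a plan node."""
    if isinstance(plan, Scan):
        return plan.columns
    if isinstance(plan, Aggregate):
        return plan.outputs
    if isinstance(plan, Project):
        return tuple(name for name, _ in plan.columns)
    if plan.join_type == "left_semi":
        return output(plan.left)  # a semi join only outputs left-side columns
    return output(plan.left) + output(plan.right)


def inner_join_aggregate_to_semi_join(plan: Plan) -> Plan:
    """Rewrite Join(left, Aggregate(child), inner) into a left semi join on child,
    if the aggregate only groups and every grouping column is equated to a left column."""
    if not (isinstance(plan, Join) and plan.join_type == "inner"
            and isinstance(plan.right, Aggregate)):
        return plan
    agg = plan.right
    # A global aggregate (no grouping) returns a row even on empty input: not eligible.
    only_grouping = bool(agg.grouping) and set(agg.outputs) == set(agg.grouping)
    left_col_for = {right: left for left, right in plan.equalities}
    if not (only_grouping and set(agg.grouping) <= set(left_col_for)):
        return plan
    semi = Join(plan.left, agg.child, "left_semi", plan.equalities)
    # Keep the original output: left columns as-is, then each grouping column taken
    # from the left column it is equated to (equal on every row that survives the join).
    columns = tuple((c, c) for c in output(plan.left)) + tuple(
        (c, left_col_for[c]) for c in agg.outputs
    )
    return Project(columns, semi)


# The example query: table1 inner join (select key2 from table2 group by key2).
t1 = Scan("table1", ("key1", "payload"))
t2 = Scan("table2", ("key2",))
before = Join(t1, Aggregate(("key2",), ("key2",), t2), "inner", (("key1", "key2"),))
after = inner_join_aggregate_to_semi_join(before)
print(after.child.join_type)            # left_semi
print(output(after) == output(before))  # True
```

      A real optimizer rule would also need to handle the mirrored case (aggregate on the left), join conditions mixing equalities with other predicates, and would be applied across the whole plan tree rather than to a single node.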

       


          People

            Assignee: Unassigned
            Reporter: Li Xian (lxian2)
            Votes: 0
            Watchers: 2
