Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-12032

Filter can't be pushed down to correct Join because of bad order of Join

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • None
    • 2.0.0
    • SQL
    • None

    Description

      For this query:

        select d.d_year, count(*) cnt
         FROM store_sales, date_dim d, customer c
         WHERE ss_customer_sk = c.c_customer_sk AND c.c_first_shipto_date_sk = d.d_date_sk
         group by d.d_year
      

      Current optimized plan is

      == Optimized Logical Plan ==
      Aggregate [d_year#147], [d_year#147,(count(1),mode=Complete,isDistinct=false) AS cnt#425L]
       Project [d_year#147]
        Join Inner, Some(((ss_customer_sk#283 = c_customer_sk#101) && (c_first_shipto_date_sk#106 = d_date_sk#141)))
         Project [d_date_sk#141,d_year#147,ss_customer_sk#283]
          Join Inner, None
           Project [ss_customer_sk#283]
            Relation[] ParquetRelation[store_sales]
           Project [d_date_sk#141,d_year#147]
            Relation[] ParquetRelation[date_dim]
         Project [c_customer_sk#101,c_first_shipto_date_sk#106]
          Relation[] ParquetRelation[customer]
      

      It will join store_sales and date_dim together without any condition, the condition c.c_first_shipto_date_sk = d.d_date_sk is not pushed to it because the bad order of joins.

      The optimizer should re-order the joins, join date_dim after customer, then it can pushed down the condition correctly.

      The plan should be

      Aggregate [d_year#147], [d_year#147,(count(1),mode=Complete,isDistinct=false) AS cnt#425L]
       Project [d_year#147]
        Join Inner, Some((c_first_shipto_date_sk#106 = d_date_sk#141))
         Project [c_first_shipto_date_sk#106]
          Join Inner, Some((ss_customer_sk#283 = c_customer_sk#101))
           Project [ss_customer_sk#283]
            Relation[store_sales]
           Project [c_first_shipto_date_sk#106,c_customer_sk#101]
            Relation[customer]
         Project [d_year#147,d_date_sk#141]
          Relation[date_dim]
      

      Attachments

        Issue Links

          Activity

            People

              davies Davies Liu
              davies Davies Liu
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: