Spark / SPARK-35553 Improve correlated subqueries / SPARK-43098

Should not handle the COUNT bug when the GROUP BY clause of a correlated scalar subquery is non-empty


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.4.1, 3.5.0
    • Component/s: SQL

    Description

      From allisonwang-db:

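      There is no COUNT bug when the inner columns of the correlated equality predicates also appear in the GROUP BY clause: an outer row with no matching inner rows produces no group, so the subquery returns NULL rather than 0. However, the current COUNT-bug handling still substitutes the aggregate function's default value (0 for COUNT) and therefore returns incorrect results.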
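A toy Python model may make the distinction concrete. It is a sketch under stated assumptions, not Spark's implementation: the function `decorrelate_count` and its `has_group_by`/`fixed` flags are hypothetical, the rewrite is simplified to "count inner rows per correlation key, then left outer join to the outer table", and `None` stands for SQL NULL. The data mirrors the t1/t2 views from the reproduction in this issue.

```python
from collections import Counter

# Toy model, NOT Spark code: decorrelating
#   SELECT c1, c2, (SELECT count(*) FROM t2 WHERE t1.c1 = t2.c1 [GROUP BY c1]) FROM t1
# into "aggregate t2 by the correlation key, then left outer join to t1".

# Same data as the t1/t2 views in the reproduction.
T1 = [(0, 1), (1, 2)]
T2 = [(0, 2), (0, 3)]


def decorrelate_count(t1, t2, has_group_by, fixed):
    """Return (c1, c2, subquery_value) rows; None stands for SQL NULL.

    has_group_by: whether the subquery has a non-empty GROUP BY clause.
    fixed: False models the old COUNT-bug handling, True the corrected one.
    """
    # Pre-aggregate the inner table: count(*) per value of t2.c1.
    counts = Counter(c1 for c1, _ in t2)
    rows = []
    for c1, c2 in t1:
        # Left outer join: an outer row with no match gets NULL.
        # (dict.get does not trigger Counter's default of 0.)
        value = counts.get(c1)
        # Old logic: always patch NULL to COUNT's empty-input value, 0.
        # Fixed logic: patch only without GROUP BY; with a non-empty GROUP BY,
        # an empty input yields no group, so the subquery result is NULL.
        needs_default = (not has_group_by) if fixed else True
        if value is None and needs_default:
            value = 0
        rows.append((c1, c2, value))
    return rows


print(decorrelate_count(T1, T2, has_group_by=True, fixed=False))  # [(0, 1, 2), (1, 2, 0)]
print(decorrelate_count(T1, T2, has_group_by=True, fixed=True))   # [(0, 1, 2), (1, 2, None)]
print(decorrelate_count(T1, T2, has_group_by=False, fixed=True))  # [(0, 1, 2), (1, 2, 0)]
```

With `fixed=False` the second row gets 0 even though the non-empty GROUP BY means the subquery produces no row; with `fixed=True` it stays NULL, while the no-GROUP-BY case still receives the 0 that the COUNT-bug handling exists to supply.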

      create view t1(c1, c2) as values (0, 1), (1, 2);
      create view t2(c1, c2) as values (0, 2), (0, 3);
      
      select c1, c2, (select count(*) from t2 where t1.c1 = t2.c1 group by c1) from t1;
      
      -- Correct answer: [(0, 1, 2), (1, 2, null)]
      -- Actual result (incorrect: 0 instead of null for c1 = 1):
      +---+---+------------------+
      |c1 |c2 |scalarsubquery(c1)|
      +---+---+------------------+
      |0  |1  |2                 |
      |1  |2  |0                 |
      +---+---+------------------+
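As an independent check of the expected semantics, the same queries can be run on SQLite through Python's standard `sqlite3` module. This is a sketch under stated assumptions: SQLite serves only as a reference SQL engine here (it is not Spark and does not exercise RewriteCorrelatedScalarSubquery), plain tables replace the views, and the inner GROUP BY column is written as `t2.c1` for clarity.

```python
import sqlite3

# Reference check on SQLite (not Spark), using the same data as the repro.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1(c1 INTEGER, c2 INTEGER);
    CREATE TABLE t2(c1 INTEGER, c2 INTEGER);
    INSERT INTO t1 VALUES (0, 1), (1, 2);
    INSERT INTO t2 VALUES (0, 2), (0, 3);
""")

# With GROUP BY: no matching t2 rows means no group, so the scalar subquery is NULL.
with_group_by = conn.execute("""
    SELECT c1, c2,
           (SELECT count(*) FROM t2 WHERE t1.c1 = t2.c1 GROUP BY t2.c1)
    FROM t1 ORDER BY c1
""").fetchall()

# Without GROUP BY: count(*) over an empty input is 0 (the classic COUNT-bug case).
without_group_by = conn.execute("""
    SELECT c1, c2,
           (SELECT count(*) FROM t2 WHERE t1.c1 = t2.c1)
    FROM t1 ORDER BY c1
""").fetchall()

print(with_group_by)     # [(0, 1, 2), (1, 2, None)]
print(without_group_by)  # [(0, 1, 2), (1, 2, 0)]
```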

      The bug is in the scalar-subquery handling in RewriteCorrelatedScalarSubquery; lateral subqueries, which are decorrelated in DecorrelateInnerQuery, already handle this case correctly. Related: https://issues.apache.org/jira/browse/SPARK-36113

People

    • Assignee: Jack Chen (jchen5)
    • Reporter: Jack Chen (jchen5)
    • Votes: 0
    • Watchers: 5
