[SPARK-43098] Should not handle the COUNT bug when the GROUP BY clause of a correlated scalar subquery is non-empty

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.2.0
Fix Version/s: 3.4.1, 3.5.0
Component/s: SQL
Labels:
- correctness

Description

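The COUNT bug does not arise when the columns in the correlated equality predicates also appear in the subquery's GROUP BY clause. For an outer row with no match, the grouped subquery produces no rows, so the scalar subquery should evaluate to NULL. However, the current logic for handling the COUNT bug still substitutes the default aggregate function value (0 for count) and returns incorrect results.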

create view t1(c1, c2) as values (0, 1), (1, 2);
create view t2(c1, c2) as values (0, 2), (0, 3);

select c1, c2, (select count(*) from t2 where t1.c1 = t2.c1 group by c1) from t1;

-- Correct answer: [(0, 1, 2), (1, 2, null)]
-- Actual (incorrect) result:
+---+---+------------------+
|c1 |c2 |scalarsubquery(c1)|
+---+---+------------------+
|0  |1  |2                 |
|1  |2  |0                 |
+---+---+------------------+
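
As a cross-check of the expected semantics, the same query can be run on a different engine. The sketch below uses SQLite through Python's standard sqlite3 module, which is an assumption made for illustration only: it is not Spark, it uses tables instead of views, and the GROUP BY column is qualified as t2.c1 for clarity. It compares the subquery with and without the GROUP BY clause.

```python
import sqlite3

# In-memory SQLite database standing in for the views in the report.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1(c1 INTEGER, c2 INTEGER);
    CREATE TABLE t2(c1 INTEGER, c2 INTEGER);
    INSERT INTO t1 VALUES (0, 1), (1, 2);
    INSERT INTO t2 VALUES (0, 2), (0, 3);
""")

# Grouped subquery: no matching rows means no groups, so the scalar
# subquery is empty and evaluates to NULL for t1.c1 = 1.
with_group_by = conn.execute("""
    SELECT c1, c2,
           (SELECT count(*) FROM t2 WHERE t1.c1 = t2.c1 GROUP BY t2.c1)
    FROM t1 ORDER BY c1
""").fetchall()

# Ungrouped subquery: a global aggregate always returns one row,
# so count(*) over no matching rows is 0.
without_group_by = conn.execute("""
    SELECT c1, c2,
           (SELECT count(*) FROM t2 WHERE t1.c1 = t2.c1)
    FROM t1 ORDER BY c1
""").fetchall()

print(with_group_by)     # [(0, 1, 2), (1, 2, None)]
print(without_group_by)  # [(0, 1, 2), (1, 2, 0)]
```

Only the ungrouped form should return 0 for the unmatched row; the grouped form matches the correct answer stated in the report.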

This bug affects scalar subqueries in RewriteCorrelatedScalarSubquery, but lateral subqueries handle it correctly in DecorrelateInnerQuery. Related: https://issues.apache.org/jira/browse/SPARK-36113
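
The effect of the COUNT bug handling can be illustrated with a hand-written decorrelated query. This is a simplified sketch, not the plan Spark actually produces in RewriteCorrelatedScalarSubquery: it assumes the rewrite amounts to a left outer join against the pre-aggregated subquery, and it models the default aggregate value with coalesce. SQLite via Python's sqlite3 module is used only as a convenient way to run the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1(c1 INTEGER, c2 INTEGER);
    CREATE TABLE t2(c1 INTEGER, c2 INTEGER);
    INSERT INTO t1 VALUES (0, 1), (1, 2);
    INSERT INTO t2 VALUES (0, 2), (0, 3);
""")

# Decorrelated form: aggregate t2 per c1, then left outer join onto t1.
# Unmatched outer rows get NULL, which is already the right answer
# when the original subquery had GROUP BY c1.
plain_join = conn.execute("""
    SELECT t1.c1, t1.c2, agg.cnt
    FROM t1
    LEFT OUTER JOIN (SELECT c1, count(*) AS cnt FROM t2 GROUP BY c1) AS agg
      ON t1.c1 = agg.c1
    ORDER BY t1.c1
""").fetchall()

# Applying the COUNT bug fix-up (default value 0 for count) on top of
# the join reproduces the incorrect output shown in this report.
with_default = conn.execute("""
    SELECT t1.c1, t1.c2, coalesce(agg.cnt, 0)
    FROM t1
    LEFT OUTER JOIN (SELECT c1, count(*) AS cnt FROM t2 GROUP BY c1) AS agg
      ON t1.c1 = agg.c1
    ORDER BY t1.c1
""").fetchall()

print(plain_join)    # [(0, 1, 2), (1, 2, None)]
print(with_default)  # [(0, 1, 2), (1, 2, 0)]
```

The default-value substitution is only appropriate when the original subquery has no GROUP BY, since only then does it return a row for an outer row with no match.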

Issue Links

relates to: SPARK-36113 Unify the logic to handle COUNT bug for scalar and lateral subqueries (Open)

links to: [Github] Pull Request #40946 (jchen5)

People

Assignee: Jack Chen
Reporter: Jack Chen
Votes: 0
Watchers: 5

Dates

Created: 11/Apr/23 19:38
Updated: 24/Nov/23 22:18
Resolved: 19/Apr/23 01:42