[SPARK-45507] Correctness bug in correlated scalar subqueries with COUNT aggregates - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 4.0.0
Fix Version/s: 4.0.0
Component/s: SQL
Labels:
- correctness
- pull-request-available

Description

 
create view if not exists t1(a1, a2) as values (0, 1), (1, 2);
create view if not exists t2(b1, b2) as values (0, 2), (0, 3);
create view if not exists t3(c1, c2) as values (0, 2), (0, 3);

-- Example 1
select (
  select SUM(l.cnt + r.cnt)
  from (select count(*) cnt from t2 where t1.a1 = t2.b1 having cnt = 0) l
  join (select count(*) cnt from t3 where t1.a1 = t3.c1 having cnt = 0) r
  on l.cnt = r.cnt
) from t1

-- Correct answer: (null, 0)
+----------------------+
|scalarsubquery(c1, c1)|
+----------------------+
|null                  |
|null                  |
+----------------------+

-- Example 2
select ( select sum(cnt) from (select count(*) cnt from t2 where t1.c1 = t2.c1) ) from t1

-- Correct answer: (2, 0)
+------------------+
|scalarsubquery(c1)|
+------------------+
|2                 |
|null              |
+------------------+

-- Example 3
select ( select count(*) from (select count(*) cnt from t2 where t1.c1 = t2.c1) ) from t1

-- Correct answer: (1, 1)
+------------------+
|scalarsubquery(c1)|
+------------------+
|1                 |
|0                 |
+------------------+

DB fiddle for correctness check:https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/10403#

Attachments

Issue Links

links to

GitHub Pull Request #43341

Activity

People

Assignee:: Andy Lam

Reporter:: Andy Lam

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 12/Oct/23 00:41

Updated:: 24/Nov/23 22:50

Resolved:: 19/Oct/23 02:36