[SPARK-45583] Spark SQL returning incorrect values for full outer join on keys with the same name. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.4.1
Fix Version/s: 3.5.0
Component/s: SQL
Labels: None

Description

The following query returns incorrect results.

WITH people AS (
  SELECT * FROM (VALUES
    (1, 'Peter'),
    (2, 'Homer'),
    (3, 'Ned'),
    (3, 'Jenny')
  ) AS Idiots(id, FirstName)
), location AS (
  SELECT * FROM (VALUES
    (1, 'sample0'),
    (1, 'sample1'),
    (2, 'sample2')
  ) AS Locations(id, address)
)
SELECT
  *
FROM
  people
FULL OUTER JOIN
  location
ON
  people.id = location.id

We find the following table:

id: integer	FirstName: string	id: integer	address: string
2	Homer	2	sample2
null	Ned	null	null
null	Jenny	null	null
1	Peter	1	sample0
1	Peter	1	sample1

The first `id` column is clearly wrong: the nulls should be 3.

If we rename the id column to pid in the people table only, we get the correct results:

pid: integer	FirstName: string	id: integer	address: string
2	Homer	2	sample2
3	Ned	null	null
3	Jenny	null	null
1	Peter	1	sample0
1	Peter	1	sample1
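The expected rows can be cross-checked against a small reference implementation of full outer join semantics. The sketch below is plain Python, not Spark's join code. The function name `full_outer_join` and the tuple layout are assumptions made for illustration. It shows that unmatched rows from the left side keep their own key, so Ned and Jenny should carry id 3 rather than null.

```python
# Reference semantics only: plain Python, not Spark's join implementation.
# Data copied from the query in the description.
people = [(1, "Peter"), (2, "Homer"), (3, "Ned"), (3, "Jenny")]
location = [(1, "sample0"), (1, "sample1"), (2, "sample2")]


def full_outer_join(left, right):
    """Join two lists of (id, value) tuples on id, keeping unmatched rows from both sides."""
    rows = []
    matched_right = set()
    for l_id, l_val in left:
        # SQL equality never matches NULL keys, so None is excluded explicitly.
        hits = [
            (i, r) for i, r in enumerate(right)
            if l_id is not None and r[0] == l_id
        ]
        if hits:
            for i, (r_id, r_val) in hits:
                matched_right.add(i)
                rows.append((l_id, l_val, r_id, r_val))
        else:
            # Unmatched left row: left columns keep their values, right columns are NULL.
            rows.append((l_id, l_val, None, None))
    for i, (r_id, r_val) in enumerate(right):
        if i not in matched_right:
            # Unmatched right row: left columns are NULL.
            rows.append((None, None, r_id, r_val))
    return rows


for row in full_outer_join(people, location):
    print(row)
```

Row order is not significant here. Only the values matter, and the first column is never null for this input because every row in location has a match in people.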

People

Assignee: Unassigned

Reporter: Huw

Votes: 0

Watchers: 2

Dates

Created: 18/Oct/23 03:38

Updated: 20/Oct/23 15:03

Resolved: 20/Oct/23 15:03