[SPARK-46794] Incorrect results due to inferred predicate from checkpoint with subquery - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.5.0
Fix Version/s: 4.0.0, 3.5.1, 3.4.3
Component/s: SQL
Labels:
- correctness
- pull-request-available

Description

Spark can produce incorrect results when a DataFrame whose filter contains a scalar subquery is checkpointed. The subquery is included in the constraints of the resulting LogicalRDD and may then be propagated as an inferred filter when another query joins with the checkpointed DataFrame. As a result, the subquery is evaluated twice: once during checkpointing and once while evaluating the joining query. The two evaluations may return different results, for example when the subquery contains a limit whose sort order does not fully determine which rows are returned.
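The failure mode can be illustrated with a small toy model in plain Python. This is not Spark code and not the reporter's reproduction. All names (`unstable_subquery`, `checkpoint`, `join`, `propagate_constraint`) are hypothetical, and the "subquery" is simply a callable that returns a different value on each evaluation, standing in for a limit over an underspecified sort order.

```python
from itertools import cycle


def unstable_subquery(candidates):
    """Hypothetical stand-in for a scalar subquery whose result is not stable:
    each evaluation returns the next value from `candidates`."""
    values = cycle(candidates)
    return lambda: next(values)


def checkpoint(rows, subquery):
    """Model of checkpointing `rows WHERE row = (subquery)`.

    The subquery is evaluated once here (evaluation 1), the matching rows are
    materialized, and the unevaluated subquery is kept as a constraint.
    """
    first = subquery()
    materialized = [r for r in rows if r == first]
    return {"rows": materialized, "constraint_subquery": subquery}


def join(checkpointed, other, propagate_constraint):
    """Model of an inner equi-join with the checkpointed data.

    With propagate_constraint=True the retained constraint is applied as an
    inferred filter on both sides, which re-evaluates the subquery (evaluation 2).
    """
    left, right = checkpointed["rows"], other
    if propagate_constraint:
        second = checkpointed["constraint_subquery"]()
        left = [r for r in left if r == second]
        right = [r for r in right if r == second]
    return [(l, r) for l in left for r in right if l == r]


rows = [1, 2, 3]
other = [1, 2, 3]

# Subquery returns 1 during checkpointing and 2 when re-evaluated in the join.
buggy = join(checkpoint(rows, unstable_subquery([1, 2])), other, propagate_constraint=True)
expected = join(checkpoint(rows, unstable_subquery([1, 2])), other, propagate_constraint=False)

print(buggy)     # [] : the materialized row 1 is filtered out by the second evaluation
print(expected)  # [(1, 1)]
```

In this model, `propagate_constraint=False` only shows what the correct result looks like when the materialized data is used as-is. It is not a description of how the fix in the linked pull request is implemented.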

Issue Links

links to

GitHub Pull Request #44833

People

Assignee: Tom van Bussel

Reporter: Tom van Bussel

Votes: 0

Watchers: 2

Dates

Created: 22/Jan/24 14:00

Updated: 23/Jan/24 16:56

Resolved: 23/Jan/24 16:56