[SPARK-23243] Shuffle+Repartition on an RDD could lead to incorrect answers - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 1.6.0, 2.0.0, 2.1.0, 2.2.0, 2.3.0
Fix Version/s: 2.2.3, 2.3.2, 2.4.0
Component/s: Spark Core
Labels: correctness
Target Version/s: 2.4.0

Description

RDD repartition also distributes data round-robin, so it can produce incorrect answers on RDD workloads in a way similar to https://issues.apache.org/jira/browse/SPARK-23207

The approach that fixes DataFrame.repartition() does not apply to the RDD repartition issue, as discussed in https://github.com/apache/spark/pull/20393#issuecomment-360912451

This task tracks alternative solutions for this issue.
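
A minimal sketch of why position-based (round-robin) distribution is sensitive to input order. This is a toy model in plain Python, not Spark code: `round_robin`, `attempt1`, `attempt2`, and the two-partition setup are hypothetical names chosen for illustration, and the scenario assumes a retried task sees the same records in a different order.

```python
def round_robin(records, num_partitions):
    """Assign each record to a partition by its position in the input."""
    parts = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        parts[i % num_partitions].append(rec)
    return parts

# Same records, two arrival orders (assumed: the upstream order is not stable).
attempt1 = ["a", "b", "c", "d"]
attempt2 = ["b", "a", "d", "c"]

first = round_robin(attempt1, 2)  # output of the first task attempt
retry = round_robin(attempt2, 2)  # output of a retried attempt

# Assumed failure scenario: downstream partition 0 was read from the first
# attempt, downstream partition 1 is read from the retry.
combined = first[0] + retry[1]

# Records "a" and "c" appear twice; "b" and "d" are lost.
print(sorted(combined))

# In this toy model, fixing the input order first makes the assignment
# independent of arrival order, but it requires the records to be orderable.
stable = round_robin(sorted(attempt1), 2) == round_robin(sorted(attempt2), 2)
print(stable)
```

The sketch only shows the order dependence; it does not model how Spark schedules retries or which remedy this issue adopted.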

Issue Links

is duplicated by
SPARK-25156 Same query returns different result (Closed)

is related to
SPARK-28699 Cache an indeterminate RDD could lead to incorrect result while stage rerun (Resolved)
SPARK-25342 Support rolling back a result stage (In Progress)
SPARK-25341 Support rolling back a shuffle map stage and re-generate the shuffle files (Resolved)

relates to
SPARK-23207 Shuffle+Repartition on an DataFrame could lead to incorrect answers (Resolved)
SPARK-29042 Sampling-based RDD with unordered input should be INDETERMINATE (Resolved)

links to
[Github] Pull Request #20414 (jiangxb1987)
[Github] Pull Request #21698 (jiangxb1987)
[Github] Pull Request #22112 (cloud-fan)
[Github] Pull Request #22354 (cloud-fan)
[Github] Pull Request #22382 (bersprockets)
GitHub Pull Request #25755

People

Assignee: Wenchen Fan
Reporter: Xingbo Jiang
Votes: 0
Watchers: 19

Dates

Created: 26/Jan/18 23:00
Updated: 11/Sep/19 19:05
Resolved: 05/Sep/18 22:37
