[SPARK-28845] Enable spark.sql.execution.sortBeforeRepartition only for retried stages - ASF JIRA


Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Won't Do
Affects Version/s: 3.0.0
Fix Version/s: None
Component/s: Spark Core, SQL
Labels: None

Description

To fix the correctness bug in SPARK-28699, we disable radix sort for repartition in Spark SQL. This causes a performance regression.

To limit this performance overhead, we plan to enable the sort for the repartition operation only when a stage is retried. This work depends on SPARK-25341.
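A minimal sketch of why a sort is involved at all, in plain Python with no Spark dependency. Round-robin repartitioning assigns each row to a partition by its position in the input. If a recomputed task sees its input in a different order, the same rows go to different partitions. Sorting the rows first makes the assignment independent of input order. `round_robin` and `partition_for_attempt` are hypothetical helpers, not Spark APIs. Python's natural ordering stands in for whatever total order Spark would sort by. The round-robin is simplified to always start at partition 0. `partition_for_attempt` models the proposal above, assuming `attempt > 0` means a retried stage attempt.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def round_robin(rows: Sequence[T], num_partitions: int) -> List[List[T]]:
    """Position-based assignment: the i-th row goes to partition i % num_partitions."""
    parts: List[List[T]] = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


def partition_for_attempt(rows: Sequence[T], num_partitions: int, attempt: int) -> List[List[T]]:
    """Hypothetical policy: pay for the sort only on retried attempts (attempt > 0)."""
    ordered = sorted(rows) if attempt > 0 else list(rows)
    return round_robin(ordered, num_partitions)


# The first run sees rows in one order; a recomputation may see them in another.
first_order = ["c", "a", "d", "b"]
retry_order = ["a", "b", "c", "d"]

# Without sorting, the same rows land in different partitions.
print(round_robin(first_order, 2))  # [['c', 'd'], ['a', 'b']]
print(round_robin(retry_order, 2))  # [['a', 'c'], ['b', 'd']]

# With a sort first, any input order yields the same assignment.
print(round_robin(sorted(first_order), 2) == round_robin(sorted(retry_order), 2))  # True
```

The sorted (retry) assignment generally differs from the unsorted first attempt. Under this model, sorting only on retry is therefore safe only if every partition of the stage is regenerated together. This is presumably why the work depends on SPARK-25341.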


Issue Links

is blocked by

SPARK-25341 Support rolling back a shuffle map stage and re-generate the shuffle files

Resolved

is caused by

SPARK-28699 Cache an indeterminate RDD could lead to incorrect result while stage rerun

Resolved


People

Assignee: Unassigned

Reporter: Yuanjian Li

Votes: 0

Watchers: 4

Dates

Created: 22/Aug/19 02:59

Updated: 12/Feb/20 07:28

Resolved: 26/Sep/19 09:50