Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Fix Version/s: 3.0.0
- Labels: None
Description
Currently, the Spark scheduler takes quite some time to dequeue speculative tasks when speculation is turned on and a stage has a large taskset (100,000 tasks or more). On further analysis, we found that the "task-result-getter" threads remain blocked on one of the dispatcher-event-loop threads, which holds the lock on the TaskSchedulerImpl object:
def resourceOffers(offers: IndexedSeq[WorkerOffer]): Seq[Seq[TaskDescription]] = synchronized {
This thread spends a long time executing the dequeueSpeculativeTask method in TaskSetManager.scala, which slows down the overall running time of the Spark job. We monitored the utilization of that lock over the whole duration of the job, and it was close to 50%, i.e. the code inside the synchronized block ran for almost half the duration of the entire Spark job. Screenshots of the thread dump are attached below for reference.
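The contention pattern described above can be sketched in isolation: one thread performs a long linear scan while holding a coarse monitor (standing in for dequeueSpeculativeTask running inside the synchronized resourceOffers), while a second thread (standing in for a task-result-getter callback) blocks on the same monitor until the scan finishes. The names below are illustrative, not Spark's actual classes:

```scala
// Minimal sketch of the contention described above (illustrative names,
// not Spark's real classes): a coarse monitor held during an O(n) scan
// blocks an unrelated caller, just as resourceOffers blocks the
// task-result-getter threads.
object LockContentionSketch {
  private val lock = new Object

  // Stand-in for dequeueSpeculativeTask: a linear scan over a large
  // taskset, performed entirely while holding the scheduler lock.
  def resourceOffersLike(numTasks: Int): Long = lock.synchronized {
    (0L until numTasks).sum // O(n) work under the lock
  }

  // Stand-in for a task-result-getter callback needing the same lock.
  def handleResultLike(): Unit = lock.synchronized { () }

  def main(args: Array[String]): Unit = {
    val scanner = new Thread(() => resourceOffersLike(50000000))
    scanner.start()
    Thread.sleep(10)   // let the scanner grab the monitor first
    handleResultLike() // blocks until the scan releases the monitor
    scanner.join()
    println("result handler had to wait behind the full scan")
  }
}
```

With a large enough taskset, the result-handler thread spends most of its time blocked, which is the roughly 50% lock utilization observed in the thread dumps; narrowing the scope of the lock or making the dequeue sub-linear removes that wait.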
Attachments