Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Fix Version/s: 3.0.0
- Labels: None
Description
Currently, the Spark scheduler takes quite some time to dequeue speculative tasks when speculation is turned on and a stage has a large taskset (100,000 tasks or more). On further analysis, we found that the "task-result-getter" threads remain blocked on one of the dispatcher-event-loop threads, which holds the lock on the TaskSchedulerImpl object:
def resourceOffers(offers: IndexedSeq[WorkerOffer]): Seq[Seq[TaskDescription]] = synchronized {
This thread spends a long time executing the dequeueSpeculativeTask method in TaskSetManager.scala, which slows down the overall running time of the Spark job. We monitored the utilization of that lock over the whole duration of the job, and it was close to 50%, i.e. the code inside the synchronized block ran for almost half the duration of the entire Spark job. Screenshots of the thread dump are attached below for reference.
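The contention pattern described above can be sketched in isolation: one thread performs a long linear scan while holding a coarse monitor (standing in for dequeueSpeculativeTask running inside the synchronized resourceOffers), while a second thread (standing in for a task-result-getter callback) blocks on the same monitor until the scan finishes. The names below are illustrative, not Spark's actual classes:

```scala
// Minimal sketch of the contention described above (illustrative names,
// not Spark's real classes): a coarse monitor held during an O(n) scan
// blocks an unrelated caller, just as resourceOffers blocks the
// task-result-getter threads.
object LockContentionSketch {
  private val lock = new Object

  // Stand-in for dequeueSpeculativeTask: a linear scan over a large
  // taskset, performed entirely while holding the scheduler lock.
  def resourceOffersLike(numTasks: Int): Long = lock.synchronized {
    (0L until numTasks).sum // O(n) work under the lock
  }

  // Stand-in for a task-result-getter callback needing the same lock.
  def handleResultLike(): Unit = lock.synchronized { () }

  def main(args: Array[String]): Unit = {
    val scanner = new Thread(() => resourceOffersLike(50000000))
    scanner.start()
    Thread.sleep(10)   // let the scanner grab the monitor first
    handleResultLike() // blocks until the scan releases the monitor
    scanner.join()
    println("result handler had to wait behind the full scan")
  }
}
```

With a large enough taskset, the result-handler thread spends most of its time blocked, which is the roughly 50% lock utilization observed in the thread dumps; narrowing the scope of the lock or making the dequeue sub-linear removes that wait.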
Attachments