[SPARK-25400] Increase timeouts in schedulerIntegrationSuite - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.4.0
Fix Version/s: 2.3.2, 2.4.0
Component/s: Scheduler, Spark Core
Labels:
None

Description

I just took a look at a flaky failure in SchedulerIntegrationSuite (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95887), and it seems the timeout really is too short:

18/09/10 11:14:07.821 mock backend thread INFO TaskSetManager: Starting task 5.0 in stage 1.0 (TID 8, localhost, executor driver, partition 5, PROCESS_LOCAL, 7677 bytes)
18/09/10 11:14:07.821 task-result-getter-2 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 6) in 1 ms on localhost (executor driver) (4/10)
18/09/10 11:14:07.821 task-result-getter-0 INFO TaskSetManager: Finished task 4.0 in stage 1.0 (TID 7) in 1 ms on localhost (executor driver) (5/10)
18/09/10 11:14:07.821 mock backend thread INFO TaskSetManager: Starting task 6.0 in stage 1.0 (TID 9, localhost, executor driver, partition 6, PROCESS_LOCAL, 7677 bytes)
18/09/10 11:14:07.821 task-result-getter-1 INFO TaskSetManager: Finished task 5.0 in stage 1.0 (TID 8) in 0 ms on localhost (executor driver) (6/10)
18/09/10 11:14:09.481 mock backend thread INFO TaskSetManager: Starting task 7.0 in stage 1.0 (TID 10, localhost, executor driver, partition 7, PROCESS_LOCAL, 7677 bytes)
18/09/10 11:14:09.482 dispatcher-event-loop-14 INFO BlockManagerInfo: Removed broadcast_0_piece0 on amp-jenkins-worker-05.amp:36913 in memory (size: 1260.0 B, free: 1638.6 MB)

You'll see that the "mock backend thread" does keep making progress, but for whatever reason there is a delay of more than one second in the middle (from 11:14:07.821 to 11:14:09.481, about 1.66 seconds). That alone already exceeds the existing timeouts.
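The size of the stall can be checked directly from the timestamps in the log excerpt above. The following is a minimal sketch, not part of the Spark code base; the helper names `parse_timestamp` and `largest_gap_seconds` are hypothetical, and the timestamp format is assumed to be `yy/MM/dd HH:mm:ss.SSS` as it appears in the excerpt.

```python
from datetime import datetime

# Timestamps copied from the log excerpt, in order of appearance.
LOG_TIMESTAMPS = [
    "18/09/10 11:14:07.821",
    "18/09/10 11:14:07.821",
    "18/09/10 11:14:07.821",
    "18/09/10 11:14:07.821",
    "18/09/10 11:14:07.821",
    "18/09/10 11:14:09.481",
    "18/09/10 11:14:09.482",
]


def parse_timestamp(text):
    """Parse a log timestamp (assumed format: yy/MM/dd HH:mm:ss.SSS)."""
    return datetime.strptime(text, "%y/%m/%d %H:%M:%S.%f")


def largest_gap_seconds(timestamps):
    """Return the largest gap, in seconds, between consecutive timestamps."""
    parsed = [parse_timestamp(t) for t in timestamps]
    return max((b - a).total_seconds() for a, b in zip(parsed, parsed[1:]))


gap = largest_gap_seconds(LOG_TIMESTAMPS)
print(f"largest gap between consecutive log lines: {gap:.3f} s")
```

The largest gap falls between the "Finished task 5.0" line and the "Starting task 7.0" line, which is the stall described above.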

It's possible there is something else going on here, but for now just increasing the timeouts seems like the best next step.
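To illustrate why a tight timeout makes a test flaky when the work occasionally stalls, here is a small sketch. It is not the suite's code: the real suite is written in Scala and its actual timeout values are not shown in this issue. The function names and all durations below are hypothetical and scaled down so the example runs quickly.

```python
import concurrent.futures
import time


def run_with_timeout(work, timeout_seconds):
    """Run work() on a helper thread; return True if it finishes in time."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(work)
        try:
            future.result(timeout=timeout_seconds)
            return True
        except concurrent.futures.TimeoutError:
            # The work is still making progress, it just was not fast enough.
            return False


def stalled_job():
    """Stand-in for a job that stalls briefly but does complete (hypothetical)."""
    time.sleep(0.3)


# Hypothetical values: a timeout shorter than the stall fails the check,
# while a generous timeout tolerates the same stall.
short_timeout_ok = run_with_timeout(stalled_job, timeout_seconds=0.05)
long_timeout_ok = run_with_timeout(stalled_job, timeout_seconds=5.0)
print(f"short timeout passed: {short_timeout_ok}, long timeout passed: {long_timeout_ok}")
```

A generous timeout costs nothing when the job finishes quickly, because the wait returns as soon as the result is available; it only matters on the slow runs.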

Issue Links

is related to: SPARK-43587 Run HealthTrackerIntegrationSuite in a dedicate JVM (Resolved)
links to: [Github] Pull Request #22385 (squito)

People

Assignee: Imran Rashid
Reporter: Imran Rashid
Votes: 0
Watchers: 2

Dates

Created: 10/Sep/18 21:57
Updated: 19/May/23 06:32
Resolved: 13/Sep/18 19:12