Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
One problem we are seeing is that the FairScheduler repeatedly preempts for the same job. This is presumably because our preemption interval is set to a low value (1 minute). The FairScheduler queues up N tasks to be killed, but within 1 minute it cannot kill all of them and schedule new tasks on the freed slots. As a result, after 1 minute it preempts a whole new batch of tasks.
We could (and probably will) work around this by increasing the preemption interval. However, this forces a hard tradeoff between accurate preemption and timely preemption. Not good. Ideally we want to make the first set of preemptions quickly (to provide responsive behavior to new jobs, for example) but wait thereafter, to make sure that the kill actions have actually been processed.
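To make the idea concrete, here is a rough sketch of one way the "preempt quickly first, then wait" behavior could be expressed. It is not FairScheduler code: the class `PreemptionBackoff`, its methods, and the two timeout parameters are all hypothetical names invented for illustration, and the values used in `main` are arbitrary.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Hadoop): gates preemption per job with
 * a short initial timeout and a longer settle time after each round.
 */
class PreemptionBackoff {
    // How long a job must be starved before the first round of preemption.
    private final long initialTimeoutMs;
    // How long to wait after a round so the kill actions can take effect.
    private final long settleTimeMs;
    // Time of the most recent preemption round issued for each job.
    private final Map<String, Long> lastPreemptedAt = new HashMap<>();

    PreemptionBackoff(long initialTimeoutMs, long settleTimeMs) {
        this.initialTimeoutMs = initialTimeoutMs;
        this.settleTimeMs = settleTimeMs;
    }

    /** Returns true if the scheduler may preempt for jobId at time nowMs. */
    boolean shouldPreempt(String jobId, long starvedSinceMs, long nowMs) {
        if (nowMs - starvedSinceMs < initialTimeoutMs) {
            return false; // not starved long enough for a first round
        }
        Long last = lastPreemptedAt.get(jobId);
        // First round goes ahead immediately; later rounds wait out the settle time.
        return last == null || nowMs - last >= settleTimeMs;
    }

    /** Records that a round of kills was issued for jobId at time nowMs. */
    void recordPreemption(String jobId, long nowMs) {
        lastPreemptedAt.put(jobId, nowMs);
    }

    /** Forgets a job once it is no longer starved or has finished. */
    void clear(String jobId) {
        lastPreemptedAt.remove(jobId);
    }

    public static void main(String[] args) {
        // Assumed values: 1 minute before the first round, 5 minutes between rounds.
        PreemptionBackoff backoff = new PreemptionBackoff(60_000L, 300_000L);
        System.out.println(backoff.shouldPreempt("job1", 0L, 60_000L));
        backoff.recordPreemption("job1", 60_000L);
        System.out.println(backoff.shouldPreempt("job1", 0L, 120_000L));
    }
}
```

With this shape the first check can stay on a short interval, while repeat preemptions for the same job are held back until the earlier kills have had time to be processed. A fuller version would probably track the number of outstanding kills rather than only a timestamp.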