Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Not A Problem
Description
If we use DefaultResourceCalculator, then Capacity Scheduler keeps allocating containers even if we run out of vcores.
CS checks the available resources in two places. The first check is in CapacityScheduler.allocateContainerOnSingleNode():
if (calculator.computeAvailableContainers(Resources
    .add(node.getUnallocatedResource(), node.getTotalKillableResources()),
    minimumAllocation) <= 0) {
  LOG.debug("This node " + node.getNodeID() + " doesn't have sufficient "
      + "available or preemptible resource for minimum allocation");
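Why this first check does not stop allocation when vcores run out can be sketched in isolation. The snippet below is a simplified stand-in, not the Hadoop code: the class and method names are hypothetical, resources are reduced to memory and vcores, and it assumes that a memory-only calculator divides only the memory values while a dominant-resource style calculator takes the minimum across all dimensions. The minimum allocation is assumed to be positive in every dimension.

```java
// Simplified stand-in for the "how many containers fit" check; not the Hadoop classes.
class AvailableContainersSketch {

  // Memory-only count: vcores are ignored entirely (assumed behaviour of a
  // memory-only calculator such as DefaultResourceCalculator).
  static long byMemoryOnly(long availMemMb, long minMemMb) {
    return availMemMb / minMemMb;
  }

  // Count limited by the scarcest dimension (assumed behaviour of a
  // dominant-resource style calculator).
  static long byAllDimensions(long availMemMb, int availVcores,
      long minMemMb, int minVcores) {
    return Math.min(availMemMb / minMemMb, availVcores / minVcores);
  }

  public static void main(String[] args) {
    // Node with 8 GB of memory left but no free vcores; minimum allocation 1 GB / 1 vcore.
    System.out.println(byMemoryOnly(8192, 1024));          // 8, so the "<= 0" check passes the node
    System.out.println(byAllDimensions(8192, 0, 1024, 1)); // 0, so the node would be skipped
  }
}
```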
The second, which is more important, is located in RegularContainerAllocator.assignContainer():
if (!Resources.fitsIn(rc, capability, totalResource)) {
  LOG.warn("Node : " + node.getNodeID()
      + " does not have sufficient resource for ask : " + pendingAsk
      + " node total capability : " + node.getTotalResource());
  // Skip this locality request
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
      activitiesManager, node, application, schedulerKey,
      ActivityDiagnosticConstant.NODE_TOTAL_RESOURCE_INSUFFICIENT_FOR_REQUEST
          + getResourceDiagnostics(capability, totalResource),
      ActivityLevel.NODE);
  return ContainerAllocation.LOCALITY_SKIPPED;
}
Here, rc is the resource calculator instance; the other two values are:
Resource capability = pendingAsk.getPerAllocationResource();
Resource available = node.getUnallocatedResource();
There is a repro unit test attached to this case that demonstrates the problem. The root cause is that we pass the resource calculator to Resources.fitsIn(). Instead, we should use the overloaded version that takes no calculator, just like in FSAppAttempt.assignContainer():
// Can we allocate a container on this node?
if (Resources.fitsIn(capability, available)) {
  // Inform the application of the new container for this request
  RMContainer allocatedContainer =
      allocate(type, node, schedulerKey, pendingAsk,
          reservedContainer);
In CS, either switching to DominantResourceCalculator or calling Resources.fitsIn() without the calculator in RegularContainerAllocator.assignContainer() fixes the failing unit test (see testTooManyContainers() in TestTooManyContainers.java).
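The difference between the two fit checks can be illustrated with a minimal sketch. This is not the Hadoop implementation: SimpleResource, fitsInMemoryOnly and fitsInAllDimensions are hypothetical names, the memory-only comparison is assumed to mirror what the calculator-based check does under DefaultResourceCalculator, and the all-dimension comparison is assumed to mirror the two-argument Resources.fitsIn().

```java
// Simplified stand-in for the scheduler's fit checks; not the Hadoop classes.
class FitsInSketch {

  // Hypothetical two-dimensional resource: memory in MB and vcores.
  static final class SimpleResource {
    final long memoryMb;
    final int vcores;

    SimpleResource(long memoryMb, int vcores) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
    }
  }

  // Compares memory only, so a node with no free vcores still "fits".
  static boolean fitsInMemoryOnly(SimpleResource ask, SimpleResource available) {
    return ask.memoryMb <= available.memoryMb;
  }

  // Compares every dimension, so exhausted vcores reject the ask.
  static boolean fitsInAllDimensions(SimpleResource ask, SimpleResource available) {
    return ask.memoryMb <= available.memoryMb && ask.vcores <= available.vcores;
  }

  public static void main(String[] args) {
    SimpleResource ask = new SimpleResource(1024, 1);
    // Plenty of memory left on the node, but all vcores are already allocated.
    SimpleResource available = new SimpleResource(8192, 0);

    System.out.println(fitsInMemoryOnly(ask, available));    // true
    System.out.println(fitsInAllDimensions(ask, available)); // false
  }
}
```

With the memory-only comparison the ask is accepted even though no vcores remain, which matches the over-allocation described in this report.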
Attachments