Hadoop YARN / YARN-10848

Vcore allocation problem with DefaultResourceCalculator

Details

    Description

      If we use DefaultResourceCalculator, then Capacity Scheduler keeps allocating containers even if we run out of vcores.

      CS checks the available resources in two places. The first check is in CapacityScheduler.allocateContainerOnSingleNode():

          if (calculator.computeAvailableContainers(Resources
                  .add(node.getUnallocatedResource(), node.getTotalKillableResources()),
              minimumAllocation) <= 0) {
            LOG.debug("This node " + node.getNodeID() + " doesn't have sufficient "
                + "available or preemptible resource for minimum allocation");
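
To illustrate why this first check does not stop allocation once vcores are exhausted, here is a minimal, self-contained sketch. It assumes DefaultResourceCalculator counts available containers from memory alone, and it uses a hypothetical Res class and hypothetical method names as stand-ins for the real YARN types; it is not the Hadoop implementation.

```java
public class AvailableContainersSketch {
    /** Hypothetical stand-in for a YARN Resource: memory in MB plus vcores. */
    public static final class Res {
        public final long memoryMb;
        public final int vcores;

        public Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /** Memory-only count (assumed behaviour of a memory-only calculator). */
    public static long memoryOnlyContainers(Res available, Res required) {
        return available.memoryMb / required.memoryMb;
    }

    /** Count limited by the scarcest dimension, in the spirit of a dominant-resource check. */
    public static long allDimensionsContainers(Res available, Res required) {
        return Math.min(available.memoryMb / required.memoryMb,
                        (long) (available.vcores / required.vcores));
    }

    public static void main(String[] args) {
        // Node with memory left but no vcores left (illustrative numbers).
        Res unallocated = new Res(8192, 0);
        Res minimumAllocation = new Res(1024, 1);

        // Memory-only: 8192 / 1024 = 8, so the "<= 0" guard does not trigger.
        System.out.println(memoryOnlyContainers(unallocated, minimumAllocation));
        // All dimensions: min(8, 0) = 0, so the guard would trigger.
        System.out.println(allDimensionsContainers(unallocated, minimumAllocation));
    }
}
```

With these illustrative numbers the memory-only count is still positive, so a guard of the form "available containers <= 0" lets scheduling continue on a node that has no vcores left.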
      

      The second, which is more important, is located in RegularContainerAllocator.assignContainer():

          if (!Resources.fitsIn(rc, capability, totalResource)) {
            LOG.warn("Node : " + node.getNodeID()
                + " does not have sufficient resource for ask : " + pendingAsk
                + " node total capability : " + node.getTotalResource());
            // Skip this locality request
            ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
                activitiesManager, node, application, schedulerKey,
                ActivityDiagnosticConstant.
                    NODE_TOTAL_RESOURCE_INSUFFICIENT_FOR_REQUEST
                    + getResourceDiagnostics(capability, totalResource),
                ActivityLevel.NODE);
            return ContainerAllocation.LOCALITY_SKIPPED;
          }
      

      Here, rc is the resource calculator instance; the other two values are:

          Resource capability = pendingAsk.getPerAllocationResource();
          Resource available = node.getUnallocatedResource();
      

      There is a repro unit test attached to this case that demonstrates the problem. The root cause is that we pass the resource calculator to Resources.fitsIn(). With DefaultResourceCalculator, that comparison considers only memory, so vcores are never checked. Instead, we should use the overloaded version that takes no calculator, just like in FSAppAttempt.assignContainer():

          // Can we allocate a container on this node?
          if (Resources.fitsIn(capability, available)) {
            // Inform the application of the new container for this request
            RMContainer allocatedContainer =
                allocate(type, node, schedulerKey, pendingAsk,
                    reservedContainer);
      

      In CS, either switching to DominantResourceCalculator or calling Resources.fitsIn() without the calculator in RegularContainerAllocator.assignContainer() fixes the failing unit test (see testTooManyContainers() in TestTooManyContainers.java).
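
The difference between the two fitsIn() variants can be sketched with a small simulation. This is a simplified model, not the attached unit test: the Res class, the method names, and the node size are hypothetical, and the loop compares each ask against the node's unallocated resources. The memory-only comparison stands in for the calculator-based call under DefaultResourceCalculator; the component-wise comparison stands in for the call without a calculator.

```java
public class FitsInSketch {
    /** Hypothetical stand-in for a YARN Resource: memory in MB plus vcores. */
    public static final class Res {
        public final long memoryMb;
        public final int vcores;

        public Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }

        public Res minus(Res other) {
            return new Res(memoryMb - other.memoryMb, vcores - other.vcores);
        }
    }

    /** Memory-only comparison (assumed behaviour with a memory-only calculator). */
    public static boolean fitsInMemoryOnly(Res smaller, Res bigger) {
        return smaller.memoryMb <= bigger.memoryMb;
    }

    /** Component-wise comparison: every dimension must fit. */
    public static boolean fitsInAllDimensions(Res smaller, Res bigger) {
        return smaller.memoryMb <= bigger.memoryMb && smaller.vcores <= bigger.vcores;
    }

    /** Keeps allocating identical asks on one node until the chosen check fails. */
    public static int allocate(Res nodeTotal, Res ask, boolean checkAllDimensions) {
        Res available = nodeTotal;
        int containers = 0;
        while (checkAllDimensions
                ? fitsInAllDimensions(ask, available)
                : fitsInMemoryOnly(ask, available)) {
            available = available.minus(ask);
            containers++;
        }
        return containers;
    }

    public static void main(String[] args) {
        Res node = new Res(16384, 4); // illustrative node: 16 GB, 4 vcores
        Res ask = new Res(1024, 1);   // illustrative ask: 1 GB, 1 vcore

        // Memory-only check: allocation continues until memory runs out.
        System.out.println(allocate(node, ask, false));
        // Component-wise check: allocation stops when vcores run out.
        System.out.println(allocate(node, ask, true));
    }
}
```

In this model the memory-only check places 16 containers on a node with only 4 vcores, while the component-wise check stops at 4, which matches the behaviour the description attributes to the two fixes.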

      Attachments

        1. TestTooManyContainers.java
          3 kB
          Peter Bacsko

            People

              minni31 Minni Mittal
              pbacsko Peter Bacsko

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m