Hadoop YARN / YARN-11641

Can't update a queue hierarchy in absolute mode when the configured capacities are zero

Details

    • Reviewed

    Description

      Error symptoms

      In absolute resource mode, a queue hierarchy cannot be modified when the parent queue, or every child queue of that parent, has a minimum resource of zero configured (e.g. [memory=0, vcores=0]). The update is rejected with the following error:

      2024-01-05 15:38:59,016 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager: Initialized queue: root.a.c
      2024-01-05 15:38:59,016 ERROR org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices: Exception thrown when modifying configuration.
      java.io.IOException: Failed to re-init queues : Parent=root.a: When absolute minResource is used, we must make sure both parent and child all use absolute minResource
      

      Reproduction

      capacity-scheduler.xml

      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>yarn.scheduler.capacity.root.queues</name>
          <value>default,a</value>
        </property>
        <property>
          <name>yarn.scheduler.capacity.root.capacity</name>
          <value>[memory=40960, vcores=16]</value>
        </property>
        <property>
          <name>yarn.scheduler.capacity.root.default.capacity</name>
          <value>[memory=1024, vcores=1]</value>
        </property>
        <property>
          <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
          <value>[memory=1024, vcores=1]</value>
        </property>
        <property>
          <name>yarn.scheduler.capacity.root.a.capacity</name>
          <value>[memory=0, vcores=0]</value>
        </property>
        <property>
          <name>yarn.scheduler.capacity.root.a.maximum-capacity</name>
          <value>[memory=39936, vcores=15]</value>
        </property>
        <property>
          <name>yarn.scheduler.capacity.root.a.queues</name>
          <value>b,c</value>
        </property>
        <property>
          <name>yarn.scheduler.capacity.root.a.b.capacity</name>
          <value>[memory=0, vcores=0]</value>
        </property>
        <property>
          <name>yarn.scheduler.capacity.root.a.b.maximum-capacity</name>
          <value>[memory=39936, vcores=15]</value>
        </property>
        <property>
          <name>yarn.scheduler.capacity.root.a.c.capacity</name>
          <value>[memory=0, vcores=0]</value>
        </property>
        <property>
          <name>yarn.scheduler.capacity.root.a.c.maximum-capacity</name>
          <value>[memory=39936, vcores=15]</value>
        </property>
      </configuration>
      

      updatequeue.xml

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <sched-conf>
      <update-queue>
        <queue-name>root.a</queue-name>
        <params>
          <entry>
            <key>capacity</key>
            <value>[memory=1024,vcores=1]</value>
          </entry>
          <entry>
            <key>maximum-capacity</key>
            <value>[memory=39936,vcores=15]</value>
          </entry>
        </params>
      </update-queue>
      </sched-conf>
      
      $ curl -X PUT -H 'Content-Type: application/xml' -d @updatequeue.xml http://localhost:8088/ws/v1/cluster/scheduler-conf\?user.name\=yarn
      Failed to re-init queues : Parent=root.a: When absolute minResource is used, we must make sure both parent and child all use absolute minResource
      

      Root cause

      During reinitialization, setChildQueues is called, which performs the following check:

        void setChildQueues(Collection<CSQueue> childQueues) throws IOException {
          writeLock.lock();
          try {
            boolean isLegacyQueueMode = queueContext.getConfiguration().isLegacyQueueMode();
            if (isLegacyQueueMode) {
              QueueCapacityType childrenCapacityType =
                  getCapacityConfigurationTypeForQueues(childQueues);
              QueueCapacityType parentCapacityType =
                  getCapacityConfigurationTypeForQueues(ImmutableList.of(this));
      
              if (childrenCapacityType == QueueCapacityType.ABSOLUTE_RESOURCE
                  || parentCapacityType == QueueCapacityType.ABSOLUTE_RESOURCE) {
                // We don't allow any mixed absolute + {weight, percentage} between
                // children and parent
                if (childrenCapacityType != parentCapacityType && !this.getQueuePath()
                    .equals(CapacitySchedulerConfiguration.ROOT)) {
                  throw new IOException("Parent=" + this.getQueuePath()
                      + ": When absolute minResource is used, we must make sure both "
                      + "parent and child all use absolute minResource");
                }
      

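      Either parentCapacityType or childrenCapacityType is evaluated as PERCENTAGE, because getCapacityConfigurationTypeForQueues fails to detect absolute mode when the configured minimum resource is zero. A value of [memory=0, vcores=0] is equal to Resources.none(), so the condition below is never true for such a queue and absoluteMinResSet is never set: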

              if (!queue.getQueueResourceQuotas().getConfiguredMinResource(nodeLabel)
                  .equals(Resources.none())) {
                absoluteMinResSet = true;
      

      (This only happens in legacy queue mode, because the check in setChildQueues is guarded by isLegacyQueueMode.)
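
      To illustrate the misdetection, here is a minimal, self-contained sketch. It does not use the Hadoop classes: Res, NONE and looksAbsolute are hypothetical stand-ins that only mirror the shape of the value-based check quoted above, under the assumption that an explicitly configured zero resource compares equal to the "none" resource.

```java
import java.util.Objects;

public class ZeroAbsoluteDetectionDemo {
  /** Hypothetical stand-in for a YARN resource (memory in MB, vcores). */
  static final class Res {
    final long memory;
    final int vcores;

    Res(long memory, int vcores) {
      this.memory = memory;
      this.vcores = vcores;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Res)) {
        return false;
      }
      Res r = (Res) o;
      return memory == r.memory && vcores == r.vcores;
    }

    @Override
    public int hashCode() {
      return Objects.hash(memory, vcores);
    }
  }

  /** Hypothetical equivalent of an "unset" resource: all zeros. */
  static final Res NONE = new Res(0, 0);

  /** Value-based check: treats any non-zero min resource as absolute mode. */
  static boolean looksAbsolute(Res configuredMin) {
    return !configuredMin.equals(NONE);
  }

  public static void main(String[] args) {
    Res parent = new Res(1024, 1); // root.a after the update request
    Res child = new Res(0, 0);     // root.a.b and root.a.c

    // The parent is detected as absolute, the children are not,
    // which is the mismatch that triggers the IOException.
    System.out.println("parent detected as absolute: " + looksAbsolute(parent));
    System.out.println("child detected as absolute: " + looksAbsolute(child));
  }
}
```

      With the values from the reproduction, the sketch reports the parent as absolute and the child as not absolute, even though both were configured with the bracketed absolute syntax.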

      Possible fixes

      A possible fix in AbstractParentQueue.getCapacityConfigurationTypeForQueues is to derive the mode from the configured capacityVector instead of the min resource value:

          for (CSQueue queue : queues) {
            for (String nodeLabel : queueCapacities.getExistingNodeLabels()) {
              Set<QueueCapacityVector.ResourceUnitCapacityType> definedCapacityTypes =
                  queue.getConfiguredCapacityVector(nodeLabel).getDefinedCapacityTypes();
              if (definedCapacityTypes.size() == 1) {
                QueueCapacityVector.ResourceUnitCapacityType next = definedCapacityTypes.iterator().next();
                if (Objects.requireNonNull(next) == PERCENTAGE) {
                  percentageIsSet = true;
                  diagMsg.append("{Queue=").append(queue.getQueuePath()).append(", label=").append(nodeLabel)
                      .append(" uses percentage mode}. ");
                } else if (next == QueueCapacityVector.ResourceUnitCapacityType.ABSOLUTE) {
                  absoluteMinResSet = true;
                  diagMsg.append("{Queue=").append(queue.getQueuePath()).append(", label=").append(nodeLabel)
                      .append(" uses absolute mode}. ");
                } else if (next == QueueCapacityVector.ResourceUnitCapacityType.WEIGHT) {
                  weightIsSet = true;
                  diagMsg.append("{Queue=").append(queue.getQueuePath()).append(", label=").append(nodeLabel)
                      .append(" uses weight mode}. ");
                }
              } else if (definedCapacityTypes.size() > 1) {
                mixedIsSet = true;
                diagMsg.append("{Queue=").append(queue.getQueuePath()).append(", label=").append(nodeLabel)
                    .append(" uses mixed mode}. ");
              }
            }
          }
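
      The core idea of the snippet above is that the mode is decided by which capacity types are defined, not by the numeric values. A stripped-down sketch of that classification follows. UnitType, Mode and classify are hypothetical names for illustration and are not the Hadoop API; the sketch assumes that a vector such as [memory=0, vcores=0] still reports ABSOLUTE as its only defined type.

```java
import java.util.EnumSet;
import java.util.Set;

public class CapacityTypeClassifier {
  /** Hypothetical per-resource capacity unit types. */
  enum UnitType { PERCENTAGE, ABSOLUTE, WEIGHT }

  /** Hypothetical queue-level result of the classification. */
  enum Mode { PERCENTAGE, ABSOLUTE, WEIGHT, MIXED, UNSET }

  /** Classifies a queue by the set of unit types its vector defines. */
  static Mode classify(Set<UnitType> definedTypes) {
    if (definedTypes.isEmpty()) {
      return Mode.UNSET;
    }
    if (definedTypes.size() > 1) {
      return Mode.MIXED;
    }
    switch (definedTypes.iterator().next()) {
      case PERCENTAGE:
        return Mode.PERCENTAGE;
      case ABSOLUTE:
        return Mode.ABSOLUTE;
      default:
        return Mode.WEIGHT;
    }
  }

  public static void main(String[] args) {
    // [memory=0, vcores=0]: both resources are absolute, values are ignored.
    System.out.println(classify(EnumSet.of(UnitType.ABSOLUTE)));
    // A vector mixing an absolute and a percentage resource.
    System.out.println(classify(EnumSet.of(UnitType.ABSOLUTE, UnitType.PERCENTAGE)));
  }
}
```

      Because the numeric value never enters the decision, a zero absolute capacity and a non-zero one are classified the same way.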
      

      On versions that predate capacityVector, we could use checkConfigTypeIsAbsoluteResource instead, e.g.:

      -        if (!queue.getQueueResourceQuotas().getConfiguredMinResource(nodeLabel)
      -            .equals(Resources.none())) {
      +        if (checkConfigTypeIsAbsoluteResource(queue.getQueuePath(), nodeLabel)) {
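
      The diff above replaces a value-based check with one based on how the capacity was configured. As a rough sketch of what a syntax-based check can look like, the class below inspects the raw configuration string. This is not the implementation of checkConfigTypeIsAbsoluteResource: the class name, the isAbsoluteSyntax helper and the pattern are assumptions for illustration, and the pattern only accepts plain integer values without unit suffixes.

```java
import java.util.regex.Pattern;

public class RawValueModeCheck {
  // Hypothetical pattern: a bracketed, comma-separated list of name=integer
  // pairs, e.g. "[memory=0, vcores=0]". Unit suffixes are not handled.
  private static final Pattern ABSOLUTE_SYNTAX = Pattern.compile(
      "^\\[\\s*\\w+\\s*=\\s*\\d+\\s*(,\\s*\\w+\\s*=\\s*\\d+\\s*)*\\]$");

  /** Returns true if the raw value uses the bracketed absolute syntax. */
  static boolean isAbsoluteSyntax(String rawCapacity) {
    return rawCapacity != null
        && ABSOLUTE_SYNTAX.matcher(rawCapacity.trim()).matches();
  }

  public static void main(String[] args) {
    // Values taken from the reproduction, plus a percentage and a weight.
    for (String raw : new String[] {
        "[memory=0, vcores=0]", "[memory=1024,vcores=1]", "50", "3w"}) {
      System.out.println(raw + " -> " + isAbsoluteSyntax(raw));
    }
  }
}
```

      A check of this kind classifies [memory=0, vcores=0] as absolute regardless of its numeric values, which is the property the value-based comparison against Resources.none() lacks.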
      

      Attachments

        1. hierarchy.png
          25 kB
          Tamas Domok

            People

              tdomok Tamas Domok
              tdomok Tamas Domok