  1. Hadoop YARN
  2. YARN-11507

CSQueue properties are affected by DominantResourceCalculator in a non-intuitive way

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.4.0
    • Fix Version/s: None
    • Component/s: capacityscheduler
    • Labels: None

    Description

      The following queue hierarchy yields different capacity/absoluteCapacity values for its queues, depending on which resource calculator is used (Default or Dominant).

          conf.put("yarn.scheduler.capacity.resource-calculator", "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator");
          conf.put("yarn.scheduler.capacity.legacy-queue-mode.enabled", "true");
          conf.put("yarn.scheduler.capacity.root.queues", "a, b");
          conf.put("yarn.scheduler.capacity.root.a.capacity", "[memory=4096,vcores=8]");
          conf.put("yarn.scheduler.capacity.root.b.capacity", "[memory=12288,vcores=8]");
          conf.put("yarn.scheduler.capacity.root.b.queues", "b1, b2");
          conf.put("yarn.scheduler.capacity.root.b.b1.capacity", "[memory=3072,vcores=6]");
          conf.put("yarn.scheduler.capacity.root.b.b2.capacity", "[memory=9216,vcores=2]");
      
                          DefaultResourceCalculator               DominantResourceCalculator
                          capacity absoluteCapacity maxApps       capacity absoluteCapacity maxApps
         root.a           0.25     0.25             2500          0.5      0.5              5000
         root.b           0.75     0.75                           0.75     0.75
         root.b.b1        0.25     0.1875           1875          0.5      0.375            3750
         root.b.b2        0.75     0.5625           5625          0.75     0.5625           5625
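
      The numbers in the table can be reproduced with a small model. This is only a sketch, not the actual CapacityScheduler code: it assumes that the cluster resource is exactly the sum of root.a and root.b ([memory=16384,vcores=16]), that the Default calculator looks at memory only, and that the Dominant calculator uses the larger of the per-resource shares of the cluster. The class and method names are hypothetical.

```java
// Simplified model of how the table's numbers can be reproduced.
// This is NOT the actual Hadoop implementation.
class CapacitySketch {
    // ASSUMPTION: the cluster is exactly root.a + root.b.
    static final double CLUSTER_MEMORY = 16384;
    static final double CLUSTER_VCORES = 16;

    // Default-style share of the cluster: only memory is considered.
    static double memoryShare(double memory, double vcores) {
        return memory / CLUSTER_MEMORY;
    }

    // Dominant-style share of the cluster: the larger per-resource share.
    static double dominantShare(double memory, double vcores) {
        return Math.max(memory / CLUSTER_MEMORY, vcores / CLUSTER_VCORES);
    }

    public static void main(String[] args) {
        String[] names = {"root.a", "root.b", "root.b.b1", "root.b.b2"};
        double[][] res = {{4096, 8}, {12288, 8}, {3072, 6}, {9216, 2}};
        int[] parent = {-1, -1, 1, 1}; // index into res, -1 means root

        for (int i = 0; i < names.length; i++) {
            // absoluteCapacity is modelled as the queue's share of the cluster.
            double defAbs = memoryShare(res[i][0], res[i][1]);
            double domAbs = dominantShare(res[i][0], res[i][1]);
            // capacity is modelled as the queue's share divided by its parent's share.
            double defParent = parent[i] < 0 ? 1.0
                : memoryShare(res[parent[i]][0], res[parent[i]][1]);
            double domParent = parent[i] < 0 ? 1.0
                : dominantShare(res[parent[i]][0], res[parent[i]][1]);
            System.out.printf("%-10s default: %.4f / %.4f   dominant: %.4f / %.4f%n",
                names[i], defAbs / defParent, defAbs, domAbs / domParent, domAbs);
        }
    }
}
```

      Under these assumptions root.a gets a dominant share of 0.5 because its vcores share (8/16) is larger than its memory share (4096/16384), while root.b keeps 0.75 because memory is its dominant resource.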
      

      Issues: with the DominantResourceCalculator, the capacity/absoluteCapacity values of sibling queues add up to more than 100% on the first level (0.5 + 0.75 = 1.25), and on the second level too (0.5 + 0.75 = 1.25 under root.b). Some properties, such as maxApplications, are calculated from the absoluteCapacity, so they are inflated as well: the max apps of the leaf queues sum to 10000 with the DefaultRC but to 14375 with the DominantRC.
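
      The max apps totals follow directly from the absoluteCapacity column. The sketch below assumes the system-wide limit is the default yarn.scheduler.capacity.maximum-applications value of 10000 and that each leaf queue gets that limit multiplied by its absoluteCapacity; the class and method names are hypothetical.

```java
// Sketch: maxApps per leaf queue derived from absoluteCapacity.
// ASSUMPTION: maxApps = system-wide limit * absoluteCapacity, limit = 10000.
class MaxAppsSketch {
    static final int SYSTEM_MAX_APPS = 10000;

    static int maxApps(double absoluteCapacity) {
        return (int) (SYSTEM_MAX_APPS * absoluteCapacity);
    }

    // Sum of maxApps over a set of leaf queues.
    static int total(double... absoluteCapacities) {
        int sum = 0;
        for (double c : absoluteCapacities) {
            sum += maxApps(c);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Leaf queues in table order: root.a, root.b.b1, root.b.b2
        System.out.println("DefaultRC total:  " + total(0.25, 0.1875, 0.5625));
        System.out.println("DominantRC total: " + total(0.5, 0.375, 0.5625));
    }
}
```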

      I don't see any reason why the ResourceCalculator abstraction should affect the capacity/absoluteCapacity or any other property of the queues in the hierarchy. The cluster resource should be shared among the queues based on their configuration for the individual resource types. The effectiveMin/Max resource should be calculated for each queue and for each resource type; that should be the source of truth for the resources available to the queues, and later calculations should be based on it. The absoluteCapacity should either be calculated from a single resource type (e.g. memory) or be normalised in some way.
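
      To illustrate the per-resource view, the sketch below computes each queue's share of its parent separately for memory and vcores, using the configuration from this issue. It is only an illustration of the idea, not a proposed patch; the class and method names are hypothetical, and it assumes the parent's resource equals the sum of its children.

```java
// Sketch: per-resource-type shares of a queue relative to its parent.
// ASSUMPTION: a parent's resource is the sum of its children's resources.
class PerResourceShareSketch {
    // Returns {memoryShare, vcoresShare} of the queue within its parent.
    static double[] shares(double[] queue, double[] parent) {
        return new double[] {queue[0] / parent[0], queue[1] / parent[1]};
    }

    public static void main(String[] args) {
        double[] root = {16384, 16};
        double[] a = {4096, 8};
        double[] b = {12288, 8};
        double[] b1 = {3072, 6};
        double[] b2 = {9216, 2};

        double[][] firstLevel = {shares(a, root), shares(b, root)};
        double[][] secondLevel = {shares(b1, b), shares(b2, b)};

        // Within each level, the shares of every resource type add up to 1.0.
        System.out.printf("root children: memory=%.2f vcores=%.2f%n",
            firstLevel[0][0] + firstLevel[1][0], firstLevel[0][1] + firstLevel[1][1]);
        System.out.printf("root.b children: memory=%.2f vcores=%.2f%n",
            secondLevel[0][0] + secondLevel[1][0], secondLevel[0][1] + secondLevel[1][1]);
    }
}
```

      Kept per resource type, the sibling shares never exceed 100%, regardless of which resource is dominant for a given queue.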

      The DominantResourceCalculator is useful when the whole cluster is shared by applications from multiple users (see this research: https://cs.stanford.edu/~matei/papers/2011/nsdi_drf.pdf), but the queues themselves are not competing with each other on different dominant resources. The cluster resource should simply be shared according to the queue configurations.

      Added a test case for reproduction to my fork.

          People

            tdomok Tamas Domok
            tdomok Tamas Domok