  1. Spark
  2. SPARK-29153

ResourceProfile conflict resolution stage level scheduling

Details

    • Type: Story
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.1.0
    • Component/s: Scheduler, Spark Core
    • Labels: None

    Description

      For stage level scheduling, if a stage contains RDDs whose ResourceProfiles conflict, we have to resolve that conflict.

      We can handle this in two ways:

      1. By default, error out when the profiles conflict so that the user realizes what is going on, and provide a config to turn this behavior on and off.
      2. If the config to error out is off, resolve the conflict by merging the profiles. See below, from the design doc on the SPIP.
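
      The two behaviors above can be sketched in plain Python. This is only an illustration, not Spark code: a profile is modeled as a dict of resource name to amount, and `resolve_stage_profile`, `ProfileConflictError`, and the `merge_conflicts` flag are hypothetical names standing in for the scheduler logic and the proposed config.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a ResourceProfile: resource name -> requested amount.
Profile = Dict[str, float]


class ProfileConflictError(Exception):
    """Raised when a stage sees differing profiles and merging is disabled."""


def resolve_stage_profile(
    profiles: List[Profile],
    merge_conflicts: bool,
    merge: Callable[[List[Profile]], Profile],
) -> Profile:
    # Collapse identical profiles, keeping first-seen order.
    distinct: List[Profile] = []
    for profile in profiles:
        if profile not in distinct:
            distinct.append(profile)

    if not distinct:
        raise ValueError("stage has no resource profiles")
    if len(distinct) == 1:
        return distinct[0]  # all RDDs agree, nothing to resolve

    if not merge_conflicts:
        # Approach 1: fail loudly so the user notices the conflict.
        raise ProfileConflictError(
            f"{len(distinct)} conflicting resource profiles in one stage"
        )
    # Approach 2: hand the conflicting profiles to a merge strategy.
    return merge(distinct)
```

      The merge strategy is passed in as a function so that the error-or-merge decision stays separate from the question of how to merge.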

      For the merge strategy we can take the max of each resource across the ResourceProfiles, which yields the largest container any of them requires. This will work in general, but there are a few cases where people may have intended a sum. For instance, say one RDD needs X memory and another RDD needs Y memory. When those get combined into a stage you might really need X+Y memory rather than max(X, Y). Another example is union, where you would want to sum the resources of each RDD. I think we can document what we choose for now and later add the ability to use alternatives other than max. Or perhaps we do need to change what we do either per operation or per resource type.
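
      To make the max versus sum distinction concrete, here is a small Python sketch. It assumes a profile can be modeled as a dict of resource name to amount; `merge_max`, `merge_sum`, and the resource names are hypothetical and are not part of the Spark API.

```python
from typing import Dict, List

# Hypothetical stand-in for a ResourceProfile: resource name -> requested amount.
Profile = Dict[str, float]


def merge_max(profiles: List[Profile]) -> Profile:
    """Per resource, keep the largest amount requested by any profile."""
    merged: Profile = {}
    for profile in profiles:
        for name, amount in profile.items():
            merged[name] = max(merged.get(name, amount), amount)
    return merged


def merge_sum(profiles: List[Profile]) -> Profile:
    """Per resource, add up the amounts requested by all profiles."""
    merged: Profile = {}
    for profile in profiles:
        for name, amount in profile.items():
            merged[name] = merged.get(name, 0) + amount
    return merged


x = {"cores": 2, "memory_mb": 4096}
y = {"cores": 4, "memory_mb": 2048, "gpu": 1}

print(merge_max([x, y]))  # {'cores': 4, 'memory_mb': 4096, 'gpu': 1}
print(merge_sum([x, y]))  # {'cores': 6, 'memory_mb': 6144, 'gpu': 1}
```

      With max, the merged container is no larger than the biggest single request per resource. With sum, it grows with every RDD in the stage, which matches the X+Y and union cases but can over-allocate when the RDDs do not actually need their resources at the same time.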

            People

              Assignee: Thomas Graves (tgraves)
              Reporter: Thomas Graves (tgraves)