
[SPARK-41449] Stage level scheduling: allow changing the number of executors


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0, 3.3.1
    • Fix Version/s: None
    • Component/s: Scheduler

    Description

      Since the total/maximum number of executors is constant throughout the application, under both dynamic and static allocation, there is little control over how many GPUs will be requested from the resource manager.

      For example, if an application needs 500 executors for the ETL part (with N cores each), but needs, or is allowed, only 50 GPUs for the DL part, in practice it will request at least 500 GPUs from the RM, since `spark.executor.instances` is set to 500. This leads to resource management challenges in multi-tenant environments.
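      As a minimal sketch of the pattern in question, assuming the standard stage level scheduling API (`parse` and `runInference` are placeholder functions, and the input path and discovery script are illustrative):

```scala
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

// ETL part: runs under the default profile, sized by spark.executor.instances=500.
// `sc` is the SparkContext; `parse` is a placeholder for the ETL logic.
val etlOutput = sc.textFile("hdfs:///input/data").map(parse)

// DL part: build a GPU resource profile for the inference stage.
val gpuExecutors = new ExecutorResourceRequests()
  .cores(1)
  .resource("gpu", 1, "/opt/spark/getGpusResources.sh")
val gpuTasks = new TaskResourceRequests().resource("gpu", 1)
val gpuProfile = new ResourceProfileBuilder()
  .require(gpuExecutors)
  .require(gpuTasks)
  .build()

// The profile controls what each executor has, not how many executors exist:
// the application-wide executor count still applies, so the RM may be asked
// for up to 500 GPU executors even though 50 would suffice.
val predictions = etlOutput
  .withResources(gpuProfile)
  .mapPartitions(runInference)   // placeholder DL inference function
  .collect()
```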

      A quick workaround is to repartition the RDD to 50 partitions just before switching resources, so that at most 50 tasks are pending and only that many executors get requested, but it has obvious downsides (an extra shuffle and much larger partitions). See the sketch below.
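      Concretely, the workaround looks like this (a sketch building on the snippet above; with dynamic allocation, executors are requested based on pending tasks, so capping the partition count caps the request):

```scala
// Workaround sketch: cap parallelism before switching profiles so that at
// most 50 tasks are pending, and hence at most ~50 GPU executors are
// requested. The extra shuffle and the oversized partitions are the
// downsides mentioned above.
val predictions = etlOutput
  .repartition(50)
  .withResources(gpuProfile)   // gpuProfile as built in the sketch above
  .mapPartitions(runInference)
  .collect()
```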

      It would be very helpful if the total/max number of executors could also be configured in the Resource Profile.
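      For illustration only, the builder could accept an executor cap next to the existing requests; `maxExecutors` below is a hypothetical method, not an existing API:

```scala
// Hypothetical API sketch: a per-profile executor cap (does not exist today).
val gpuProfile = new ResourceProfileBuilder()
  .require(gpuExecutors)
  .require(gpuTasks)
  .maxExecutors(50)   // proposed: request at most 50 executors for stages using this profile
  .build()
```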



          People

            Unassigned Unassigned
            shay_elbaz Shay Elbaz
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue
