SPARK-37572: Flexible ways of launching executors


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: Deploy
    • Labels: None

    Description

      Currently Spark launches executor processes by constructing and running commands [1], for example:

      /usr/lib/jvm/adoptopenjdk-8-hotspot-amd64/jre/bin/java -cp /opt/spark-3.2.0-bin-hadoop3.2/conf/:/opt/spark-3.2.0-bin-hadoop3.2/jars/* -Xmx1024M -Dspark.driver.port=35729 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@example.host:35729 --executor-id 0 --hostname 100.116.124.193 --cores 6 --app-id app-20211207131146-0002 --worker-url spark://Worker@100.116.124.193:45287 

      But there are use cases that require more flexible ways of launching executors. In particular, our use case is that we run Spark in standalone mode, with the master and workers running in VMs. We want to allow Spark app developers to provide custom container images to customize the job runtime environment (typically Java and Python dependencies), so executors (which run the job code) need to run in Docker containers.

      After reading the source code, we found that the concept of a Spark command runner might be a good solution. Basically, we want to introduce an optional command runner in Spark: instead of running the executor launch command directly, the worker passes the command to the runner, and the runner then executes it with its own strategy, which could be running it in a Docker container or, by default, running it directly.
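
      To illustrate the dispatch (a minimal sketch, not the actual patch; the launch command is abbreviated), the worker-side logic would behave like:

      #!/bin/bash
      # Sketch: hand the executor launch command to the runner if one is
      # configured, otherwise run it directly (the current behavior).
      # "$@" stands for the full launch command built by the worker, e.g.
      # java -cp ... org.apache.spark.executor.CoarseGrainedExecutorBackend ...
      if [ -n "${SPARK_COMMAND_RUNNER:-}" ]; then
        exec "$SPARK_COMMAND_RUNNER" "$@"
      else
        exec "$@"
      fi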

      The runner is specified through the environment variable `SPARK_COMMAND_RUNNER`. By default it could be a simple pass-through script like:

      #!/bin/bash
      exec "$@" 

      or, in the case of a Docker container:

      #!/bin/bash
      docker run ... -- "$@"
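
      A more complete Docker-based runner might look like the sketch below. The image name, volume mount, and network settings are illustrative assumptions, not part of the proposal itself; the only contract is that the runner ends up executing "$@", the launch command it receives from the worker.

      #!/bin/bash
      # Hypothetical Docker runner. SPARK_EXECUTOR_IMAGE is an assumed,
      # user-provided variable naming the custom runtime image.
      IMAGE="${SPARK_EXECUTOR_IMAGE:-spark-runtime:latest}"

      # --network host keeps the executor reachable at the host's address so
      # the driver and worker can connect to it; the Spark installation is
      # mounted read-only so the container sees the same jars and conf.
      exec docker run --rm \
        --network host \
        -v "$SPARK_HOME:$SPARK_HOME:ro" \
        -e SPARK_HOME \
        "$IMAGE" \
        "$@"

      The runner would then be enabled on each worker, for example via conf/spark-env.sh: export SPARK_COMMAND_RUNNER=/opt/spark/sbin/docker-runner.sh (path hypothetical).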

       

      I already have a patch for this feature and have tested it in our environment.

       

      [1]: https://github.com/apache/spark/blob/v3.2.0/core/src/main/scala/org/apache/spark/deploy/worker/CommandUtils.scala#L52

People

    Assignee: Unassigned
    Reporter: Dagang Wei (functicons)
