SPARK-47495

Primary resource jar added to spark.jars twice under k8s cluster mode

Details

    Description

      Context:
      Submitting a Spark job to Kubernetes in cluster mode triggers spark-submit twice.
      The first time, SparkSubmit runs in k8s cluster mode: it appends the primary resource to spark.jars and calls KubernetesClientApplication::start to create the driver pod.
      The driver pod then runs spark-submit again with the same primary resource jar. This time SparkSubmit runs in client mode with spark.kubernetes.submitInDriver set to true and with the updated spark.jars. In this mode, SparkSubmit downloads every jar in spark.jars to the driver and replaces those spark.jars URLs with driver-local paths.
      SparkSubmit then appends the same primary resource to spark.jars again. As a result, spark.jars holds two entries for the primary resource: one with the original URL from the user's submission, the other with the driver-local file path.
      Later, when the driver starts the SparkContext, it copies all of spark.jars into spark.app.initial.jar.urls and replaces the driver-local jar paths in spark.app.initial.jar.urls with driver file service URLs.
      At this point, every jar passed via --jars or `spark.jars` in the original submission has been replaced with a driver file service URL in spark.app.initial.jar.urls, while the primary resource jar appears there twice: once with the original path from the user's submission and once with a driver file service URL.
      When executors start, they download every jar listed in spark.app.initial.jar.urls.
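
      The sequence above can be traced with a small toy model. This is only a sketch in plain Python, not Spark code: the helper names (append_primary, localize, to_file_service), the example URLs, the working directory, and the file service address are all hypothetical and chosen for illustration.

```python
import posixpath

def append_primary(jars, primary):
    """Model SparkSubmit appending the primary resource to spark.jars."""
    return jars + [primary]

def localize(jars, work_dir):
    """Model the in-driver download: each URL becomes a driver-local path."""
    return [posixpath.join(work_dir, posixpath.basename(j)) for j in jars]

def to_file_service(jars, work_dir, base_url):
    """Model SparkContext: driver-local paths become file service URLs."""
    return [
        base_url + "/" + posixpath.basename(j) if j.startswith(work_dir) else j
        for j in jars
    ]

# Hypothetical submission values, for illustration only.
primary = "s3a://bucket/app.jar"
user_jars = ["s3a://bucket/dep.jar"]
work_dir = "/opt/spark/work-dir"
file_service = "spark://driver:7078/jars"

# First pass (cluster mode): the primary resource is appended to spark.jars.
pass1 = append_primary(user_jars, primary)
# Second pass (client mode in the driver): jars are localized,
# then the primary resource is appended again with its original URL.
pass2 = append_primary(localize(pass1, work_dir), primary)
# SparkContext start: local paths are rewritten, the original URL is kept.
initial_jar_urls = to_file_service(pass2, work_dir, file_service)

for url in initial_jar_urls:
    print(url)
```

      In this model the final list contains app.jar twice, once as a file service URL and once as the original s3a URL, while dep.jar appears only once.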

      Issue:
      Each executor downloads two duplicate copies of the primary resource, one from the original URL in the user's submission and the other from the driver file service URL, which wastes resources. This was also reported previously here.
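
      One way to spot the symptom is to group the entries of spark.app.initial.jar.urls by file name. The sketch below is an illustration only, not the fix applied in Spark: duplicate_jar_names is a hypothetical helper, the sample URLs are made up, and a shared file name is only a hint that two entries may point to the same jar.

```python
from collections import defaultdict
import posixpath
from urllib.parse import urlparse

def duplicate_jar_names(jar_urls):
    """Group jar URLs by file name and keep names that occur more than once."""
    by_name = defaultdict(list)
    for url in jar_urls:
        # Use only the last path segment, ignoring scheme and host.
        name = posixpath.basename(urlparse(url).path)
        by_name[name].append(url)
    return {name: urls for name, urls in by_name.items() if len(urls) > 1}

# Hypothetical contents of spark.app.initial.jar.urls, for illustration only.
sample_urls = [
    "spark://driver:7078/jars/dep.jar",
    "spark://driver:7078/jars/app.jar",
    "s3a://bucket/app.jar",
]

duplicates = duplicate_jar_names(sample_urls)
for name, urls in duplicates.items():
    print(name, urls)
```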

            People

              Jiale Tan
              Jiale Tan
              Dongjoon Hyun