Spark / SPARK-31726

Make spark.files available in driver with cluster deploy mode on kubernetes

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: Kubernetes, Spark Core
    • Labels: None

    Description

      Currently, on YARN with cluster deploy mode, --files makes the files available to the driver and executors and also puts them on the classpath of both.

      On k8s with cluster deploy mode, --files makes the files available on the executors, but they are not on the executor classpath. It does not make the files available on the driver, and they are not on the driver classpath.

      It would be nice if the k8s behavior were consistent with YARN, or at least made the files available on the driver. Once the files are available there, a simple workaround gets them on the classpath: spark.driver.extraClassPath="./"
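To make the requested invocation concrete, here is a minimal sketch that assembles the spark-submit arguments combining --files with the spark.driver.extraClassPath workaround. The helper name build_submit_command, the master URL, the jar path, and the main class are hypothetical placeholders, not taken from this issue. Note that, as reported here, on k8s in 3.0.0 the files do not reach the driver, so the classpath setting only helps once that is addressed.

```python
import shlex


def build_submit_command(master_url, files, app_resource, main_class=None):
    """Assemble a spark-submit argument list for cluster deploy mode.

    Hypothetical helper for illustration; it only builds the command and
    does not run it.
    """
    cmd = [
        "spark-submit",
        "--master", master_url,
        "--deploy-mode", "cluster",
        # Ship local files (e.g. application.conf) with the application.
        "--files", ",".join(files),
        # Workaround from the description: add the working directory
        # to the driver classpath.
        "--conf", "spark.driver.extraClassPath=./",
    ]
    if main_class:
        cmd += ["--class", main_class]
    cmd.append(app_resource)
    return cmd


if __name__ == "__main__":
    # All values below are placeholders (assumptions), not from the issue.
    command = build_submit_command(
        "k8s://https://example.invalid:6443",
        ["application.conf", "log4j.properties"],
        "local:///opt/app/app.jar",
        main_class="com.example.Main",
    )
    # Print a shell-quoted command line for inspection.
    print(shlex.join(command))
```

Building the command as a list rather than a single string keeps the comma-joined --files value and the ./ classpath entry as separate, unambiguous arguments.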

      Background:

      We recently started testing Kubernetes for Spark. Our main platform is YARN, on which we use client deploy mode. Our first experience was that client deploy mode was difficult to use on k8s (we don't launch from inside a pod), so we switched to cluster deploy mode, which seems to behave well on k8s. But then we realized that our programs rely on reading files from the classpath (application.conf, log4j.properties, etc.) that are on the client but are no longer on the driver (since the driver no longer runs on the client). An easy fix seems to be to ship the files using --files to make them available on the driver, but we could not get this to work.

            People

              Assignee: Unassigned
              Reporter: koert kuipers
              Votes: 3
              Watchers: 5
