SPARK-47475

Support `spark.kubernetes.jars.avoidDownloadSchemes` for K8s Cluster Mode

Details

    Description

      In K8s cluster deploy mode, all jars, including the primary resource jar and any jars passed via --jars or spark.jars, are downloaded to the driver's local disk and then served to executors by a file server running on the driver.

      When the jars are large and the application requests many executors, the resulting burst of concurrent jar downloads from the driver saturates the network. The executors' jar downloads then time out, and the executors are terminated. From the user's point of view, the application is trapped in a loop of mass executor loss and re-provisioning and never reaches the requested number of live executors, which leads to job SLA breaches and sometimes job failure.
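The saturation argument above can be checked with a rough back-of-envelope estimate. The sketch below is illustrative only: the jar size, executor count, driver egress bandwidth, and fetch timeout are all assumed numbers, not values from this issue, and the function names are hypothetical. It assumes every executor pulls the full set of jars from the driver and that the driver's egress bandwidth is the only bottleneck.

```python
def estimated_download_seconds(jar_bytes: int, executors: int,
                               driver_egress_bytes_per_s: float) -> float:
    """Lower bound on the time for every executor to fetch all jars from the driver.

    Assumes the driver's egress link is the only bottleneck and is fully used.
    """
    total_bytes = jar_bytes * executors
    return total_bytes / driver_egress_bytes_per_s


def downloads_would_time_out(jar_bytes: int, executors: int,
                             driver_egress_bytes_per_s: float,
                             fetch_timeout_s: float) -> bool:
    """True if the estimated download time exceeds the (assumed) fetch timeout."""
    estimate = estimated_download_seconds(
        jar_bytes, executors, driver_egress_bytes_per_s)
    return estimate > fetch_timeout_s


# Assumed scenario: 500 MB of jars, 1000 executors,
# a 10 Gbit/s driver link (1.25e9 bytes/s), and a 120 s timeout.
seconds = estimated_download_seconds(500 * 10**6, 1000, 1.25e9)
timed_out = downloads_would_time_out(500 * 10**6, 1000, 1.25e9, 120.0)
```

Under these assumed numbers the driver would need several times the timeout just to push the bytes out, so most executors would fail their fetch and be replaced, matching the loss and re-provision loop described above. With a much smaller executor count the same jars fit comfortably within the timeout.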

            People

              Jiale Jiale Tan
              Jiale Jiale Tan
              Dongjoon Hyun
              Votes: 0
              Watchers: 2
