Spark / SPARK-45869

Revisit and Improve Spark Standalone Cluster

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: Spark Core

    Description

      Spark Standalone Cluster has been supported for a long time as one of the resource managers.

      As part of Apache Spark 4.0.0, we revisit all layers of `Spark Standalone Cluster` as a long-running subsystem inside a K8s environment.

      1. Spark Master and Worker Web UI Layer
      2. Spark Master HA and Recovery Layer
      3. Spark Master REST API Layer (including Cluster Utilization monitoring)
      4. Spark Job Scheduling Layer
      5. Spark Worker Management by exposing Cluster Utilization monitoring for Elastic Cluster Management
      6. Spark Master/Worker dependency and classpath audit
      7. Documentation
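
Items 3 and 5 above both revolve around cluster utilization monitoring. As a rough illustration of what an elastic cluster manager might compute from the Master's state, here is a minimal Python sketch. The field names `cores`, `coresused`, `memory`, and `memoryused` are assumed to mirror the Master's JSON state output, the sample values are made up, and `utilization` is a hypothetical helper, not part of Spark.

```python
def utilization(state):
    """Return (cpu_pct, mem_pct) from a Master-state-like dict.

    Assumed keys (not confirmed by this issue): cores, coresused,
    memory, memoryused.
    """
    def pct(used, total):
        # A cluster with no registered workers reports zero capacity.
        return 0.0 if total == 0 else 100.0 * used / total

    return (pct(state["coresused"], state["cores"]),
            pct(state["memoryused"], state["memory"]))


# Hypothetical sample, not real cluster output.
sample = {"cores": 16, "coresused": 4, "memory": 65536, "memoryused": 16384}
cpu_pct, mem_pct = utilization(sample)
```

A scaling controller could compare these percentages against its own thresholds to decide when to add or remove workers; the thresholds themselves are outside the scope of this issue.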

        Issue Links

          1. Add PersistenceEngineBenchmark (Sub-task, Resolved, Dongjoon Hyun)
          2. Include `Driver/App` data in `PersistenceEngineBenchmark` (Sub-task, Resolved, Dongjoon Hyun)
          3. Add RocksDBPersistenceEngine (Sub-task, Resolved, Dongjoon Hyun)
          4. Improve `PersistenceEngine` performance with `KryoSerializer` (Sub-task, Resolved, Dongjoon Hyun)
          5. Improve `FileSystemPersistenceEngine.persist` error message in case of the existing file (Sub-task, Resolved, Dongjoon Hyun)
          6. Improve `FileSystemPersistenceEngine` to allow non-exist parents (Sub-task, Resolved, Dongjoon Hyun)
          7. Improve `FileSystemPersistenceEngine` to support compressions (Sub-task, Resolved, Dongjoon Hyun)
          8. Improve `Master` to recover quickly in case of zero workers and apps (Sub-task, Resolved, Dongjoon Hyun)
          9. Support `spark.deploy.recoveryTimeout` (Sub-task, Resolved, Dongjoon Hyun)
          10. Make `spark.deploy.recovery*` documentation up-to-date (Sub-task, Resolved, Dongjoon Hyun)
          11. Support `spark.master.rest.host` (Sub-task, Resolved, Dongjoon Hyun)
          12. Support `killall` in REST Submission API (Sub-task, Resolved, Dongjoon Hyun)
          13. Support `clear` in REST Submission API (Sub-task, Resolved, Dongjoon Hyun)
          14. Support `readyz` in REST Submission API (Sub-task, Resolved, Dongjoon Hyun)
          15. Support `spark.deploy.spreadOutDrivers` (Sub-task, Resolved, Dongjoon Hyun)
          16. Support `spark.deploy.workerSelectionPolicy` (Sub-task, Resolved, Dongjoon Hyun)
          17. Support `spark.deploy.maxDrivers` (Sub-task, Resolved, Dongjoon Hyun)
          18. Support `spark.worker.idPattern` (Sub-task, Resolved, Dongjoon Hyun)
          19. Support `spark.deploy.driverIdPattern` (Sub-task, Resolved, Dongjoon Hyun)
          20. Support `spark.deploy.appIdPattern` (Sub-task, Resolved, Dongjoon Hyun)
          21. Support `spark.deploy.appNumberModulo` to rotate app number (Sub-task, Resolved, Dongjoon Hyun)
          22. Support `spark.master.useAppNameAsAppId.enabled` (Sub-task, Resolved, Dongjoon Hyun)
          23. Support `spark.test.appId` in `LocalSchedulerBackend` (Sub-task, Resolved, Dongjoon Hyun)
          24. Support `spark.master.ui.historyServerUrl` in `ApplicationPage` (Sub-task, Resolved, Dongjoon Hyun)
          25. Support `spark.worker.(initial|max)RegistrationRetries` (Sub-task, Resolved, Dongjoon Hyun)
          26. Support `spark.driver.timeout` and `DriverTimeoutPlugin` (Sub-task, Resolved, Dongjoon Hyun)
          27. Support Spark Master Log UI (Sub-task, Resolved, Dongjoon Hyun)
          28. Support Spark Worker Log UI (Sub-task, Resolved, Dongjoon Hyun)
          29. Support Spark History Server Log UI (Sub-task, Resolved, Dongjoon Hyun)
          30. Support Spark Driver Live Log UI (Sub-task, Resolved, Dongjoon Hyun)
          31. Support top-level filtering in MasterPage JSON API (Sub-task, Resolved, Dongjoon Hyun)
          32. Add `Environment` page to Master UI (Sub-task, Resolved, Dongjoon Hyun)
          33. Show a summary of workers in MasterPage (Sub-task, Resolved, Dongjoon Hyun)
          34. Show the number of drivers waiting in SUBMITTED status (Sub-task, Resolved, Dongjoon Hyun)
          35. Show the number of abnormally completed drivers in MasterPage (Sub-task, Resolved, Dongjoon Hyun)
          36. Improve `MasterPage` to show `Resource` column only when it exists (Sub-task, Resolved, Dongjoon Hyun)
          37. Make StandaloneRestServer add JavaModuleOptions to drivers (Sub-task, Resolved, Dongjoon Hyun)
          38. Fix WorkerPage to use the same pattern for `logPage` urls (Sub-task, Resolved, Dongjoon Hyun)
          39. Fix getBaseURI error in Spark Worker LogPage UI buttons (Sub-task, Resolved, Dongjoon Hyun)
          40. Fix `MasterPage` to sort `Running Drivers` table by `Duration` column correctly (Sub-task, Resolved, Dongjoon Hyun)
          41. Fix Spark History Server to sort `Duration` column properly (Sub-task, Resolved, Dongjoon Hyun)
          42. Collect and update `spark-standalone.md` with new confs (Sub-task, Resolved, Dongjoon Hyun)
          43. Fix `Spark Standalone` documentation table layout (Sub-task, Resolved, Dongjoon Hyun)
          44. Make single-pod spark jobs respect spark.app.id (Sub-task, Resolved, Dongjoon Hyun)
          45. Document `spark.master.*` configurations (Sub-task, Resolved, Dongjoon Hyun)
          46. EventLogFileReader should not read rolling logs if appStatus is missing (Sub-task, Resolved, Dongjoon Hyun)
          47. Document REST API for Spark Standalone Cluster (Sub-task, Resolved, Dongjoon Hyun)
          48. Document `SPARK_LOG_*` and `SPARK_PID_DIR` (Sub-task, Resolved, Dongjoon Hyun)
          49. Log the final state of drivers during `Master.removeDriver` (Sub-task, Resolved, Dongjoon Hyun)
          50. Log Spark HA recovery duration (Sub-task, Resolved, Dongjoon Hyun)
          51. Warn properly when a driver exists successfully but master is disconnected (Sub-task, Resolved, Dongjoon Hyun)
          52. Fix Master to update worker from UNKNOWN to ALIVE on RegisterWorker message (Sub-task, Resolved, Dongjoon Hyun)
          53. Remove `kill` link from RELAUNCHING drivers in MasterPage (Sub-task, Resolved, Dongjoon Hyun)
          54. Refactor to improve `RegisterWorker` unit test (Sub-task, Resolved, Dongjoon Hyun)
          55. Remove `*slave*.sh` scripts (Sub-task, Resolved, Dongjoon Hyun)
          56. Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps (Sub-task, Resolved, Dongjoon Hyun)
          57. Improve `MasterSuite` to use nanoTime-based appIDs and workerIDs (Sub-task, Resolved, Dongjoon Hyun)
          58. Fix `spark-daemon.sh` usage by adding `decommission` command (Sub-task, Resolved, Dongjoon Hyun)
          59. Make RocksDBPersistenceEngine to support a symbolic link (Sub-task, Resolved, Dongjoon Hyun)
          60. Make `WorkerResourceInfo` extend `Serializable` explicitly (Sub-task, Resolved, Dongjoon Hyun)
          61. Add `logrotate` to Spark docker files (Sub-task, Resolved, Dongjoon Hyun)
          62. Recover `log-view.js` to be non-module (Sub-task, Resolved, Dongjoon Hyun)
          63. Support `/json/clusterutilization` API (Sub-task, Resolved, Dongjoon Hyun)
          64. Fix `Master` to reject worker kill request if decommission is disabled (Sub-task, Resolved, Dongjoon Hyun)
          65. Validate `spark.master.ui.decommission.allow.mode` setting (Sub-task, Resolved, Dongjoon Hyun)
          66. Remove POST APIs from `MasterWebUI` when spark.ui.killEnabled is false (Sub-task, Resolved, Dongjoon Hyun)
          67. Show driver log location in Spark History Server (Sub-task, Resolved, Dongjoon Hyun)
          68. Fix `Load New` button in `Master/HistoryServer` Log UI (Sub-task, Resolved, Dongjoon Hyun)
          69. Check logType in Utils.getLog (Sub-task, Resolved, Dongjoon Hyun)
          70. Hide `Thread Dump` and `Heap Histogram` of `Dead` executors in `Executors` UI (Sub-task, Resolved, Dongjoon Hyun)
          71. Use `getTotalMemorySize` in `WorkerArguments` (Sub-task, Resolved, Dongjoon Hyun)
          72. Enable `spark.worker.cleanup.enabled` by default (Sub-task, Resolved, Dongjoon Hyun)
          73. Fix `RealBrowserUISeleniumSuite` (Sub-task, Resolved, Dongjoon Hyun)
          74. Add `WebBrowserTest` (Sub-task, Resolved, Dongjoon Hyun)
          75. Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer classpaths (Sub-task, Resolved, Unassigned)
          76. Fix `MasterSuite` to validate the number of registered workers (Sub-task, Resolved, Dongjoon Hyun)
          77. Reduce the number of required threads in MasterSuite (Sub-task, Resolved, Dongjoon Hyun)
          78. Make `PluginEndpoint` warn when plugins reply for one-way message (Sub-task, Resolved, Dongjoon Hyun)
          79. Make `BlockManager` warn before `removeBlockInternal` (Sub-task, Resolved, Dongjoon Hyun)
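
Sub-tasks 12 to 14 add `killall`, `clear`, and `readyz` to the REST Submission API. The following minimal Python sketch shows how a client might build the request URLs for those actions. It assumes the new actions sit under the same `/v1/submissions/` prefix as the existing submission actions and that the REST server listens on port 6066. The host name and the `submission_url` helper are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlunsplit

# Actions named in sub-tasks 12 to 14 of this issue.
ACTIONS = {"killall", "clear", "readyz"}


def submission_url(host, action, port=6066):
    """Build a REST Submission API URL for one of the new actions.

    The /v1/submissions/ prefix and the default port are assumptions.
    """
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return urlunsplit(
        ("http", f"{host}:{port}", f"/v1/submissions/{action}", "", "")
    )


# Hypothetical host name, for illustration only.
url = submission_url("spark-master.example", "readyz")
```

Keeping the allowed actions in one set makes it easy to reject typos before any request reaches the Master.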

            People

              Assignee: Dongjoon Hyun
              Reporter: Dongjoon Hyun
              Votes: 0
              Watchers: 5
