Spark / SPARK-636

Add mechanism to run system management/configuration tasks on all workers

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark Core

    Description

      It would be useful to have a mechanism to run a task on all workers in order to perform system management tasks, such as purging caches or changing system properties. This is useful for automated experiments and benchmarking; I don't envision this being used for heavy computation.

      Right now, I can mimic this with something like

      sc.parallelize(0 until numMachines, numMachines).foreach { _ => () }  // replace () with the management task

      but this does not guarantee that every worker runs a task and requires my user code to know the number of workers.
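
      A minimal sketch of a best-effort variant of the workaround above, which reports which executors actually ran the task so the caller can at least detect gaps. The names `RunOnWorkers`, `runEverywhere`, and `numTasks` are hypothetical and not part of Spark; `SparkEnv.get.executorId` is a developer API and is assumed to be available in the Spark version in use. Oversubscribing tasks raises the odds of reaching every executor but still guarantees nothing, and the task may run several times in the same JVM, so it should be idempotent.

```scala
import org.apache.spark.{SparkContext, SparkEnv}

object RunOnWorkers {
  /** Best-effort: launch `numTasks` single-element partitions and run `task`
    * once per partition. Returns the IDs of the executors that ran at least
    * one of those tasks. Hypothetical helper, not a Spark API. */
  def runEverywhere(sc: SparkContext, numTasks: Int)(task: () => Unit): Set[String] =
    sc.parallelize(0 until numTasks, numTasks)
      .mapPartitions { _ =>
        task()                                // the management action itself
        Iterator(SparkEnv.get.executorId)     // record where this task ran
      }
      .collect()
      .toSet
}
```

      For example, `RunOnWorkers.runEverywhere(sc, 4 * sc.defaultParallelism)(() => System.gc())` returns the set of executor IDs that were reached, which user code can compare against the executors it expects.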

      One sample use case is setup and teardown for benchmark tests. For example, I might want to drop cached RDDs, purge shuffle data, and call System.gc() between test runs. It makes sense to incorporate some of this functionality, such as dropping cached RDDs, into Spark itself, but it might be helpful to have a general mechanism for running ad-hoc tasks like System.gc().
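
      A rough sketch of what such a between-runs reset could look like with APIs that already exist, assuming a live `SparkContext`. `BenchmarkReset`, `resetBetweenRuns`, and `numTasks` are hypothetical names. Dropping cached RDDs is done from the driver; the executor-side `System.gc()` is only a request to each JVM and is only attempted on executors that happen to receive a task. Purging shuffle data is not covered here.

```scala
import org.apache.spark.SparkContext

object BenchmarkReset {
  /** Drop all cached RDDs, then request a GC on the driver and, best-effort,
    * on the executors. Hypothetical helper, not a Spark API. */
  def resetBetweenRuns(sc: SparkContext, numTasks: Int): Unit = {
    // Driver side: unpersist every RDD currently registered as persistent.
    sc.getPersistentRDDs.values.foreach(_.unpersist(blocking = true))
    System.gc()

    // Executor side: one small task per partition, each requesting a GC.
    // Nothing guarantees that every executor receives one of these tasks.
    sc.parallelize(0 until numTasks, numTasks).foreachPartition(_ => System.gc())
  }
}
```

      Calling `BenchmarkReset.resetBetweenRuns(sc, 4 * sc.defaultParallelism)` between test runs approximates the teardown described above without any change to Spark itself.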

            People

              Assignee: Unassigned
              Reporter: Josh Rosen (joshrosen)
              Votes: 3
              Watchers: 12