Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- None
- None
Description
It would be useful to have a mechanism to run a task on all workers in order to perform system management tasks, such as purging caches or changing system properties. This is useful for automated experiments and benchmarking; I don't envision this being used for heavy computation.
Right now, I can approximate this with something like
sc.parallelize(0 until numMachines, numMachines).foreach { _ => /* management task */ }
but this does not guarantee that every worker runs a task, and it requires my user code to know the number of workers.
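To make the "no guarantee" point concrete, here is a plain-Scala model (no Spark involved) of availability-driven task assignment: each task goes to whichever worker frees up first, so a worker that is busy when the job starts can be skipped entirely while an idle worker runs two tasks. The greedy scheduler and the `busyUntil` parameter are invented for this sketch; this is a crude stand-in for Spark's scheduling, not its actual algorithm.

```scala
object TaskSpread {
  // Greedy assignment: each task runs on the earliest-available worker.
  // busyUntil(i) = time at which worker i first becomes free
  // (assumption: a rough proxy for real cluster availability skew).
  def assign(numTasks: Int, busyUntil: Array[Double], taskDuration: Double): Vector[Int] = {
    val freeAt = busyUntil.clone()
    val counts = Array.fill(busyUntil.length)(0)
    for (_ <- 0 until numTasks) {
      val w = freeAt.indices.minBy(i => freeAt(i)) // earliest-free worker; ties go to lowest index
      counts(w) += 1
      freeAt(w) += taskDuration
    }
    counts.toVector
  }

  def main(args: Array[String]): Unit = {
    // 4 tasks, 4 workers, but worker 3 is busy until t=5:
    // the three idle workers absorb all four tasks.
    println(assign(4, Array(0.0, 0.0, 0.0, 5.0), 1.0)) // Vector(2, 1, 1, 0)
  }
}
```

Worker 0 finishes its first task and grabs the fourth before worker 3 ever becomes available, which is exactly why the `parallelize` workaround cannot promise one task per worker.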
One sample use case is setup and teardown for benchmark tests. For example, I might want to drop cached RDDs, purge shuffle data, and call System.gc() between test runs. It makes sense to incorporate some of this functionality, such as dropping cached RDDs, into Spark itself, but it might be helpful to have a general mechanism for running ad-hoc tasks like System.gc().
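A sketch of what the requested mechanism might look like as an API. All names here (`runOnAllWorkers`, the worker-id parameter) are hypothetical, and the "cluster" is mocked with local threads standing in for workers; a real implementation would need scheduler support inside Spark.

```scala
import java.util.concurrent.atomic.AtomicInteger

object RunOnAllWorkers {
  // Hypothetical hook: run `task` exactly once on every worker,
  // mocked locally with one thread per pretend worker.
  def runOnAllWorkers(numWorkers: Int)(task: Int => Unit): Unit = {
    val threads = (0 until numWorkers).map(id => new Thread(() => task(id)))
    threads.foreach(_.start())
    threads.foreach(_.join()) // block until every "worker" has run the task
  }

  def main(args: Array[String]): Unit = {
    val ran = new AtomicInteger(0)
    // Benchmark teardown from above would go in the task body,
    // e.g. purging local caches and calling System.gc().
    runOnAllWorkers(4) { _ => ran.incrementAndGet() }
    println(ran.get()) // 4: each worker ran the task exactly once
  }
}
```

Unlike the `parallelize` workaround, the contract here is per-worker, not per-task, so the caller neither counts machines nor hopes the scheduler spreads the tasks evenly.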
Issue Links
- is duplicated by:
  - SPARK-650: Add a "setup hook" API for running initialization code on each executor (Resolved)
  - SPARK-3513: Provide a utility for running a function once on each executor (Resolved)