Details
- Type: Improvement
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 3.1.0
- None
- None
Description
This request is not about Spark Streaming. It concerns regular batch jobs submitted with spark-submit, for example a job that reads a large CSV (100+ GB) and writes it out as Parquet. In an autoscaling cluster, it would be useful to scale down (i.e. terminate EC2 instances) without slowing down active Spark applications.
For example:
1. Start a Spark cluster with 8 EC2 instances.
2. Submit 6 Spark apps.
3. 1 Spark app completes, so 5 apps are still running.
4. The cluster can now scale down by 1 EC2 instance (to save $), but the apps with executors on the soon-to-be-terminated instance should not have to restart their CSV read, RDD processing steps, etc. from the beginning on executors on other instances. Instead, I want a 'graceful shutdown' command so that the worker on the 8th EC2 instance stops accepting new spark-submit apps (i.e. no new executors are started on it) but finishes the executors that have already launched on it, then exits the worker PID. The EC2 instance can then be terminated.
I thought stop-slave.sh could do this, but it looks like it just kills the worker PID.
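To make the requested behavior concrete, here is a minimal sketch of the 'graceful shutdown' lifecycle as a small state machine. It is a plain Python simulation, not Spark code: `Worker`, `WorkerState`, `begin_graceful_shutdown`, and the other names are hypothetical and chosen only for illustration. The assumption is that a worker tracks which apps have a live executor on it.

```python
from dataclasses import dataclass, field
from enum import Enum


class WorkerState(Enum):
    ACTIVE = "active"      # accepts new executors
    DRAINING = "draining"  # rejects new executors, lets running ones finish
    STOPPED = "stopped"    # no executors left; the host can be terminated


@dataclass
class Worker:
    """Hypothetical model of one worker process on one EC2 instance."""
    host: str
    state: WorkerState = WorkerState.ACTIVE
    executors: set = field(default_factory=set)  # app ids with a live executor

    def launch_executor(self, app_id: str) -> bool:
        """Start an executor for app_id; refuse unless the worker is ACTIVE."""
        if self.state is not WorkerState.ACTIVE:
            return False
        self.executors.add(app_id)
        return True

    def begin_graceful_shutdown(self) -> None:
        """Stop accepting new executors; existing executors keep running."""
        if self.state is WorkerState.ACTIVE:
            self.state = WorkerState.DRAINING
        self._stop_if_idle()

    def executor_finished(self, app_id: str) -> None:
        """Record that the executor for app_id has exited."""
        self.executors.discard(app_id)
        self._stop_if_idle()

    def _stop_if_idle(self) -> None:
        # A draining worker stops as soon as its last executor is gone.
        if self.state is WorkerState.DRAINING and not self.executors:
            self.state = WorkerState.STOPPED

    def safe_to_terminate(self) -> bool:
        """True once the worker has exited and the instance can be removed."""
        return self.state is WorkerState.STOPPED


if __name__ == "__main__":
    worker = Worker(host="ec2-8")
    worker.launch_executor("app-1")
    worker.begin_graceful_shutdown()
    print(worker.launch_executor("app-2"))  # new executors are refused
    worker.executor_finished("app-1")
    print(worker.safe_to_terminate())
```

In this sketch, the autoscaler would call `begin_graceful_shutdown()` on the worker it wants to remove and terminate the instance only after `safe_to_terminate()` returns true, so running apps never lose in-progress work.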
Issue Links
- relates to SPARK-20628: Keep track of nodes which are going to be shut down & avoid scheduling new tasks (Resolved)