[SPARK-24618] Allow ability to consume driver memory on worker hosts, not master (option for cluster mode to wait for return code?) - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.3.1
Fix Version/s: None
Component/s: Scheduler, Spark Core
Labels:
- bulk-closed

Description

My scenario is this:
EC2 master (488 GB of RAM and 64 cores)
Autoscaling group of up to 8 EC2 workers that register with the master

I send hundreds of parallel spark-submits to the EC2 master, but I seem to be artificially limited to approximately 240 in parallel (if the driver of each spark-submit takes 2 GB of memory). I need to know the return code of each spark-submit, so the deploy mode is client. As I understand it, cluster deploy mode would not wait for the return code.
The spark-submits are not submitted directly to worker nodes because EC2 workers are ephemeral beasts that pop up and down regularly, while the master can simply redirect tasks to another worker whenever one is lost.
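The limit of roughly 240 follows from the figures above: in client mode every driver runs on the master host, so the master's memory caps the number of concurrent drivers. A minimal sketch of that arithmetic; the function name and the `reserved_gb` parameter are illustrative assumptions, not part of Spark:

```python
def max_parallel_drivers(host_memory_gb, driver_memory_gb, reserved_gb=0.0):
    """Upper bound on the number of drivers that fit in one host's memory.

    reserved_gb is a hypothetical allowance for the OS and other processes;
    the ticket does not say how much memory is actually reserved.
    """
    if driver_memory_gb <= 0:
        raise ValueError("driver_memory_gb must be positive")
    usable_gb = host_memory_gb - reserved_gb
    return max(0, int(usable_gb // driver_memory_gb))


# Figures from the ticket: 488 GB master, 2 GB per driver.
print(max_parallel_drivers(488, 2))  # 244
```

The theoretical ceiling of 244 is close to the observed limit of about 240; the gap would be consistent with a few gigabytes being used by the operating system and other processes, though the ticket does not confirm this.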

This new feature would allow as many parallel spark-submits as fit in the total memory of the pool of 8 worker nodes (i.e. not limited by the memory of the master) AND make each spark-submit wait for the return code.
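For reference, the client-mode workflow that the request wants to keep (one return code per spark-submit) can be sketched roughly as follows. This is only an illustration under assumptions: the helper names, the master URL, and the application path are hypothetical, and the demo at the end uses stand-in commands rather than spark-submit so it can run without a cluster.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def build_submit_command(app_path, master_url, driver_memory="2g"):
    """Build a client-mode spark-submit command line.

    app_path and master_url are placeholders supplied by the caller.
    """
    return [
        "spark-submit",
        "--master", master_url,
        "--deploy-mode", "client",  # driver runs on the submitting host
        "--driver-memory", driver_memory,
        app_path,
    ]


def run_all(commands, max_parallel):
    """Run the commands concurrently and return their exit codes in order."""
    def run_one(command):
        return subprocess.run(command).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, commands))


# Stand-in commands (not spark-submit) that exit with codes 0 and 3.
demo = [[sys.executable, "-c", f"raise SystemExit({code})"] for code in (0, 3)]
print(run_all(demo, max_parallel=2))  # [0, 3]
```

With a real cluster, the command list would be built with something like `build_submit_command("job.py", "spark://master-host:7077")`, where both arguments are placeholders. Because each driver still runs on the submitting host, `max_parallel` remains bounded by that host's memory, which is the limitation this ticket describes.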

People

Assignee: Unassigned

Reporter: t oo

Votes: 1

Watchers: 2

Dates

Created: 21/Jun/18 13:04

Updated: 08/Oct/19 05:42

Resolved: 08/Oct/19 05:42