Spark / SPARK-24618

Allow driver memory to be consumed on worker hosts rather than the master (option for cluster mode to wait for the return code?)

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.1
    • Fix Version/s: None
    • Component/s: Scheduler, Spark Core

    Description

      My scenario is this:
      EC2 master (488 GB of RAM and 64 cores)
      Autoscaling group of up to 8 EC2 workers that get registered with the master

      I send hundreds of parallel spark-submits to the EC2 master, but I seem to be artificially limited to approximately 240 in parallel (if the driver of each spark-submit takes 2 GB of memory). I would like to know the return code of each spark-submit, so the deploy mode is client. I understand that using the cluster deploy mode would not wait for the return code.
      Spark-submits are not submitted directly to worker nodes because EC2 instances are ephemeral beasts that pop up and down regularly, while the master can simply redirect tasks to another worker whenever a worker is lost.
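
      The limit of roughly 240 parallel submits is consistent with simple arithmetic on the master's memory. A minimal sketch, assuming every client-mode driver runs on the master and reserves a fixed amount of RAM; the function name is hypothetical, and OS and daemon overhead are ignored, which may account for the gap between the theoretical ceiling and the observed figure:

      ```python
      def max_client_mode_drivers(master_memory_gb: float, driver_memory_gb: float) -> int:
          """Upper bound on concurrent client-mode drivers that fit in the master's RAM.

          Assumption: each driver reserves exactly driver_memory_gb and nothing
          else on the host competes for memory.
          """
          if driver_memory_gb <= 0:
              raise ValueError("driver_memory_gb must be positive")
          return int(master_memory_gb // driver_memory_gb)


      # Figures from the scenario above: 488 GB master, 2 GB per driver.
      print(max_client_mode_drivers(488, 2))  # prints 244
      ```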

      This new feature would allow as many spark-submits in parallel as there is total memory in the pool of 8 worker nodes (i.e. not limited by the memory of the master) AND make each spark-submit wait for its return code.
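
      For comparison, the current workaround described in this ticket (client deploy mode, with the caller blocking on each return code) might be driven roughly as follows. This is only a sketch: the helper names, the master URL, and the application file are hypothetical, and the `runner` parameter exists purely so the logic can be exercised without a Spark installation. The `--master`, `--deploy-mode`, and `--driver-memory` options are standard spark-submit flags.

      ```python
      import subprocess
      from concurrent.futures import ThreadPoolExecutor
      from typing import Callable, List, Optional, Sequence


      def build_submit_command(master_url: str, app: str, driver_memory: str = "2g") -> List[str]:
          """Build a client-mode spark-submit command line (driver runs on the submitting host)."""
          return [
              "spark-submit",
              "--master", master_url,
              "--deploy-mode", "client",
              "--driver-memory", driver_memory,
              app,
          ]


      def run_all(
          commands: Sequence[List[str]],
          runner: Optional[Callable[[List[str]], int]] = None,
          max_parallel: int = 8,
      ) -> List[int]:
          """Run the commands in parallel and return their exit codes in input order."""
          if runner is None:
              # Default: actually launch the process and wait for it to finish.
              runner = lambda cmd: subprocess.run(cmd).returncode
          with ThreadPoolExecutor(max_workers=max_parallel) as pool:
              return list(pool.map(runner, commands))


      # Dry run with a stub runner, so nothing is launched here.
      # "spark://master:7077" and "job.py" are placeholder values.
      cmds = [build_submit_command("spark://master:7077", "job.py") for _ in range(3)]
      codes = run_all(cmds, runner=lambda cmd: 0)
      ```

      Because every driver started this way lives on the submitting host, the number of concurrent submits stays bounded by that host's memory, which is the limitation the requested feature aims to lift.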

          People

            Assignee: Unassigned
            Reporter: t oo (toopt4)
            Votes: 1
            Watchers: 2
