SPARK-32198
Parent: SPARK-20624 SPIP: Add better handling for node shutdown

Don't fail running jobs when decommissioned executors finally go away


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0
    • Component/s: Spark Core
    • Labels: None

    Description

      When a decommissioned executor is finally lost, its death shouldn't fail running jobs.

      A decommissioned executor will eventually die, and in response to its heartbeat failure we will generate a `SlaveLost` message. This `SlaveLost` message should be treated specially for decommissioned executors: the loss should not be attributed to the running application. Decommissioning is an exogenous event, and the running application shouldn't be penalized for it.
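      To illustrate the intended special-casing, here is a minimal Scala sketch. The types and names (`ExecutorDecommissioned`, `reasonForLoss`, the `decommissioning` set) are hypothetical stand-ins for the scheduler's loss-reason plumbing, not Spark's actual API; the point is only that a heartbeat timeout for a decommissioning executor should surface a loss reason that is not counted against the application.

```scala
// Hypothetical stand-ins for the scheduler's executor-loss reasons;
// names are illustrative, not Spark's actual API.
sealed trait ExecutorLossReason { def causedByApp: Boolean }

case class SlaveLost(message: String, causedByApp: Boolean = true)
  extends ExecutorLossReason

case class ExecutorDecommissioned(message: String) extends ExecutorLossReason {
  // Decommissioning is exogenous, so the loss is never the app's fault.
  val causedByApp: Boolean = false
}

object LossReasonExample {
  // Executors we have already asked to decommission (e.g. on a spot reclaim).
  private val decommissioning = scala.collection.mutable.Set[String]()

  def startDecommission(executorId: String): Unit =
    decommissioning += executorId

  // Called when an executor's heartbeat finally times out. An executor that
  // was decommissioning gets a reason that is not counted against the running
  // application; anything else falls back to the usual SlaveLost path.
  def reasonForLoss(executorId: String): ExecutorLossReason =
    if (decommissioning.remove(executorId))
      ExecutorDecommissioned(s"Executor $executorId finished decommissioning")
    else
      SlaveLost(s"Executor $executorId heartbeat timed out")

  def main(args: Array[String]): Unit = {
    startDecommission("exec-1")
    println(reasonForLoss("exec-1")) // causedByApp = false: don't fail the job
    println(reasonForLoss("exec-2")) // causedByApp = true: normal failure path
  }
}
```

      The design choice mirrored here is that the exogenous/endogenous distinction is made at the point where the loss reason is constructed, so every downstream consumer of that reason applies the same policy.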



People

    Assignee: Devesh Agrawal
    Reporter: Devesh Agrawal
    Shepherd: Holden Karau
    Votes: 0
    Watchers: 2
