Spark / SPARK-41550 Dynamic Allocation on K8S GA / SPARK-40379

Propagate decommission executor loss reason during onDisconnect in K8s


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: Kubernetes, Spark Core
    • Labels: None

    Description

      Currently, if an executor has been sent a decommission message and then disconnects from the scheduler, we only disable the executor and rely on K8s status events to drive the remaining state transitions. However, on large clusters the K8s status events can become overwhelmed. When an executor disconnects, we should instead check whether it is in a decommissioning state and, if so, use decommissioning as the loss reason rather than waiting on the K8s status events. This gives more accurate logging information.
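      The proposed decision on disconnect can be sketched as a toy model. This is a minimal, language-neutral illustration written in Python, not Spark's scheduler code: every name in it (SchedulerBackendModel, on_disconnected, ExecutorLossReason, the "Executor decommission." message) is hypothetical. It only shows the idea of preferring an already-known decommission reason over waiting for K8s status events.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class ExecutorLossReason:
    """Hypothetical stand-in for an executor-loss reason."""
    message: str
    # True when the loss was an expected, graceful decommission.
    decommissioned: bool = False


@dataclass
class SchedulerBackendModel:
    """Toy model of a scheduler backend; names are hypothetical, not Spark APIs."""
    active: Set[str] = field(default_factory=set)
    # executor id -> optional message recorded when decommission was requested
    decommissioning: Dict[str, Optional[str]] = field(default_factory=dict)
    # executors that are disabled but still await a K8s status event
    disabled: Set[str] = field(default_factory=set)
    removed: Dict[str, ExecutorLossReason] = field(default_factory=dict)

    def decommission(self, executor_id: str, message: Optional[str] = None) -> None:
        # Remember that this executor was asked to decommission.
        if executor_id in self.active:
            self.decommissioning[executor_id] = message

    def on_disconnected(self, executor_id: str) -> Optional[ExecutorLossReason]:
        if executor_id not in self.active:
            return None  # unknown or already removed executor
        if executor_id in self.decommissioning:
            # Proposed behaviour: the decommission reason is already known,
            # so record it now instead of waiting on K8s status events.
            msg = self.decommissioning.pop(executor_id) or "Executor decommission."
            reason = ExecutorLossReason(msg, decommissioned=True)
            self.active.discard(executor_id)
            self.removed[executor_id] = reason
            return reason
        # Otherwise keep the existing behaviour: only disable the executor and
        # let a later K8s status event supply the final loss reason.
        self.disabled.add(executor_id)
        return None
```

      In this model a decommissioning executor is removed with an accurate reason as soon as it disconnects, while any other disconnected executor stays disabled until a status event arrives.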

          People

            Assignee: Holden Karau (holden)
            Reporter: Holden Karau (holden)
            Votes: 0
            Watchers: 3