Hadoop YARN / YARN-11494

Acquired Containers are killed when the node is reconnected

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.3
    • Fix Version/s: None
    • Component/s: resourcemanager
    • Labels: None

    Description

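      When a NodeManager reconnects, the ResourceManager marks the acquired containers on that node as LOST and kills them, which leads to job failure. In the ResourceManager log below, the container moves from ACQUIRED to KILLED immediately after the reconnect: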

      2023-04-10 02:57:16,412 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService (IPC Server handler 41 on 8025): Reconnect from the node at: node1
      2023-04-10 02:57:16,412 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService (IPC Server handler 41 on 8025): NodeManager from node node1(cmPort: 8041 httpPort: 8042) registered with capability: <memory:122880, vCores:16>, assigned nodeId node1:8041, node labels { CORE } 
      2023-04-10 02:57:16,413 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_e15_1677844874019_238016_01_000002 Container Transitioned from ACQUIRED to KILLED
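
To check how many containers are affected by this pattern, the ResourceManager log can be scanned for ACQUIRED to KILLED transitions that closely follow a reconnect. The following is a minimal diagnostic sketch, not part of YARN: the class name `ReconnectKillScanner`, the method `findKilledAfterReconnect`, and the 5-second correlation window are assumptions made for illustration. The regular expressions are derived only from the three log lines above. Because the transition line does not name the node, the match is a time-based heuristic and may produce false positives.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists containers that went ACQUIRED -> KILLED shortly after a node reconnect.
class ReconnectKillScanner {
    // Timestamp layout as it appears in the log lines of this issue.
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    private static final Pattern RECONNECT =
            Pattern.compile("^(\\S+ \\S+) .*Reconnect from the node at: \\S+");
    private static final Pattern KILLED = Pattern.compile(
            "^(\\S+ \\S+) .*?(container_\\S+) Container Transitioned from ACQUIRED to KILLED");
    // Assumed correlation window; tune it for the cluster being inspected.
    private static final Duration WINDOW = Duration.ofSeconds(5);

    static List<String> findKilledAfterReconnect(List<String> lines) {
        List<String> containerIds = new ArrayList<>();
        LocalDateTime lastReconnect = null;
        for (String raw : lines) {
            String line = raw.trim();
            Matcher reconnect = RECONNECT.matcher(line);
            if (reconnect.find()) {
                // Remember when the most recent reconnect was logged.
                lastReconnect = LocalDateTime.parse(reconnect.group(1), TS);
                continue;
            }
            Matcher killed = KILLED.matcher(line);
            if (lastReconnect != null && killed.find()) {
                LocalDateTime at = LocalDateTime.parse(killed.group(1), TS);
                boolean inWindow = !at.isBefore(lastReconnect)
                        && Duration.between(lastReconnect, at).compareTo(WINDOW) <= 0;
                if (inWindow) {
                    containerIds.add(killed.group(2));
                }
            }
        }
        return containerIds;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ReconnectKillScanner <resourcemanager-log-file>");
            return;
        }
        for (String id : findKilledAfterReconnect(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(id);
        }
    }
}
```

Run against the log excerpt above, this sketch would report `container_e15_1677844874019_238016_01_000002`, since its transition is logged 1 ms after the reconnect of node1.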
      

          People

            Assignee: Prabhu Joseph (prabhujoseph)
            Reporter: Prabhu Joseph (prabhujoseph)