MESOS-10018 (project: Mesos)

Duplicate tasks if agent partitioned during maintenance down


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • Fix Version/s: 1.7.3, 1.8.2, 1.9.1, 1.10.0
    • None
    • Sprint: Foundations: RI-19 57, Foundations: RI-20 58
    • Story Points: 5

    Description

      When the master starts maintenance for a node, it

      (1) sends a ShutdownMessage to the agent, and
      (2) removes the agent (slave), which transitions all of its tasks to TASK_LOST and moves them
      to the completed task set.
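      To make the two steps concrete, here is a minimal Python sketch of the master-side bookkeeping, assuming a heavily simplified model: the class `Master`, its fields `active_tasks`, `completed_tasks` and `outbox`, and the method `start_maintenance` are all hypothetical and do not correspond to the real Mesos C++ code. It shows that step (2) updates the master's state regardless of whether the agent ever acts on the message from step (1).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Master:
    # Hypothetical, simplified model of the master's task bookkeeping.
    active_tasks: Dict[str, str] = field(default_factory=dict)     # task_id -> agent_id
    completed_tasks: Dict[str, str] = field(default_factory=dict)  # task_id -> final state
    outbox: List[Tuple[str, str]] = field(default_factory=list)    # messages "sent" to agents

    def start_maintenance(self, agent_id: str) -> None:
        # (1) Ask the agent to shut down. Delivery and processing are not
        # guaranteed; this model only records that the message was sent.
        self.outbox.append(("ShutdownMessage", agent_id))
        # (2) Remove the agent: every task it ran becomes TASK_LOST and is
        # moved from the active set to the completed set.
        for task_id, owner in list(self.active_tasks.items()):
            if owner == agent_id:
                del self.active_tasks[task_id]
                self.completed_tasks[task_id] = "TASK_LOST"


master = Master(active_tasks={"t1": "agent-1", "t2": "agent-2"})
master.start_maintenance("agent-1")
print(master.active_tasks)     # {'t2': 'agent-2'}
print(master.completed_tasks)  # {'t1': 'TASK_LOST'}
```

      After the call, the master believes `t1` is gone, even though nothing in the model confirms that the agent actually stopped it.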

      If the ShutdownMessage is not fully processed on the agent (e.g., the message is dropped between (1) and (2), or the agent process is killed before the executor has shut down), the agent can come back with the lost task still running. It reports the task when it registers with the master, and the master adds it to its list of active tasks. As a result, the same task can be tracked as both completed and active.
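      The inconsistency can be reproduced in a small Python model, assuming the master keeps a dict of active tasks and a set of completed task IDs. `reregister_naive`, `reregister_guarded` and `duplicated` are hypothetical helpers, not Mesos APIs. The naive path trusts the agent's report and ends up with a task that is both completed and active; the guarded path shows one possible check (refuse to re-add tasks already known to be terminal and return them so the caller could ask the agent to kill them). This is only an illustration and is not necessarily how MESOS-10018 was actually fixed.

```python
from typing import Dict, Iterable, List, Set


def reregister_naive(active: Dict[str, str], completed: Set[str],
                     agent_id: str, reported: Iterable[str]) -> None:
    # Trust the agent's report: every reported task is marked active,
    # even if the master already moved it to the completed set.
    for task_id in reported:
        active[task_id] = agent_id


def reregister_guarded(active: Dict[str, str], completed: Set[str],
                       agent_id: str, reported: Iterable[str]) -> List[str]:
    # Hypothetical guard: tasks already considered terminal are not
    # re-added; they are returned so the caller can request a kill.
    to_kill = []
    for task_id in reported:
        if task_id in completed:
            to_kill.append(task_id)
        else:
            active[task_id] = agent_id
    return to_kill


def duplicated(active: Dict[str, str], completed: Set[str]) -> Set[str]:
    # Tasks the master tracks as both active and completed.
    return set(active) & completed


active, completed = {}, {"t1"}
reregister_naive(active, completed, "agent-1", ["t1"])
print(duplicated(active, completed))  # {'t1'}

active, completed = {}, {"t1"}
print(reregister_guarded(active, completed, "agent-1", ["t1", "t2"]))  # ['t1']
print(active)  # {'t2': 'agent-1'}
```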

          People

            Assignee: Benjamin Bannier (bbannier)
            Reporter: Benjamin Bannier (bbannier)
            Greg Mann