Mesos / MESOS-9889

Master CPU high due to unexpected foreachkey behaviour in Master::__reregisterSlave.

Details

    Description

      At https://github.com/apache/mesos/blob/9932550e9632e7fbb9a45b217793c7f508f57001/src/master/master.cpp#L7707-L7708

      void Master::__reregisterSlave(
      ...
          foreachkey (FrameworkID frameworkId,
                     slaves.unreachableTasks.at(slaveInfo.id())) {
              ...
              foreach (TaskID taskId,
                       slaves.unreachableTasks.at(slaveInfo.id()).get(frameworkId)) {
      

      In our case, when the network flaps and 3 to 4 agents reregister, the master's CPU is saturated and it cannot process any requests during that period.

      After applying the following change:

      -    foreachkey (FrameworkID frameworkId,
      -               slaves.unreachableTasks.at(slaveInfo.id())) {
      +    foreach (FrameworkID frameworkId,
      +               slaves.unreachableTasks.at(slaveInfo.id()).keys()) {
      

      The problem is gone.
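
One possible explanation, which the report itself does not state: if the per-agent container behaves like a multimap that stores one entry per (framework, task) pair, a loop over its keys visits the same framework once per task, and the nested task loop then makes the work quadratic in the number of tasks per framework. Iterating over a set of distinct keys visits each framework once. The sketch below illustrates that assumption with std::unordered_multimap as a stand-in; TaskMap, makeTasks, visitsPerEntry and visitsPerUniqueKey are hypothetical names and are not Mesos code.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for a per-agent map of framework ID -> task IDs
// (assumption: one entry per framework/task pair).
using TaskMap = std::unordered_multimap<std::string, std::string>;

// Builds a map with `frameworks` keys, each holding `tasksPerFramework` values.
TaskMap makeTasks(std::size_t frameworks, std::size_t tasksPerFramework) {
  TaskMap tasks;
  for (std::size_t f = 0; f < frameworks; ++f) {
    for (std::size_t t = 0; t < tasksPerFramework; ++t) {
      tasks.emplace("framework-" + std::to_string(f),
                    "task-" + std::to_string(t));
    }
  }
  return tasks;
}

// Loops over every entry and, for each one, walks all values stored under
// that entry's key. A key with n values is reached n times, so the inner
// body runs n * n times for that key.
std::size_t visitsPerEntry(const TaskMap& tasks) {
  std::size_t visits = 0;
  for (const auto& entry : tasks) {
    auto range = tasks.equal_range(entry.first);
    for (auto it = range.first; it != range.second; ++it) {
      ++visits;
    }
  }
  return visits;
}

// Collects the distinct keys first, so each key's values are walked once.
std::size_t visitsPerUniqueKey(const TaskMap& tasks) {
  std::unordered_set<std::string> keys;
  for (const auto& entry : tasks) {
    keys.insert(entry.first);
  }
  std::size_t visits = 0;
  for (const auto& key : keys) {
    auto range = tasks.equal_range(key);
    for (auto it = range.first; it != range.second; ++it) {
      ++visits;
    }
  }
  return visits;
}
```

With 3 frameworks of 100 tasks each, the per-entry loop runs its inner body 30000 times while the unique-key loop runs it 300 times. A gap of that shape would be consistent with the CPU saturation described above when several agents reregister at once.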

            People

              bmahler Benjamin Mahler
              haosdent@gmail.com haosdent
              Votes: 0
              Watchers: 3
