  1. Mesos
  2. MESOS-9147

Agent and scheduler driver authentication retry backoff time could overflow.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.5.1, 1.6.1
    • Fix Version/s: 1.4.3, 1.5.2, 1.6.2, 1.7.0
    • None
    • Sprint: Mesosphere Sprint 2018-26, Mesosphere Sprint 2018-27
    • 3

    Description

      In the agent we have the following retry backoff calculation logic:

      https://github.com/apache/mesos/blob/874c752316b14055c0a5a7b67f97ccf912abcc3c/src/slave/slave.cpp#L1401-L1418

          Duration backoff =
            flags.authentication_backoff_factor * std::pow(2, failedAuthentications);
      

      Since `Duration` uses an `int64_t` to hold nanoseconds, setting `authentication_backoff_factor` to 1 second causes an overflow at 34 failed authentications: converting seconds to nanoseconds consumes about 30 bits (10^9 ≈ 2^30), and `pow()` contributes another 34 bits (2^34), which together exceed the 63 value bits of an `int64_t`.
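The arithmetic above can be checked with a small standalone sketch. It does not use the Mesos or stout `Duration` type; `firstOverflowingAttempt` and `cappedBackoffNanos` are hypothetical helpers introduced only for illustration, and the saturating cap is one possible mitigation, not necessarily the fix that was committed.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of Mesos): returns the smallest number of
// failed attempts n for which factorNanos * 2^n no longer fits in int64_t,
// or -1 if factorNanos is not positive.
int firstOverflowingAttempt(int64_t factorNanos)
{
  if (factorNanos <= 0) {
    return -1;
  }

  // INT64_MAX converts to exactly 2^63 as a double, so "product < limit"
  // is the condition for the product still fitting.
  const double limit =
    static_cast<double>(std::numeric_limits<int64_t>::max());

  int n = 0;
  while (static_cast<double>(factorNanos) * std::pow(2.0, n) < limit) {
    ++n;
  }
  return n;
}

// Hypothetical overflow-safe variant: doubles the backoff once per failure
// but saturates at maxNanos instead of overflowing. Assumes both
// factorNanos and maxNanos are positive.
int64_t cappedBackoffNanos(int64_t factorNanos, int failures, int64_t maxNanos)
{
  int64_t backoff = factorNanos;
  for (int i = 0; i < failures; ++i) {
    if (backoff > maxNanos / 2) {
      return maxNanos; // Doubling again would exceed the cap.
    }
    backoff *= 2;
  }
  return std::min(backoff, maxNanos);
}
```

With a factor of 1 second (1000000000 ns), `firstOverflowingAttempt` returns 34, matching the estimate in the description, while `cappedBackoffNanos` keeps returning the cap no matter how many failures accumulate.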

      As a result, we do not back off at all and simply retry immediately after the 5s timeout:
      https://github.com/apache/mesos/blob/874c752316b14055c0a5a7b67f97ccf912abcc3c/src/master/master.cpp#L9615-L9619

      The scheduler driver also has the same issue.

      We should also audit all the other backoff logic.


            People

              mzhu Meng Zhu
              mzhu Meng Zhu
              Benjamin Mahler Benjamin Mahler