Uploaded image for project: 'Mesos'
  1. Mesos
  2. MESOS-9358

Test `SlaveRecoveryTest.AgentReconfigurationWithRunningTask` is flaky.

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.8.0
    • None
    • Mesos Foundation R7 Sprint 33
    • 2

    Description

      The fails with:

      ../../src/tests/slave_recovery_tests.cpp:4797
      Failed to wait 15secs for offers2
      

      It is flaky because after partially accept the first offer, the scheduler will filter the agent for 5 seconds, and depending on the system load that may last more than 15 seconds:

      I1026 14:32:40.976006 78393344 hierarchical.cpp:2388] Filtered offer with cpus:2 on agent c8805e35-97ee-44d4-9ab9-e7ea8d703c08-S0 for role * of framework c8805e35-97ee-44d4-9ab9-e7ea8d703c08-0000
      I1026 14:32:40.976065 78393344 hierarchical.cpp:1566] Performed allocation for 1 agents in 197914ns
      I1026 14:32:41.981853 79466496 hierarchical.cpp:2388] Filtered offer with cpus:2 on agent c8805e35-97ee-44d4-9ab9-e7ea8d703c08-S0 for role * of framework c8805e35-97ee-44d4-9ab9-e7ea8d703c08-0000
      I1026 14:32:41.981920 79466496 hierarchical.cpp:1566] Performed allocation for 1 agents in 185853ns
      I1026 14:32:42.983603 78393344 hierarchical.cpp:2388] Filtered offer with cpus:2 on agent c8805e35-97ee-44d4-9ab9-e7ea8d703c08-S0 for role * of framework c8805e35-97ee-44d4-9ab9-e7ea8d703c08-0000
      I1026 14:32:42.983659 78393344 hierarchical.cpp:1566] Performed allocation for 1 agents in 179397ns
      I1026 14:32:57.363723 78929920 hierarchical.cpp:2388] Filtered offer with cpus:2 on agent c8805e35-97ee-44d4-9ab9-e7ea8d703c08-S0 for role * of framework c8805e35-97ee-44d4-9ab9-e7ea8d703c08-0000
      I1026 14:32:57.363788 78929920 hierarchical.cpp:1566] Performed allocation for 1 agents in 466500ns
      ../../src/tests/slave_recovery_tests.cpp:4797: Failure
      Failed to wait 15secs for offers2
      

      I think we can either remove the expectation of the second offer (which I think is not essential to the test) or manually control the clock.

      Full log attached.

      Attachments

        Issue Links

          Activity

            People

              bennoe Benno Evers
              mzhu Meng Zhu
              Joseph Wu Joseph Wu
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: