Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
-
Mesos Foundation R7 Sprint 33
-
2
Description
The fails with:
../../src/tests/slave_recovery_tests.cpp:4797 Failed to wait 15secs for offers2
It is flaky because after partially accept the first offer, the scheduler will filter the agent for 5 seconds, and depending on the system load that may last more than 15 seconds:
I1026 14:32:40.976006 78393344 hierarchical.cpp:2388] Filtered offer with cpus:2 on agent c8805e35-97ee-44d4-9ab9-e7ea8d703c08-S0 for role * of framework c8805e35-97ee-44d4-9ab9-e7ea8d703c08-0000 I1026 14:32:40.976065 78393344 hierarchical.cpp:1566] Performed allocation for 1 agents in 197914ns I1026 14:32:41.981853 79466496 hierarchical.cpp:2388] Filtered offer with cpus:2 on agent c8805e35-97ee-44d4-9ab9-e7ea8d703c08-S0 for role * of framework c8805e35-97ee-44d4-9ab9-e7ea8d703c08-0000 I1026 14:32:41.981920 79466496 hierarchical.cpp:1566] Performed allocation for 1 agents in 185853ns I1026 14:32:42.983603 78393344 hierarchical.cpp:2388] Filtered offer with cpus:2 on agent c8805e35-97ee-44d4-9ab9-e7ea8d703c08-S0 for role * of framework c8805e35-97ee-44d4-9ab9-e7ea8d703c08-0000 I1026 14:32:42.983659 78393344 hierarchical.cpp:1566] Performed allocation for 1 agents in 179397ns I1026 14:32:57.363723 78929920 hierarchical.cpp:2388] Filtered offer with cpus:2 on agent c8805e35-97ee-44d4-9ab9-e7ea8d703c08-S0 for role * of framework c8805e35-97ee-44d4-9ab9-e7ea8d703c08-0000 I1026 14:32:57.363788 78929920 hierarchical.cpp:1566] Performed allocation for 1 agents in 466500ns ../../src/tests/slave_recovery_tests.cpp:4797: Failure Failed to wait 15secs for offers2
I think we can either remove the expectation of the second offer (which I think is not essential to the test) or manually control the clock.
Full log attached.
Attachments
Attachments
Issue Links
- is related to
-
MESOS-9741 Test `SlaveRecoveryTest.AgentReconfigurationWithRunningTask` is flaky.
- Open