Mesos / MESOS-5576

Masters may drop the first message they send between masters after a network partition


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.28.2
    • Fix Version/s: 0.28.3, 1.0.0
    • Environment: Observed in an OpenStack environment where each master lives on a separate VM.
    • Sprint: Mesosphere Sprint 38
    • Story Points: 5

    Description

      We observed the following situation in a cluster of five masters:

      Time | Master 1                                 | Master 2             | Master 3          | Master 4          | Master 5
      0    | Follower                                 | Follower             | Follower          | Follower          | Leader
      1    | Follower                                 | Follower             | Follower          | Follower          | Partitioned from the cluster by downing this VM's network
      2    | Elected Leader by ZK                     | Voting               | Voting            | Voting            | Suicides due to lost leadership
      3    | Performs consensus                       | Replies to leader    | Replies to leader | Replies to leader | Still down
      4    | Performs writing                         | Acks to leader       | Acks to leader    | Acks to leader    | Still down
      5    | Leader                                   | Follower             | Follower          | Follower          | Still down
      6    | Leader                                   | Follower             | Follower          | Follower          | Comes back up
      7    | Leader                                   | Follower             | Follower          | Follower          | Follower
      8    | Partitioned in the same way as Master 5  | Follower             | Follower          | Follower          | Follower
      9    | Suicides due to lost leadership          | Elected Leader by ZK | Follower          | Follower          | Follower
      10   | Still down                               | Performs consensus   | Replies to leader | Replies to leader | Doesn't get the message!
      11   | Still down                               | Performs writing     | Acks to leader    | Acks to leader    | Acks to leader
      12   | Still down                               | Leader               | Follower          | Follower          | Follower

      Master 2 sends a series of messages to the recently-restarted Master 5. The first message is dropped, but subsequent messages are not dropped.

      This appears to be due to a stale link between the masters. Before leader election, the replicated log actors create a network watcher, which adds links to masters that join the ZK group:
      https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/network.hpp#L157-L159
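
      As a rough illustration only (not the actual code at the link above), the watcher's behavior amounts to linking to each master PID that appears in the group; the MembershipWatcher name and membership callback below are assumptions made for the sketch:

      // Hedged sketch of a libprocess actor that links to peers as they join
      // the ZK group. MembershipWatcher and onMembershipChange are illustrative
      // names, not the classes in src/log/network.hpp.
      #include <set>

      #include <process/pid.hpp>
      #include <process/process.hpp>

      using process::UPID;

      class MembershipWatcher : public process::Process<MembershipWatcher>
      {
      public:
        // Invoked whenever the ZooKeeper group membership changes.
        void onMembershipChange(const std::set<UPID>& members)
        {
          for (const UPID& pid : members) {
            if (linked.count(pid) == 0) {
              // Establish a persistent link to the new member. By default,
              // libprocess reuses any existing socket to this PID for later sends.
              link(pid);
              linked.insert(pid);
            }
          }
        }

      private:
        std::set<UPID> linked;
      };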

      This link (Master 2 -> Master 5) does not appear to break when Master 5 goes down, perhaps because of how the network partition was induced (at the hypervisor layer, rather than inside the VM itself).

      When Master 2 tries to send a PromiseRequest to Master 5, we do not observe the expected log message.

      Instead, we see a log line in Master 2:

      process.cpp:2040] Failed to shutdown socket with fd 27: Transport endpoint is not connected
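
      For context, "Transport endpoint is not connected" is strerror(ENOTCONN), which shutdown(2) reports when the socket has no live peer. A minimal standalone sketch (plain POSIX, not Mesos code) that produces the same message:

      #include <cerrno>
      #include <cstdio>
      #include <cstring>

      #include <sys/socket.h>
      #include <unistd.h>

      int main()
      {
        // A TCP socket that was never connected (or whose connection is gone)
        // cannot be shut down; shutdown() fails with ENOTCONN.
        int fd = ::socket(AF_INET, SOCK_STREAM, 0);

        if (::shutdown(fd, SHUT_RDWR) < 0) {
          std::printf("Failed to shutdown socket with fd %d: %s\n",
                      fd, std::strerror(errno));
        }

        ::close(fd);
        return 0;
      }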
      

      The broken link is then removed by the libprocess socket_manager, and the subsequent WriteRequest from Master 2 to Master 5 succeeds over a new socket.
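
      One way to avoid reusing the stale connection (a sketch only, not necessarily how the fix was implemented) is for the sender to force the link to be re-established before the first message after a membership change. This assumes a link() variant that replaces any existing socket, such as libprocess's RemoteConnection::RECONNECT option; treat the exact signature as an assumption.

      // Hedged sketch: relink to a replica before sending so that a half-dead
      // socket left over from a network partition is not reused.
      #include <process/pid.hpp>
      #include <process/process.hpp>

      using process::UPID;

      class CoordinatorSketch : public process::Process<CoordinatorSketch>
      {
      public:
        void contact(const UPID& replica)
        {
          // RECONNECT (assumed here) closes any existing socket to `replica`
          // and opens a fresh one instead of reusing the old link.
          link(replica, process::RemoteConnection::RECONNECT);

          // send(replica, ...);  // the PromiseRequest would be sent here.
        }
      };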

      People

        Assignee: Joseph Wu (kaysoky)
        Reporter: Joseph Wu (kaysoky)
        Shepherd: Benjamin Mahler