Details
-
Improvement
-
Status: Open
-
Normal
-
Resolution: Unresolved
-
None
-
None
Description
Since read repair doesn't mirror mutations to pending endpoints, it seems likely that there's an edge case that can break the monotonic quorum read guarantee blocking read repair is supposed to provide.
Assuming there are 3 nodes (A, B, & C) which replicate a token range. A new node D is added, which will take over some of A's token range. During the bootstrap of D, if there's a failed write that only makes it to a single node (A) after bootstrap has started, then there's a quorum read including A & B, which replicates that value to B. If A is removed when D finishes bootstrapping, a quorum read including node C & D will not see the value returned in the last quorum read which queried A & B.
Table to illustrate:
state | A | B | C | D |
1 begin | pending | |||
2 write | 1 | pending | ||
3 repair | 1 | 1 | pending | |
4 joined | n/a | 1 |