[CASSANDRA-16796] Clear pending ranges for a SHUTDOWN peer - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 3.0.25, 3.11.11, 4.0.1
Component/s: Cluster/Membership
Labels: None
Bug Category: Availability - Unavailable
Severity: Normal
Complexity: Normal
Discovered By: User Report
Platform: All
Impacts: None
Since Version: 3.0.0
Source Control Link: https://github.com/apache/cassandra/commit/fbb20b9162b73c4de8a82cf4ffdde3304e904603
Test and Documentation Plan:

new dtest added

Description

If a node involved in a MOVE operation fails, peers can sometimes retain pending ranges for it even after it has left the ring and/or been replaced (in practice, until the peer is next bounced). This in turn can lead to bogus unavailable responses to clients if a replica for any of the pending ranges goes down.
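To illustrate why a stale pending range can surface as an unavailable response, here is a minimal sketch. It is a toy model, not Cassandra code: the class and method names are hypothetical, and it assumes that a write must be acknowledged by the consistency-level requirement plus one extra replica per pending endpoint.

```java
import java.util.List;
import java.util.Set;

// Toy model, not Cassandra code: all names here are hypothetical.
final class WriteAvailabilitySketch {
    // Quorum size for a given replication factor.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Assumption: required acks = quorum + one per pending endpoint.
    static int blockFor(int replicationFactor, List<String> pending) {
        return quorum(replicationFactor) + pending.size();
    }

    // Counts natural and pending endpoints that are currently alive.
    static int liveCount(List<String> natural, List<String> pending, Set<String> alive) {
        int live = 0;
        for (String endpoint : natural) {
            if (alive.contains(endpoint)) live++;
        }
        for (String endpoint : pending) {
            if (alive.contains(endpoint)) live++;
        }
        return live;
    }

    // A QUORUM write is possible only if enough endpoints are alive.
    // The natural replica count stands in for the replication factor.
    static boolean isAvailable(List<String> natural, List<String> pending, Set<String> alive) {
        return liveCount(natural, pending, alive) >= blockFor(natural.size(), pending);
    }
}
```

Under these assumptions, take natural replicas A, B and C with C down, and a stale pending entry for a node X that has already left the ring. The write needs 2 + 1 = 3 acknowledgements but only 2 endpoints are alive, so it is rejected. Without the stale entry the same write needs 2 acknowledgements and succeeds.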

If the moving node crashes hard, a subsequent replacement will correctly fail as long as cassandra.consistent.rangemovement is set to true, because the new node will learn the MOVING status from the remaining peers. A graceful shutdown, however, causes that status to be replaced with SHUTDOWN without updating TokenMetadata, so pending ranges remain for the down node even after it has been removed from the ring.
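The difference between the two behaviours can be sketched with a toy model. This is not the actual patch (see the Source Control Link for that): the class, enum and method names below are hypothetical, and the set of moving endpoints stands in for the state from which pending ranges are derived.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model, not Cassandra code: class and method names are hypothetical.
final class PendingRangeSketch {
    enum Status { NORMAL, MOVING, SHUTDOWN }

    // Endpoints that this peer believes are mid-move.
    // In this sketch, an endpoint has pending ranges while it is in this set.
    private final Set<String> movingEndpoints = new HashSet<>();

    // Behaviour described in the report: SHUTDOWN replaces MOVING,
    // but the moving state is never cleared.
    void onStatusWithoutClear(String endpoint, Status status) {
        if (status == Status.MOVING) {
            movingEndpoints.add(endpoint);
        } else if (status == Status.NORMAL) {
            movingEndpoints.remove(endpoint);
        }
        // SHUTDOWN falls through: the stale entry stays.
    }

    // Idea suggested by the issue title: a SHUTDOWN peer also
    // has its pending state cleared.
    void onStatusWithClear(String endpoint, Status status) {
        if (status == Status.MOVING) {
            movingEndpoints.add(endpoint);
        } else {
            movingEndpoints.remove(endpoint);
        }
    }

    boolean hasPendingRangesFor(String endpoint) {
        return movingEndpoints.contains(endpoint);
    }
}
```

In this model, a MOVING status followed by SHUTDOWN leaves the endpoint with pending ranges under the first handler and clears them under the second.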

People

Assignee: Sam Tunnicliffe
Reporter: Sam Tunnicliffe
Authors: Sam Tunnicliffe
Reviewers: Caleb Rackliffe
Votes: 0
Watchers: 4

Dates

Created: 12/Jul/21 15:32
Updated: 20/Jul/21 18:42
Resolved: 20/Jul/21 18:42