Details
Bug
Status: Resolved
Normal
Resolution: Fixed
None
Correctness - Recoverable Corruption / Loss
Critical
Normal
User Report
All
None
Description
The legacy reading code (LegacyLayout and UnfilteredDeserializer.OldFormatDeserializer) does not correctly handle the case where a range tombstone covering multiple rows interacts with a collection tombstone.
A simple example of this problem is running the following on 2.X:
CREATE TABLE t (
    k int,
    c1 text,
    c2 text,
    a text,
    b set<text>,
    c text,
    PRIMARY KEY ((k), c1, c2)
);

// Delete all rows where c1 is 'A'
DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A';

// Inserts a row covered by that previous range tombstone
INSERT INTO t (k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, 'bar') USING TIMESTAMP 2;

// Delete the collection of that previously inserted row
DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' AND c2 = 'X';
If this is run on 2.X (with everything either flushed into the same sstable or compacted together), then reading the data through the legacy code results in the inserted row being duplicated: one copy contains the a column, the other the c column.
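The following toy model illustrates one plausible way a wide range tombstone and a collection tombstone can combine into a duplicated row. Everything in it is an assumption made for illustration, not code from LegacyLayout or OldFormatDeserializer: the flat Atom stream, the string clusterings, and in particular the premise that the 2.X storage splits the c1 = 'A' tombstone (timestamp 1) into two pieces around the newer collection tombstone on b (timestamp 3), so that the second piece begins inside row ('A', 'X'). The hypothetical naiveGroup treats any non-collection tombstone marker as a row boundary and therefore emits row ('A', 'X') twice, once with a and once with c; fixedGroup only starts a new row when the clustering changes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, NOT Cassandra's LegacyLayout): a legacy partition is a
 * flat, clustering-ordered stream of atoms (cells and range tombstone markers).
 */
public class LegacyRowGrouping {

    /** column == null marks a range tombstone that is not a collection tombstone. */
    record Atom(String clustering, String column, boolean tombstone, long timestamp) {}

    /** A reassembled row: its clustering and the columns seen for it. */
    record Row(String clustering, List<String> columns) {}

    /**
     * Assumed atom stream for the repro. The b cell (ts 2) is shadowed by the
     * collection tombstone (ts 3) and is assumed to be gone; the row marker is omitted.
     */
    static List<Atom> reproStream() {
        return List.of(
            new Atom("A", null, true, 1),      // wide tombstone, piece 1: starts before row A:X
            new Atom("A:X", "a", false, 2),    // cell a = 'foo'
            new Atom("A:X", "b", true, 3),     // collection tombstone on b
            new Atom("A:X", null, true, 1),    // wide tombstone, piece 2: resumes inside row A:X
            new Atom("A:X", "c", false, 2));   // cell c = 'bar'
    }

    /** Buggy grouping: any non-collection tombstone marker closes the row in progress. */
    static List<Row> naiveGroup(List<Atom> atoms) {
        return group(atoms, true);
    }

    /** Fixed grouping: only a change of clustering starts a new row. */
    static List<Row> fixedGroup(List<Atom> atoms) {
        return group(atoms, false);
    }

    private static List<Row> group(List<Atom> atoms, boolean tombstoneClosesRow) {
        List<Row> rows = new ArrayList<>();
        Row current = null;
        for (Atom atom : atoms) {
            if (atom.tombstone()) {
                // Collection tombstones stay attached to the current row;
                // other tombstone markers may close it, depending on the mode.
                if (atom.column() == null && tombstoneClosesRow) {
                    current = null;
                }
                continue;
            }
            if (current == null || !current.clustering().equals(atom.clustering())) {
                current = new Row(atom.clustering(), new ArrayList<>());
                rows.add(current);
            }
            current.columns().add(atom.column());
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(naiveGroup(reproStream()));  // two rows for A:X: [a] and [c]
        System.out.println(fixedGroup(reproStream()));  // one row for A:X: [a, c]
    }
}
```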
I will note that this is not a duplicate of CASSANDRA-15789: this reproduces even with that ticket's fix to LegacyLayout. That said, the additional code added in CASSANDRA-15789 to force the merging of duplicated rows, if any are produced, will end up fixing this as a side effect (assuming no variation of this problem leads to visible issues other than duplicated rows). Even so, I "think" we'd still rather fix the source of the issue.
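The safety net described above, forcing duplicated rows to be merged when they are produced, can be sketched as a single pass over a clustering-ordered list of rows. This is a minimal sketch, not the CASSANDRA-15789 code: Row, Cell and mergeConsecutiveDuplicates are hypothetical names, and resolving conflicting cells by highest timestamp is an assumption. It relies on duplicates being adjacent, which holds when rows arrive in clustering order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of a "merge duplicated rows" pass (hypothetical types, not Cassandra's):
 * consecutive rows with the same clustering are folded into one row.
 */
public class MergeDuplicateRows {

    record Cell(String value, long timestamp) {}

    record Row(String clustering, Map<String, Cell> cells) {}

    static List<Row> mergeConsecutiveDuplicates(List<Row> sortedRows) {
        List<Row> out = new ArrayList<>();
        for (Row row : sortedRows) {
            Row last = out.isEmpty() ? null : out.get(out.size() - 1);
            if (last != null && last.clustering().equals(row.clustering())) {
                // Same clustering as the previous row: fold its cells in, keeping the
                // cell with the highest timestamp per column (the existing one on ties).
                for (Map.Entry<String, Cell> e : row.cells().entrySet()) {
                    last.cells().merge(e.getKey(), e.getValue(),
                        (old, neu) -> neu.timestamp() > old.timestamp() ? neu : old);
                }
            } else {
                // Copy into a mutable map so later duplicates can be merged into it.
                out.add(new Row(row.clustering(), new HashMap<>(row.cells())));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("A:X", Map.of("a", new Cell("foo", 2))),
            new Row("A:X", Map.of("c", new Cell("bar", 2))));
        List<Row> merged = mergeConsecutiveDuplicates(rows);
        System.out.println(merged.size());                  // 1
        System.out.println(merged.get(0).cells().keySet());  // the a and c columns
    }
}
```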