Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-14801

calculatePendingRanges no longer safe for multiple adjacent range movements

    XMLWordPrintableJSON

Details

    • Normal
    • Normal
    • Hide

       

      • a534b2 adds tests that reproduce both bugs.
      • bb36cd fixes the newly discovered bug (test1Leave1Move in a534b2). There, the failure happens because for the same pending range we can include 2 replicas with the same endpoint and different ranges (violation of PendingRangeMaps Conflict.DUPLICATE policy). This happens because right now we include in PendingRangeMaps the entire new replica after leave for leave-affected ranges and only the pending part of it for move-affected ranges. This commit marks as pending only the missing part of the new replica.
      • a33b03 solves the original issue. Without this commit getPendingRanges can fail if the same replica covers more than 1 pending range. I think that this is a valid situation. Consider testLeave2 in a534b2. In this case replica Full(127.0.0.1:7012,(-9,0]) covers 2 ranges - (-9,-4] and (-4,0]. If we run the same test against 3.11 (-9, 0] is represented as {{ {(-9, -4], (-4, 0]}

        }} and that possible representation would not trigger the bug in 4.0. However, I think that we should not base safety of getPendingRanges execution on the way a particular AbstractReplicationStrategy represents a range and we should allow duplicate entries while building pending RangesAtEndpoint.

      Show
        a534b2 adds tests that reproduce both bugs. bb36cd fixes the newly discovered bug ( test1Leave1Move in a534b2 ). There, the failure happens because for the same pending range we can include 2 replicas with the same endpoint and different ranges (violation of PendingRangeMaps Conflict.DUPLICATE policy). This happens because right now we include in PendingRangeMaps the entire new replica after leave for leave-affected ranges and only the pending part of it for move-affected ranges. This commit marks as pending only the missing part of the new replica. a33b03 solves the original issue. Without this commit getPendingRanges can fail if the same replica covers more than 1 pending range. I think that this is a valid situation. Consider testLeave2 in a534b2 . In this case replica Full(127.0.0.1:7012,(-9,0]) covers 2 ranges - (-9,-4] and (-4,0] . If we run the same test against 3.11 (-9, 0] is represented as {{ {(-9, -4], (-4, 0]} }} and that possible representation would not trigger the bug in 4.0. However, I think that we should not base safety of getPendingRanges execution on the way a particular AbstractReplicationStrategy represents a range and we should allow duplicate entries while building pending RangesAtEndpoint .

    Description

      Correctness depended upon the narrowing to a Set<InetAddressAndport>, which we no longer do - we maintain a collection of all Replica. Our RangesAtEndpoint collection built by getPendingRanges can as a result contain the same endpoint multiple times; and our EndpointsForToken obtained by TokenMetadata.pendingEndpointsFor may fail to be constructed, resulting in cluster-wide failures for writes to the affected token ranges for the duration of the range movement.

      Attachments

        Issue Links

          Activity

            People

              Gerrrr Alex Sorokoumov
              benedict Benedict Elliott Smith
              Alex Sorokoumov
              Marcus Eriksson, Sam Tunnicliffe
              Votes:
              1 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m