Cassandra / CASSANDRA-14262

View update sent multiple times during range movement

Details

    • Type: Improvement
    • Status: Open
    • Priority: Low
    • Resolution: Unresolved

    Description

      This issue concerns updates to a base table that has materialized views while token ranges are being moved, i.e., while a node is being added to or removed from the cluster. Range movement is a long process because the data needs to be streamed to its new owning node.

      During this process, each view mutation we want to write to a view table may have one or more additional "pending nodes": nodes which will hold this view mutation once the movement completes. We need to send the view mutation to these new nodes too. The code that does this existed until CASSANDRA-13069, when it was accidentally removed; it was restored in CASSANDRA-14251.

      However, the current code, in mutateMV(), has each of the RF (e.g., 3) base replicas send the view mutation to the same pending node. This is redundant and reduces write throughput while the streaming is in progress.
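
      To make the redundancy concrete, here is a minimal toy model in Python. It is not Cassandra's mutateMV() code; the function view_sends and the node names (b1, v1, p1) are hypothetical, and the model assumes each base replica writes to its one paired view replica plus every pending view replica.

```python
from collections import Counter

def view_sends(base_replicas, paired_view, pending_view):
    """Return (sender, target) pairs for a single view mutation.

    Toy model of the behaviour described in the issue: every base replica
    writes to its paired view replica and also to every pending view replica.
    """
    sends = []
    for base in base_replicas:
        sends.append((base, paired_view[base]))
        for pending in pending_view:
            sends.append((base, pending))
    return sends

base_replicas = ["b1", "b2", "b3"]                    # RF = 3 (hypothetical names)
paired_view = {"b1": "v1", "b2": "v2", "b3": "v3"}    # current base-to-view pairing
pending_view = ["p1"]                                 # one node joining the ring

sends = view_sends(base_replicas, paired_view, pending_view)
per_target = Counter(target for _, target in sends)

print(len(sends))         # 6 messages in total
print(per_target["p1"])   # 3: the pending node receives the same mutation RF times
```

      In this model the pending node receives RF copies of every view mutation, while each established view replica receives one.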

      I suggested (based on an idea by shlomi_livne) that it may be enough for only one base replica to send the update to the pending node: the replica which will be paired with it once the range movement completes. pauloricardomg replied (see https://lists.apache.org/thread.html/12c78582a3f709ca33a45e5fa6121148b1b1ad9c9b290d1a21e4409b@%3Cdev.cassandra.apache.org%3E ) that such an optimization appears to work in the common case of a single movement but not in rarer, more complex cases (I did not fully understand the details; see the link above).
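
      The suggested optimization might look roughly like the following toy sketch. It covers only the simple single-movement case and says nothing about the more complex cases raised in the mailing-list reply. The function view_sends_paired_only and the two pairing maps are assumptions made for illustration, not existing Cassandra APIs.

```python
def view_sends_paired_only(base_replicas, paired_now, paired_after):
    """Return (sender, target) pairs for a single view mutation.

    Toy model of the proposed optimization: a base replica writes to its
    current paired view replica, and additionally to its future paired view
    replica only when the range movement will change that pairing.
    """
    sends = []
    for base in base_replicas:
        sends.append((base, paired_now[base]))
        future = paired_after[base]
        if future != paired_now[base]:
            sends.append((base, future))
    return sends

base_replicas = ["b1", "b2", "b3"]                     # RF = 3 (hypothetical names)
paired_now = {"b1": "v1", "b2": "v2", "b3": "v3"}      # pairing before the movement
paired_after = {"b1": "v1", "b2": "v2", "b3": "p1"}    # assumed: p1 takes over from v3

sends = view_sends_paired_only(base_replicas, paired_now, paired_after)
to_pending = [sender for sender, target in sends if target == "p1"]

print(len(sends))    # 4 messages instead of 6
print(to_pending)    # ['b3']: only the future pair writes to the pending node
```

      The sketch assumes the post-movement pairing is known and unambiguous, which is exactly the assumption that may not hold when several range movements overlap.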

      I believe the current code has another problem, one of correctness. If any view replica ends up with two different view rows for the same partition key, such a mistake cannot currently be fixed (see CASSANDRA-10346). But if different base replicas hold two different values (an inconsistency which an ordinary base repair could fix, if we ran it) and both of them send their update to the same pending view replica, this view replica will now have two rows, one of them wrong, and it cannot currently be repaired.
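
      The scenario can be illustrated with a small toy model. This is a sketch under simplifying assumptions, not Cassandra's storage model: a view row is represented as a (view key, base key) tuple, the view is keyed by the value of one base column, and the names b1, b2, and receive_view_updates are hypothetical.

```python
def receive_view_updates(base_values, base_key, senders):
    """Return the set of view rows a view replica holds after the given
    base replicas each send it a view update derived from their local value.

    A view row is modelled as (view_key, base_key), where view_key is the
    value of the base column that the view is keyed on.
    """
    rows = set()
    for base in senders:
        rows.add((base_values[base], base_key))
    return rows

# Two base replicas disagree on the column value for base key "k".
# An ordinary base repair could reconcile them, but none has run yet.
base_values = {"b1": "red", "b2": "blue"}

# Current behaviour in this model: both base replicas write to the pending view replica.
pending_rows = receive_view_updates(base_values, "k", senders=["b1", "b2"])

# Paired-only behaviour in this model: a single base replica writes to it.
paired_rows = receive_view_updates(base_values, "k", senders=["b1"])

print(sorted(pending_rows))   # [('blue', 'k'), ('red', 'k')]: two view rows for one base row
print(sorted(paired_rows))    # [('red', 'k')]
```

      With a single sender the pending view replica simply mirrors one base replica, so repairing the base replicas has a chance of carrying through; with two senders it holds both rows at once.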

          People

            Assignee: Unassigned
            Reporter: Nadav Har'El (nyh)
            Votes: 0
            Watchers: 6