Cassandra / CASSANDRA-18656

Secondary indexes either violate consistency or become unavailable when post-streaming index builds fail



    Description

      Back in 2015, we identified in CASSANDRA-10130 a case where failures in 2i builds after SSTable streaming could leave indexes in a partially built state, even after a restart, requiring manual operator intervention. There, and in CASSANDRA-13725, we made an attempt to remedy this situation, ensuring that indexes would at least be rebuilt on restart after this kind of failure. However, there are some difficulties the solution there does not address.

      Let's look at a simple example...

      Suppose an SSTable has been streamed to a node, and the receiving node reaches CassandraStreamReceiver#finished(). We'll call finishTransaction() to make the presence of the new SSTables durable, and then we'll call ColumnFamilyStore#addSSTables(), which adds them to the Tracker, making them available for reads. We then notify listeners about the new SSTables, among them the SecondaryIndexManager, which will do a blocking index build for them. Conceptually, at this point, we already have a problem (if a transient one), as there are live SSTables that have not been indexed.
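To make the ordering concrete, here is a minimal Java sketch of the receive path described above. `ToySSTable`, `ToyIndex`, `ToyStore`, and `StreamingGapDemo` are hypothetical stand-ins invented for illustration, not Cassandra classes. The sketch assumes, as described, that the SSTable is made live before a synchronous index build runs, and it models a failed build as an exception. It shows that once the build fails, a full scan sees the streamed data while the index does not.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, heavily simplified model of the receive path; NOT Cassandra's real classes.
final class ToySSTable {
    final String name;
    final List<String> values;

    ToySSTable(String name, List<String> values) {
        this.name = name;
        this.values = values;
    }
}

final class ToyIndex {
    private final Set<String> indexedValues = new HashSet<>();

    // Indexes every value in one SSTable; `fail` simulates a build interrupted by shutdown.
    void build(ToySSTable sstable, boolean fail) {
        if (fail) {
            throw new IllegalStateException("index build failed for " + sstable.name);
        }
        indexedValues.addAll(sstable.values);
    }

    boolean contains(String value) {
        return indexedValues.contains(value);
    }
}

final class ToyStore {
    final List<ToySSTable> live = new ArrayList<>(); // stands in for the Tracker
    final ToyIndex index = new ToyIndex();

    // Same ordering as described (durable commit omitted): make the SSTable
    // live for reads, then notify the index, which builds synchronously.
    void receive(ToySSTable sstable, boolean failIndexBuild) {
        live.add(sstable);                    // analogous to addSSTables(): now readable
        index.build(sstable, failIndexBuild); // analogous to the listener-triggered index build
    }

    // A read that does not use the index and scans every live SSTable.
    boolean scanContains(String value) {
        for (ToySSTable t : live) {
            if (t.values.contains(value)) {
                return true;
            }
        }
        return false;
    }
}

final class StreamingGapDemo {
    // Returns {scan sees "v1", index sees "v1"} after a receive whose index build fails.
    static boolean[] afterFailedBuild() {
        ToyStore store = new ToyStore();
        try {
            store.receive(new ToySSTable("streamed-1", List.of("v1")), true);
        } catch (IllegalStateException expected) {
            // Nothing rolls back the SSTable: it stays live but unindexed.
        }
        return new boolean[] {store.scanContains("v1"), store.index.contains("v1")};
    }
}
```

`StreamingGapDemo.afterFailedBuild()` yields `{true, false}`: the same lookup answered by a scan and by the index disagrees, which is the consistency violation this issue is about.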

      What if the 2i build fails, though? Let's assume it fails because of a disorderly (or orderly!) node shutdown. Some index implementations (SASI, SAI) might be able to rebuild incrementally, but the legacy 2i has no way of doing this right now, and a full index rebuild on a large table could take a very long time (days, weeks, etc.), which is ultimately not a viable way to proceed. Suppose, though, that we could build incrementally, and that we had an SAI index that did exactly this on node restart. We would still have a gap in availability, because on startup, ColumnFamilyStore (see its constructor) does not block on its calls to SecondaryIndexManager#addIndex(), which, via createIndex(), kick off the index-building process.
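The startup gap can be illustrated with a small Java sketch in which "startup" launches an index build asynchronously and returns without waiting. `ToyNode` and `NonBlockingStartupDemo` are hypothetical names, not Cassandra code; the `CountDownLatch` exists only to hold the build open so the window is observable deterministically. A lookup issued right after startup misses data that the completed build later exposes.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical model of a node whose startup triggers an index build without waiting for it.
final class ToyNode {
    final Set<String> index = ConcurrentHashMap.newKeySet();
    // Holds the build open so the demo can observe the window deterministically.
    final CountDownLatch buildMayProceed = new CountDownLatch(1);
    CompletableFuture<Void> build;

    // Analogous to startup calling addIndex()/createIndex() without blocking on the result.
    void start(Set<String> data) {
        build = CompletableFuture.runAsync(() -> {
            try {
                buildMayProceed.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            index.addAll(data);
        });
        // start() returns here; the node would already be serving reads.
    }
}

final class NonBlockingStartupDemo {
    // Returns {index sees "v1" right after startup, index sees "v1" after the build}.
    static boolean[] run() {
        ToyNode node = new ToyNode();
        node.start(Set.of("v1"));
        boolean duringBuild = node.index.contains("v1"); // build is still parked on the latch
        node.buildMayProceed.countDown();
        node.build.join(); // the wait that startup does not perform
        boolean afterBuild = node.index.contains("v1");
        return new boolean[] {duringBuild, afterBuild};
    }
}
```

Blocking startup on `build.join()` would close the window, but, as noted above, that is only practical when the rebuild is incremental and fast.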

      Of course, SAI implements a notion of "queryability" that would quickly take the node out of rotation for queries across the cluster. Once its initialization task runs on restart, the indexes in question would immediately be marked non-queryable. SAI builds incrementally, and might be able to block startup to do so in this case. Legacy 2i cannot reasonably do this though.
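The difference "queryability" makes can be sketched as an index that refuses queries while its build is incomplete, versus one that silently answers from partial state. `ToyQueryableIndex` and `QueryabilityDemo` are hypothetical; this is not SAI's API, and the sketch does not model how the cluster actually routes queries away from a non-queryable replica. It only shows the local choice between failing fast and returning possibly incomplete results.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model contrasting an index that tracks "queryability"
// with one that answers from partial state. Not SAI's real API.
final class ToyQueryableIndex {
    private final Set<String> entries = new HashSet<>();
    private final boolean tracksQueryability;
    private boolean queryable = false; // false until a build completes

    ToyQueryableIndex(boolean tracksQueryability) {
        this.tracksQueryability = tracksQueryability;
    }

    // E.g. what an initialization task on restart might do for an incomplete index.
    void markNonQueryable() {
        queryable = false;
    }

    void completeBuild(Set<String> data) {
        entries.addAll(data);
        queryable = true;
    }

    boolean query(String value) {
        if (tracksQueryability && !queryable) {
            // Refuse rather than return possibly incomplete results.
            throw new IllegalStateException("index not queryable");
        }
        return entries.contains(value);
    }
}

final class QueryabilityDemo {
    // True if the index refused the query while its build was incomplete.
    static boolean refusesWhileBuilding(boolean tracksQueryability) {
        ToyQueryableIndex idx = new ToyQueryableIndex(tracksQueryability);
        idx.markNonQueryable();
        try {
            idx.query("v1");
            return false; // answered from partial state, possibly incorrectly
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Once the build completes, the tracking index answers normally.
    static boolean answersAfterBuild() {
        ToyQueryableIndex idx = new ToyQueryableIndex(true);
        idx.markNonQueryable();
        idx.completeBuild(Set.of("v1"));
        return idx.query("v1");
    }
}
```

The index without queryability tracking returns `false` for `"v1"` during the build, which is an incorrect answer rather than an unavailable one; that is the distinction between the two failure modes in this issue's title.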


      People

        Caleb Rackliffe (maedhroz)


      Time Tracking

        Original Estimate: Not Specified
        Remaining Estimate: 0h
        Time Spent: 20m