Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-14941

Expired secondary index sstables are not promptly discarded under TWCS

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Normal
    • Resolution: Unresolved
    • None
    • Feature/2i Index
    • None
    • Normal

    Description

      We have a table in a cluster running 3.0.17 storing roughly time-series data using TWCS with a secondary index. We've noticed that while expired sstables for the table are discarded mostly when we expect them to be, the expired sstables for the secondary index would linger for weeks longer than expected – essentially indefinitely. Eventually the sstables would fill disks, which would require manual steps (deleting ancient index sstables) to address. We verified with sstableexpiredblockers that there wasn't anything on disk blocking the expired sstables from being dropped, so this looks like a bug.

      Through some debugging, we traced the problem to the index's memtables, which were consistently (except just after node restarts) reporting a minimum timestamp from September 2015 – much older than any of our live data – which causes CompactionController.getFullyExpiredSSTables() to consistently return an empty set. The reason that the index sstables report this minimum timestamp is because of how index updates are created, using PartitionUpdate.singleRowUpdate():

          public static PartitionUpdate singleRowUpdate(CFMetaData metadata, DecoratedKey key, Row row, Row staticRow)
          {
              MutableDeletionInfo deletionInfo = MutableDeletionInfo.live();
              Holder holder = new Holder(
                  new PartitionColumns(
                      staticRow == null ? Columns.NONE : Columns.from(staticRow.columns()),
                      row == null ? Columns.NONE : Columns.from(row.columns())
                  ),
                  row == null ? BTree.empty() : BTree.singleton(row),
                  deletionInfo,
                  staticRow == null ? Rows.EMPTY_STATIC_ROW : staticRow,
                  EncodingStats.NO_STATS
              );
              return new PartitionUpdate(metadata, key, holder, deletionInfo, false);
          }
      

      The use of EncodingStats.NO_STATS makes it appear as though the earliest timestamp in the resulting PartitionUpdate is from September 2015. That timestamp becomes the minimum for the memtable.

      Modifying this version of PartitionUpdate.singleRowUpdate() to:

          public static PartitionUpdate singleRowUpdate(CFMetaData metadata, DecoratedKey key, Row row, Row staticRow)
          {
              MutableDeletionInfo deletionInfo = MutableDeletionInfo.live();
              staticRow = (staticRow == null ? Rows.EMPTY_STATIC_ROW : staticRow);
              EncodingStats stats = EncodingStats.Collector.collect(staticRow,
                                                                    (row == null ?
                                                                     Collections.emptyIterator() :
                                                                     Iterators.singletonIterator(row)),
                                                                    deletionInfo);
              Holder holder = new Holder(
                  new PartitionColumns(
                      staticRow == Rows.EMPTY_STATIC_ROW ? Columns.NONE : Columns.from(staticRow.columns()),
                      row == null ? Columns.NONE : Columns.from(row.columns())
                  ),
                  row == null ? BTree.empty() : BTree.singleton(row),
                  deletionInfo,
                  staticRow,
                  stats
              );
              return new PartitionUpdate(metadata, key, holder, deletionInfo, false);
          }
      

      (i.e., computing an EncodingStats from the contents of the update) seems to fix the problem. However, we're not certain whether A) there's a functional reason the method was using EncodingStats.NO_STATS previously or B) whether the EncodingStats the revised version creates is correct (in particular, the use of deletionInfo feels a little suspect). We're also not sure whether there's a more appropriate fix (e.g., changing how the memtables compute the minimum timestamp, particularly in the NO_STATS case).

      Attachments

        Activity

          People

            marcuse Marcus Eriksson
            sklock Samuel Klock
            Marcus Eriksson
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated: