Cassandra / CASSANDRA-18278

Add a tool to clean redundant data for native secondary index


Details

    • Operability
    • Normal
    • All
    • None

    Description

      As we know, Cassandra's native secondary index is a local secondary index. When an update modifies an indexed column, the old, redundant index entries (stale entries) remain in the index table and are only cleaned up when they are read (somewhat like read repair).
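      A minimal Python model of the behaviour described above, under a deliberately simplified assumption: an update that changes c1 adds a new index entry but never removes the previous one, and a stale entry is only dropped when a query happens to read it. `ToyIndexedTable` and its methods are hypothetical names for illustration, not Cassandra classes, and Cassandra's real write and read paths are more involved.

```python
from collections import defaultdict


class ToyIndexedTable:
    """Hypothetical in-memory model of a base table with a local index on c1.

    Simplifying assumption: a write never reads the previous row, so an
    update that changes c1 adds a new index entry but leaves the old one.
    """

    def __init__(self):
        self.base = {}                 # pk -> current c1 value
        self.index = defaultdict(set)  # c1 value -> set of pks

    def upsert(self, pk, c1):
        self.base[pk] = c1
        self.index[c1].add(pk)         # the old index entry is NOT removed

    def select_by_c1(self, c1):
        """Query through the index, dropping stale entries that are read."""
        hits = []
        for pk in sorted(self.index.get(c1, set())):
            if self.base.get(pk) == c1:
                hits.append(pk)
            else:
                self.index[c1].discard(pk)  # cleaned up only because it was read
        return hits

    def index_entries(self):
        """All (indexed value, pk) pairs currently stored in the index."""
        return sorted((v, pk) for v, pks in self.index.items() for pk in pks)


table = ToyIndexedTable()
for value in (1, 2, 3):
    table.upsert(pk=1, c1=value)

print(table.index_entries())  # [(1, 1), (2, 1), (3, 1)]: two stale entries
print(table.select_by_c1(1))  # []: the stale (1, 1) entry is read and removed
print(table.index_entries())  # [(2, 1), (3, 1)]: (2, 1) stays until it is read
```

      Entries that no query ever touches, like (2, 1) here, are never reclaimed in this model, which is the gap the proposed tool targets.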
      As a result, the index table may accumulate stale entries that are never read, so we would like to provide a tool that removes them. In the example below, we create a table with a secondary index on column c1, then update the row with the same pk three times, using a different c1 value each time and flushing after every update, and finally force a major compaction on the index table. The sstable dump of the index then shows its contents. (The dump tool cannot normally be used on secondary index sstables, but fortunately we can use the change from CASSANDRA-17698.)
      Below are the CQL statements and the dump result.

      cqlsh> DESC ks.tb
      
      CREATE TABLE ks.tb (
          pk int PRIMARY KEY,
          c1 int
      ) WITH additional_write_policy = '99p'
          AND allow_auto_snapshot = true
          AND bloom_filter_fp_chance = 0.01
          AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
          AND cdc = false
          AND comment = ''
          AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
          AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
          AND memtable = 'default'
          AND crc_check_chance = 1.0
          AND default_time_to_live = 0
          AND extensions = {}
          AND gc_grace_seconds = 864000
          AND max_index_interval = 2048
          AND memtable_flush_period_in_ms = 0
          AND min_index_interval = 128
          AND read_repair = 'BLOCKING'
          AND speculative_retry = '99p';
      
      CREATE INDEX idx ON ks.tb (c1);
      cqlsh> INSERT INTO ks.tb(pk, c1)values (1, 1);
      cqlsh> INSERT INTO ks.tb(pk, c1)values (1, 2);
      cqlsh> INSERT INTO ks.tb(pk, c1)values (1, 3);
      cqlsh> 
      

      Meanwhile, we flush after every update and force a major compaction on the index table at the end.
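      Why the major compaction does not remove anything here can be sketched with a toy last-write-wins merge, assuming (as the dump below suggests) that no deletions were written for the old entries. Index rows are keyed by (indexed value, base pk), matching the partition key and clustering shown in the dump; the integer timestamps and the `major_compact` helper are hypothetical simplifications, not Cassandra's compaction code.

```python
# Each flushed index sstable, modeled as {(partition_key, clustering): write_ts}.
# Partition key = indexed c1 value, clustering = base pk (as in the dump).
flushed_sstables = [
    {(1, 1): 1},  # after INSERT (1, 1) and a flush
    {(2, 1): 2},  # after INSERT (1, 2) and a flush
    {(3, 1): 3},  # after INSERT (1, 3) and a flush
]


def major_compact(sstables):
    """Merge all sstables, keeping the newest version of each row key."""
    merged = {}
    for sstable in sstables:
        for key, ts in sstable.items():
            if key not in merged or ts > merged[key]:
                merged[key] = ts
    return dict(sorted(merged.items()))


print(major_compact(flushed_sstables))  # {(1, 1): 1, (2, 1): 2, (3, 1): 3}
```

      Because every update produced a different partition key in the index, the merge never sees two versions of the same row to reconcile, so all three entries survive; only the base table records that c1 = 3 is current.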

      ➜  bin git:(trunk) ✗ ./nodetool flush
      ➜  bin git:(trunk) ✗ ./nodetool flush
      ➜  bin git:(trunk) ✗ ./nodetool flush
      ➜  bin git:(trunk) ✗ ./nodetool compact ks tb.idx
      ➜  bin git:(trunk) ✗ ../tools/bin/sstabledump ../data/data/ks/tb-65d902b0b2bc11ed86ed81daebeca99d/.idx/nb-13-big-Data.db 
      [
        {
          "table kind" : "INDEX",
          "partition" : {
            "key" : [ "1" ],
            "position" : 0
          },
          "rows" : [
            {
              "type" : "row",
              "position" : 18,
              "clustering" : [ 1 ],
              "liveness_info" : { "tstamp" : "2023-02-23T03:21:57.638558Z" },
              "cells" : [ ]
            }
          ]
        },
        {
          "table kind" : "INDEX",
          "partition" : {
            "key" : [ "2" ],
            "position" : 29
          },
          "rows" : [
            {
              "type" : "row",
              "position" : 47,
              "clustering" : [ 1 ],
              "liveness_info" : { "tstamp" : "2023-02-23T03:22:19.834466Z" },
              "cells" : [ ]
            }
          ]
        },
        {
          "table kind" : "INDEX",
          "partition" : {
            "key" : [ "3" ],
            "position" : 61
          },
          "rows" : [
            {
              "type" : "row",
              "position" : 79,
              "clustering" : [ 1 ],
              "liveness_info" : { "tstamp" : "2023-02-23T03:22:27.532174Z" },
              "cells" : [ ]
            }
          ]
        }
      ]
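      The dump shows three index entries for pk 1 (c1 = 1, 2 and 3) although only c1 = 3 is current. A hypothetical core of the proposed cleanup tool could compare each index row with the base row it points to. The sketch below does this offline over a reduced copy of the dump; `find_stale_entries` and `BASE_ROWS` (the state of ks.tb after the three INSERTs) are illustrative assumptions, not part of Cassandra or of any patch for this ticket.

```python
import json

# Reduced copy of the sstabledump output above (only the fields used here).
INDEX_DUMP = """
[
  {"partition": {"key": ["1"]},
   "rows": [{"type": "row", "clustering": [1],
             "liveness_info": {"tstamp": "2023-02-23T03:21:57.638558Z"}}]},
  {"partition": {"key": ["2"]},
   "rows": [{"type": "row", "clustering": [1],
             "liveness_info": {"tstamp": "2023-02-23T03:22:19.834466Z"}}]},
  {"partition": {"key": ["3"]},
   "rows": [{"type": "row", "clustering": [1],
             "liveness_info": {"tstamp": "2023-02-23T03:22:27.532174Z"}}]}
]
"""

# Assumed current state of the base table ks.tb (pk -> c1) after the INSERTs.
BASE_ROWS = {1: 3}


def find_stale_entries(dump_json, base_rows):
    """Return (indexed_value, pk) pairs whose base row no longer matches.

    An index row is treated as live only if the base row it points to
    (clustering = base pk) still holds the indexed value (partition key).
    """
    stale = []
    for partition in json.loads(dump_json):
        indexed_value = int(partition["partition"]["key"][0])
        for row in partition["rows"]:
            if row.get("type") != "row":
                continue  # skip anything that is not a regular row
            pk = row["clustering"][0]
            if base_rows.get(pk) != indexed_value:
                stale.append((indexed_value, pk))
    return stale


print(find_stale_entries(INDEX_DUMP, BASE_ROWS))  # [(1, 1), (2, 1)]
```

      Actually deleting the reported entries, and doing so safely while writes continue, is outside the scope of this sketch.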
      

          People

            maxwellguo Maxwell Guo