CASSANDRA-9597

DTCS should consider file SIZE in addition to time windowing


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Low
    • Resolution: Duplicate
    • None
    • None

    Description

      DTCS seems to work well for the typical use case - writing data in perfect time order, compacting recent files, and ignoring older files.

      However, there are "normal" operational actions where DTCS will fall behind and is unlikely to recover.

      One example is streaming (for example, bootstrap, or loading data into a cluster with sstableloader), which can create tens of thousands of very small sstables spanning multiple time buckets. In these cases, even if max_sstable_age_days is extended so that the older incoming files become eligible for compaction, the selection logic is likely to re-compact large files with only a few small files over and over, rather than prioritizing the max_threshold smallest files to reduce the number of candidate sstables as quickly as possible.
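
      The size-aware selection suggested above might look roughly like the sketch below. This is only an illustration, not DTCS code: SSTable and pick_smallest are hypothetical names, and the defaults of 4 and 32 assume the usual min_threshold and max_threshold compaction settings.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an sstable; only the fields this sketch needs."""
    name: str
    size_bytes: int


def pick_smallest(candidates: List[SSTable],
                  min_threshold: int = 4,
                  max_threshold: int = 32) -> List[SSTable]:
    """Pick up to max_threshold of the smallest candidates in one time window.

    Returns an empty list when there are fewer than min_threshold candidates,
    meaning the window is not worth compacting yet.
    """
    if len(candidates) < min_threshold:
        return []
    # Smallest files first, so each compaction removes as many files as possible
    # while rewriting as few bytes as possible.
    return sorted(candidates, key=lambda s: s.size_bytes)[:max_threshold]


# One large, already-compacted file plus 40 tiny streamed files in the same window.
bucket = [SSTable("big", 50_000_000_000)]
bucket += [SSTable(f"small-{i}", 1_000_000) for i in range(40)]

chosen = pick_smallest(bucket)
print(len(chosen), any(s.name == "big" for s in chosen))
```

      With this ordering the large file is left alone until the small files around it have been merged, so it is not rewritten on every pass.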

            People

              Assignee: Unassigned
              Reporter: Jeff Jirsa (jjirsa)
              Votes: 0
              Watchers: 7
