[CASSANDRA-8731] Optimise merges involving multiple clustering columns - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Open
Priority: Normal
Resolution: Unresolved
Fix Version/s: 5.x
Component/s: Legacy/Local Write-Read Paths
Labels:
- compaction
- performance

Description

Since the new storage format is dead in the water for the moment, we should do our best to optimise current behaviour. When merging data from multiple sstables with multiple clustering columns, currently we must incur the full costs of comparison for the entire matching prefix, and must heapify every cell in our PriorityQueue, incurring lg(N) of these costlier comparisons for every cell we merge, where N is the number of sources we're merging.

Essentially I'm proposing a trie-based merge approach as a replacement for the ManyToOne MergeIterator, wherein we treat each clustering component as a tree underwhich all Cells with a common prefix occur. We then perform a tree merge, rather than a flat merge. For byte-order fields this trie can even be a full binary-trie (although built on the fly). The advantage here is that we rapidly prune merges involving disjoint ranges, so that instead of always incurring lg(N) costs on each new record, we may often incur O(1) costs. For timeseries data, for instance, we could merge dozens of files and so long as they were non-overlapping our CPU burden would be little more than reading from a single file.

On top of this, we no longer incur any of the shared prefix repetition costs, since we compare each prefix piece-wise, and only once.

Attachments

Issue Links

is related to

CASSANDRA-11697 Improve Compaction Throughput

Open

CASSANDRA-8906 Experiment with optimizing partition merging when we can prove that some sources don't overlap

Open

CASSANDRA-8918 Optimise compaction performance for unique partition keys

Resolved

CASSANDRA-8915 Improve MergeIterator performance

Resolved

relates to

CASSANDRA-8923 Improve MergeIterator performance for binary prefix comparable data

Open

Activity

People

Assignee:: Unassigned

Reporter:: Benedict Elliott Smith

Votes:: 0 Vote for this issue

Watchers:: 12 Start watching this issue

Dates

Created:: 03/Feb/15 19:13

Updated:: 07/Mar/23 10:53