[OAK-3362] Estimate compaction based on diff to previous compacted head state - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Minor
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: segment-tar
Labels:
- compaction
- gc

Epic Link:
SegmentMK revision GC

Description

Food for thought: try to base the compaction estimation on a diff between the latest compacted state and the current state.

Pros

- Estimation duration would be proportional to the number of changes on the current head state.
- Using the size on disk as a reference, we could actually stop the estimation early when we go over the gc threshold.
- Data collected during this diff could in theory be passed as input to the compactor, so it could focus on compacting a specific subtree.
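
To make the first two points concrete, here is a minimal sketch of a diff-based estimate with an early exit. It is not Oak code: `Node`, `subtree_size` and `estimate_garbage` are hypothetical names, the tree is a toy in-memory model, and treating every rewritten or removed record of the previous compacted state as reclaimable is an assumption of the sketch. Unchanged subtrees are assumed to be shared by identity between the two states, so they are skipped without being traversed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical immutable tree node; `size` is its own record size in bytes."""
    size: int
    children: Dict[str, "Node"] = field(default_factory=dict)


def subtree_size(node: Node) -> int:
    """Total size of a node and everything below it."""
    return node.size + sum(subtree_size(c) for c in node.children.values())


def estimate_garbage(base: Node, head: Node,
                     size_on_disk: int, gc_threshold: float) -> Tuple[int, bool]:
    """Diff `base` (last compacted state) against `head` (current state).

    Returns (garbage_bytes_seen_so_far, threshold_exceeded). The walk stops
    as soon as the garbage seen reaches size_on_disk * gc_threshold.
    """
    limit = size_on_disk * gc_threshold
    garbage = 0
    stack = [(base, head)]
    while stack:
        old, new = stack.pop()
        if old is new:
            continue  # shared, unchanged subtree: nothing to traverse
        if new is None:
            garbage += subtree_size(old)  # subtree was removed
        else:
            garbage += old.size  # assumption: the old record was rewritten
            for name, child in old.children.items():
                stack.append((child, new.children.get(name)))
        if garbage >= limit:
            return garbage, True  # early exit: compaction is worthwhile
    return garbage, False


shared = Node(100)
base = Node(10, {"a": shared, "b": Node(50), "c": Node(40)})
head = Node(10, {"a": shared, "b": Node(60)})
print(estimate_garbage(base, head, size_on_disk=1000, gc_threshold=0.5))
```

The cost of the walk depends on the number of changed nodes rather than on the size of the repository, and the set of changed paths visited could be recorded along the way if it were to be handed to the compactor.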

Cons

- Need to keep a reference to a previous compacted state. Post-startup and pre-compaction this might prove difficult (except maybe if we only persist the revision, similar to what the async indexer is doing currently).
- Coming up with a threshold for running compaction might prove difficult.
- The diff might be costly, but still cheaper than the current full diff.
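
The "only persist the revision" idea from the first point could look roughly like the sketch below. It is illustrative only: the file name, `save_compacted_revision` and `load_compacted_revision` are hypothetical, and the revision is treated as an opaque string. When no revision has been persisted yet (for example before the first compaction), the loader returns `None` and the caller would have to fall back to the existing estimation.

```python
import os
import tempfile
from typing import Optional


def save_compacted_revision(path: str, revision: str) -> None:
    """Persist the revision of the last compacted head state atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(revision)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename over any previous value
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def load_compacted_revision(path: str) -> Optional[str]:
    """Return the persisted revision, or None if nothing was stored yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read().strip() or None
    except FileNotFoundError:
        return None
```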

Issue Links

relates to

OAK-4293 Refactor / rework compaction gain estimation (Closed)

People

Assignee: Alex Deparvu

Reporter: Alex Deparvu

Votes: 0

Watchers: 3

Dates

Created: 07/Sep/15 12:20

Updated: 30/Jun/16 13:04

Resolved: 30/Jun/16 13:04