Details
- Type: New Feature
- Status: Resolved
- Priority: Minor
- Resolution: Won't Fix
Description
Food for thought: try to base the compaction estimation on a diff between the latest compacted state and the current state.
Pros
- estimation duration would be proportional to the number of changes in the current head state
- using the size on disk as a reference, we could stop the estimation early as soon as the estimate goes over the gc threshold
- data collected during this diff could in theory be passed as input to the compactor so it could focus on compacting a specific subtree
Cons
- need to keep a reference to the previous compacted state; after startup and before compaction this might prove difficult (unless we only persist the revision, similar to what the async indexer currently does)
- coming up with a threshold for running compaction might prove difficult
- diff might be costly, but still cheaper than the current full diff
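A minimal sketch of the idea, to make the trade-offs above concrete. It is not Oak code: `DiffEstimator`, `Node`, `Estimate` and the per-node `size` field are hypothetical names invented for illustration. It assumes a tree that is treated as immutable once built, so unchanged subtrees are shared by reference between the compacted state and the head state and can be skipped with an identity check. Removed nodes are ignored here, and the byte counts are made up.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a diff-based compaction estimate; none of these types are Oak APIs. */
final class DiffEstimator {

    /** Simplified tree node: an assumed on-disk size plus named children. */
    static final class Node {
        final long size;
        final Map<String, Node> children = new HashMap<>();
        Node(long size) { this.size = size; }
        Node child(String name, Node n) { children.put(name, n); return this; }
    }

    /** Outcome of one estimation run. */
    static final class Estimate {
        long changedBytes;     // bytes of nodes added or rewritten since the compacted state
        long visited;          // nodes the diff actually looked at
        boolean stoppedEarly;  // true if the threshold was exceeded before the diff finished
    }

    static Estimate estimate(Node compacted, Node head, long thresholdBytes) {
        Estimate e = new Estimate();
        diff(compacted, head, thresholdBytes, e);
        return e;
    }

    private static void diff(Node before, Node after, long threshold, Estimate e) {
        // Same instance means the subtree is shared and therefore unchanged: skip it.
        if (e.stoppedEarly || before == after) {
            return;
        }
        e.visited++;
        e.changedBytes += after.size;
        // Early stop: once over the threshold, the exact total no longer matters.
        if (e.changedBytes > threshold) {
            e.stoppedEarly = true;
            return;
        }
        for (Map.Entry<String, Node> c : after.children.entrySet()) {
            Node old = (before == null) ? null : before.children.get(c.getKey());
            diff(old, c.getValue(), threshold, e);
            if (e.stoppedEarly) {
                return;
            }
        }
    }

    public static void main(String[] args) {
        Node shared = new Node(1000).child("a", new Node(500));
        Node compacted = new Node(10).child("content", shared).child("tmp", new Node(20));
        Node head = new Node(10).child("content", shared).child("tmp", new Node(300));

        Estimate e = estimate(compacted, head, 1_000_000);
        // prints "310 bytes changed, 2 nodes visited"
        System.out.println(e.changedBytes + " bytes changed, " + e.visited + " nodes visited");
    }
}
```

In this toy run only the root and the rewritten `tmp` node are visited; the large shared `content` subtree is never traversed, which is the "proportional to the number of changes" property. Calling `estimate(compacted, head, 100)` instead sets `stoppedEarly` as soon as the running total passes 100 bytes. The sketch still needs the previous compacted root to be available, which is exactly the first con listed above.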
Issue Links
- relates to OAK-4293 Refactor / rework compaction gain estimation (Closed)