Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Fix Version: 1.42.0
Description
There is a scenario in which a segment store can become corrupted, leading to SegmentNotFoundExceptions for very "young" SegmentIds, i.e. with ages in the one- to two-digit millisecond range (e.g. SegmentId age=2ms).
The scenario I observed looks as follows:
- a blob is "lost" from the external blob store (presumably due to incorrect cloning of the instance; this most likely only happens with unfortunate timing)
- a tail revision GC run is performed (not sure if it matters that this was a tail compaction)
- the missing blob is encountered during compaction
- an exception other than an IOException (IIRC it was an IllegalArgumentException) is thrown because of the missing blob
- revision GC fails WITHOUT being properly aborted, and thus the partially written revision of the compaction run is not removed
- more data is written on the instance
- a full revision GC run is performed
- a referenced segment is removed due to the incorrect/confused revision data
- the SegmentNotFoundException is first observed either during the remainder of the compaction run or the next time the respective node is accessed, usually during a traversal
The root cause is in AbstractCompactionStrategy, where only IOExceptions are caught.
To make the code more robust, I think we need to catch all Throwables; otherwise we cannot guarantee that compaction is correctly aborted.
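For illustration, below is a minimal, self-contained Java sketch of the control flow described above. The names used here (GcSketch, runCompaction, abortAndCleanUp) are hypothetical and simplified; they are not the actual AbstractCompactionStrategy API. The sketch only contrasts catching IOException (current behaviour, where an unexpected exception escapes without any cleanup) with catching Throwable (proposed behaviour, where compaction is always aborted).

{code:java}
import java.io.IOException;

// Hypothetical, simplified sketch; not the actual oak-segment-tar API.
public class GcSketch {

    // Simulates the compaction run hitting the missing blob: an exception
    // that is not an IOException is thrown (as observed, an IllegalArgumentException).
    static String runCompaction() throws IOException {
        throw new IllegalArgumentException("missing blob");
    }

    // Simulates aborting the run: the partially written revision would be
    // discarded and the failure reported.
    static String abortAndCleanUp(Throwable cause) {
        return "compaction aborted: " + cause;
    }

    // Current behaviour: only IOExceptions trigger the abort/cleanup path, so
    // the IllegalArgumentException escapes and the partial revision stays behind.
    static String compactCurrent() {
        try {
            return runCompaction();
        } catch (IOException e) {
            return abortAndCleanUp(e);
        }
    }

    // Proposed behaviour: catch all Throwables so compaction is always aborted
    // and cleaned up, whatever goes wrong during the run.
    static String compactProposed() {
        try {
            return runCompaction();
        } catch (Throwable t) {
            return abortAndCleanUp(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(compactProposed()); // prints "compaction aborted: ..."
        System.out.println(compactCurrent());  // IllegalArgumentException escapes
    }
}
{code}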