Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Version: 1.3.0
Description
When a key is being uploaded by XceiverClientRatis and some datanode becomes unavailable, the client is expected to request a new pipeline and retry the upload.
In fact, before doing so, the client first retries the commit check with the MAJORITY_COMMITTED replication level, which cannot succeed because the pipeline is already closed at that point.
XceiverClientRatis has a method watchForCommit(long index), which contains this exception check:
if (t instanceof GroupMismatchException) { throw e; }
GroupMismatchException is thrown by the Ratis client exactly when some datanode is unavailable and the key upload cannot continue on the current pipeline.
However, this check does not work, because
Throwable t = HddsClientUtils.checkForException(e);
does not unwrap the exception completely.
The idea is to fix the lookup of nested exceptions so that the proper one is found. This improves failover latency by approximately 15 seconds.
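A minimal sketch of the nested-exception lookup described above, not the actual patch. The class NestedCauseLookup, the helper findCause, and the local GroupMismatchException are hypothetical stand-ins so the example compiles without Ratis or Ozone on the classpath. The wrapping order shown in main is also only an assumption about how such an exception might arrive.

```java
import java.io.IOException;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

public class NestedCauseLookup {

  // Hypothetical stand-in for the Ratis GroupMismatchException,
  // defined locally so the sketch has no external dependencies.
  static class GroupMismatchException extends IOException {
    GroupMismatchException(String msg) {
      super(msg);
    }
  }

  /**
   * Walks the whole cause chain and returns the first throwable of the
   * wanted type, or null if none is found. The depth limit guards
   * against a cyclic cause chain.
   */
  static <T extends Throwable> T findCause(Throwable e, Class<T> wanted) {
    int depth = 0;
    for (Throwable t = e; t != null && depth < 32; t = t.getCause(), depth++) {
      if (wanted.isInstance(t)) {
        return wanted.cast(t);
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Assumed shape of a failure: the real reason sits several levels deep.
    Throwable nested = new ExecutionException(
        new CompletionException(
            new IOException("watch failed",
                new GroupMismatchException("group not found"))));

    // Looking only one level down misses the real reason.
    System.out.println(nested.getCause() instanceof GroupMismatchException); // false

    // Walking the full chain finds it.
    System.out.println(findCause(nested, GroupMismatchException.class) != null); // true
  }
}
```

With a lookup of this kind, the instanceof check in watchForCommit could match even when the exception is wrapped more than once, so the client would fail fast instead of retrying the watch on a closed pipeline.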
Issue Links
- is caused by
  - HDDS-2280 HddsUtils#CheckForException should not return null in case the ratis exception cause is not set (Resolved)
- is related to
  - HDDS-10788 Intermittent failure in testWatchForCommitForRetryfailure (In Progress)
  - HDDS-1395 Key write fails with BlockOutputStream has been closed exception (Resolved)
  - HDDS-9823 Pipeline failure should trigger heartbeat immediately (Resolved)
- links to