Apache Ozone / HDDS-9826

Fix exception handling if one Datanode is not available (Ratis)

    Description

      When a key is being uploaded through XceiverClientRatis and a datanode becomes unavailable, the client is expected to request a new pipeline and retry the upload.

      In practice, before doing so the client first retries the commit check at the MAJORITY_COMMITTED replication level, which cannot succeed because the pipeline is already closed by then.

      XceiverClientRatis has a method watchForCommit(long index), which contains the following exception check:

       

      if (t instanceof GroupMismatchException) {
        throw e;
      }
      

      GroupMismatchException is thrown by the Ratis client exactly when a datanode is unavailable and the current pipeline can no longer accept further writes for the key.

      However, this check never matches, because

      Throwable t = HddsClientUtils.checkForException(e);

      does not fully unwrap the exception.

      The proposed fix is to search the nested exceptions for the expected one. This reduces failover latency by approximately 15 seconds.
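
      The lookup of nested exceptions described above can be sketched as a walk over the cause chain. This is only an illustration, not the actual patch: CauseChainLookup, findCause and MAX_DEPTH are hypothetical names, and the nested GroupMismatchException is a local stand-in for the Ratis class so that the example compiles without Ratis on the classpath.

```java
import java.io.IOException;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

// Hypothetical helper, not part of Ozone or Ratis.
final class CauseChainLookup {

  // Assumed bound that guards against a cyclic cause chain.
  private static final int MAX_DEPTH = 32;

  // Local stand-in for the Ratis GroupMismatchException (assumption for this sketch).
  static class GroupMismatchException extends IOException {
    GroupMismatchException(String message) {
      super(message);
    }
  }

  // Returns the first throwable in the cause chain of e (including e itself)
  // that is an instance of type, or null if none is found within MAX_DEPTH links.
  static <T extends Throwable> T findCause(Throwable e, Class<T> type) {
    int depth = 0;
    for (Throwable t = e; t != null && depth < MAX_DEPTH; t = t.getCause()) {
      if (type.isInstance(t)) {
        return type.cast(t);
      }
      depth++;
    }
    return null;
  }

  public static void main(String[] args) {
    // The target exception is wrapped three levels deep, as can happen
    // when a failure passes through futures and I/O wrappers.
    Throwable wrapped = new ExecutionException(
        new CompletionException(
            new IOException("watch failed",
                new GroupMismatchException("group not found"))));

    GroupMismatchException found =
        findCause(wrapped, GroupMismatchException.class);
    System.out.println(found != null
        ? "found: " + found.getMessage()
        : "not found");
  }
}
```

      With a lookup of this kind, the instanceof check in watchForCommit could be applied to the result of the full chain walk rather than to a partially unwrapped exception, so the original exception can be rethrown as soon as the group mismatch is detected.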

            People

              ibrusentsev Ivan Brusentsev