  1. Hadoop HDFS
  2. HDFS-16659

JournalNode should throw NewerTxnIdException if SinceTxId is bigger than HighestWrittenTxId

Details

    • Reviewed

    Description

      JournalNode should throw `CacheMissException` if `sinceTxId` is bigger than `highestWrittenTxId` when handling the `getJournaledEdits` RPC from NameNodes.
      In some corner cases, the current logic can prevent the in-progress EditLogTailer from replaying any edits from the JournalNodes, leaving the Observer NameNode unable to handle requests from clients.

      Suppose there are 3 JournalNodes, JN0 ~ JN2.

      • JN0 hits an abnormal condition while the Active NameNode is syncing 10 edits starting at txId 11
      • The NameNode ignores the abnormal JN0 and continues to sync edits to JN1 and JN2
      • JN0 returns to health
      • The NameNode continues to sync 10 edits starting at txId 21
      • At this point, edits 11 ~ 30 are not in the cache of JN0
      • The Observer NameNode tries to select an EditLogInputStream through `getJournaledEdits` with `sinceTxId` 21
      • JN2 hits an abnormal condition that causes a slow response

      The expected result is that the response contains the 10 edits from txId 21 to txId 30, served by JN1 and JN2, because the Active NameNode successfully wrote these edits to JN1 and JN2 and failed to write them to JN0.

      But in the current implementation, the responses are [Response(0) from JN0, Response(10) from JN1], because an abnormal condition on JN2 (such as GC or a bad network) causes a slow response. So `maxAllowedTxns` is 0 and the NameNode does not replay any edits.
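
The arithmetic behind `maxAllowedTxns` being 0 can be illustrated with a small sketch. It assumes the client sorts the transaction counts from the responses it has received and takes the largest count that a majority of JournalNodes can serve. The class and method names below are hypothetical and this is a simplified model, not the actual Hadoop code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Simplified, hypothetical model of the quorum txn-count selection. */
class QuorumTxnCountSketch {

  /** Majority size for a quorum of the given number of JournalNodes. */
  static int majoritySize(int journalNodeCount) {
    return journalNodeCount / 2 + 1;
  }

  /**
   * Returns the largest txn count that at least a majority of JournalNodes
   * reported, given the counts from the responses received so far.
   */
  static int maxAllowedTxns(List<Integer> responseCounts, int journalNodeCount) {
    int majority = majoritySize(journalNodeCount);
    if (responseCounts.size() < majority) {
      throw new IllegalStateException("fewer responses than a majority");
    }
    List<Integer> sorted = new ArrayList<>(responseCounts);
    Collections.sort(sorted);
    // With counts in ascending order, the element at (size - majority)
    // is the highest count that a majority of responders can satisfy.
    return sorted.get(sorted.size() - majority);
  }

  public static void main(String[] args) {
    // Scenario from this issue: JN0 answers with 0 txns, JN1 with 10,
    // and JN2 is too slow to be counted.
    System.out.println(maxAllowedTxns(Arrays.asList(0, 10), 3));
    // If JN0 had failed instead of answering, the client would have to
    // wait for JN2, and the counts would be [10, 10].
    System.out.println(maxAllowedTxns(Arrays.asList(10, 10), 3));
  }
}
```

Under these assumptions the first call yields 0 and the second yields 10, which is why an empty but successful response from JN0 is worse than an error.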

      As shown above, the root cause is that JournalNode should throw a cache miss exception when `sinceTxId` is greater than `highestWrittenTxId`.

      The buggy code is shown below:

      if (sinceTxId > getHighestWrittenTxId()) {
          // Requested edits that don't exist yet; short-circuit the cache here
          metrics.rpcEmptyResponses.incr();
          return GetJournaledEditsResponseProto.newBuilder().setTxnCount(0).build(); 
      }
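
A minimal sketch of the proposed behavior follows: the JournalNode signals an error instead of returning an empty response, so the caller does not count it as a valid reply with 0 edits. `JournalSketch`, `countServableEdits`, and the nested `NewerTxnIdException` are hypothetical stand-ins written for this illustration (the exception is named after this issue's title); they are not the actual Hadoop classes or the committed patch.

```java
import java.io.IOException;

/** Hypothetical stand-in for a JournalNode's journal; illustration only. */
class JournalSketch {

  /** Hypothetical exception, named after the one in this issue's title. */
  static class NewerTxnIdException extends IOException {
    NewerTxnIdException(String message) {
      super(message);
    }
  }

  private final long highestWrittenTxId;

  JournalSketch(long highestWrittenTxId) {
    this.highestWrittenTxId = highestWrittenTxId;
  }

  /**
   * Returns how many edits starting at sinceTxId this node could serve,
   * capped at maxTxns. Throws instead of answering with an empty response
   * when sinceTxId is beyond anything written to this node.
   */
  int countServableEdits(long sinceTxId, int maxTxns) throws NewerTxnIdException {
    if (sinceTxId > highestWrittenTxId) {
      throw new NewerTxnIdException("Requested txId " + sinceTxId
          + " is newer than highest written txId " + highestWrittenTxId);
    }
    long available = highestWrittenTxId - sinceTxId + 1;
    return (int) Math.min(available, maxTxns);
  }

  public static void main(String[] args) {
    JournalSketch jn1 = new JournalSketch(30); // JN1 holds edits up to txId 30
    JournalSketch jn0 = new JournalSketch(10); // JN0 missed edits 11 ~ 30
    try {
      System.out.println("JN1 can serve " + jn1.countServableEdits(21, 100) + " edits");
      jn0.countServableEdits(21, 100);
    } catch (NewerTxnIdException e) {
      // The caller can treat this as a failed response from JN0 and keep
      // waiting for another JournalNode instead of accepting 0 edits.
      System.out.println("JN0 rejected the request: " + e.getMessage());
    }
  }
}
```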
      

            People

              xuzq_zander ZanderXu
              xuzq_zander ZanderXu