Hadoop HDFS / HDFS-17105

NNStorageRetentionManager mistakenly keeps purging editLogs even after the list is empty


Details

    • Reviewed

    Description

      What happened:

      Got an IndexOutOfBoundsException after setting dfs.namenode.max.extra.edits.segments.retained to a negative value and purging old records with NNStorageRetentionManager.

      Where's the bug:

      At line 156 of NNStorageRetentionManager, the manager trims editLogs until its size is no larger than maxExtraEditsSegmentsToRetain:

      while (editLogs.size() > maxExtraEditsSegmentsToRetain) {
        purgeLogsFrom = editLogs.get(0).getLastTxId() + 1;
        editLogs.remove(0);
      }

      However, if dfs.namenode.max.extra.edits.segments.retained is set to a value below 0, the loop condition can never become false, because editLogs.size() is never negative. The loop therefore keeps removing entries until editLogs.size() == 0, at which point editLogs.get(0) is out of range.
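
      The failure mode can be illustrated outside Hadoop with a minimal sketch. The class EditLogTrimSketch, the Segment stand-in, and the helper names below are hypothetical and are not part of NNStorageRetentionManager; only the loop body is taken from the snippet above. The guarded variant shows one possible way to stop the loop once the list is empty, and is not necessarily the fix the project would adopt.

```java
import java.util.ArrayList;
import java.util.List;

class EditLogTrimSketch {

  /** Hypothetical stand-in for an edit log segment; only the last txid matters here. */
  static final class Segment {
    private final long lastTxId;

    Segment(long lastTxId) {
      this.lastTxId = lastTxId;
    }

    long getLastTxId() {
      return lastTxId;
    }
  }

  /** Builds a mutable list of segments from their last transaction ids. */
  static List<Segment> segments(long... lastTxIds) {
    List<Segment> result = new ArrayList<>();
    for (long id : lastTxIds) {
      result.add(new Segment(id));
    }
    return result;
  }

  /** Same loop shape as in the report: with a negative limit it runs until get(0) fails. */
  static long trimUnguarded(List<Segment> editLogs, int maxExtraEditsSegmentsToRetain,
      long purgeLogsFrom) {
    while (editLogs.size() > maxExtraEditsSegmentsToRetain) {
      purgeLogsFrom = editLogs.get(0).getLastTxId() + 1;
      editLogs.remove(0);
    }
    return purgeLogsFrom;
  }

  /** Assumed guard: additionally stop when there is nothing left to remove. */
  static long trimGuarded(List<Segment> editLogs, int maxExtraEditsSegmentsToRetain,
      long purgeLogsFrom) {
    while (!editLogs.isEmpty() && editLogs.size() > maxExtraEditsSegmentsToRetain) {
      purgeLogsFrom = editLogs.get(0).getLastTxId() + 1;
      editLogs.remove(0);
    }
    return purgeLogsFrom;
  }

  public static void main(String[] args) {
    // Negative limit with the guard: every segment is trimmed, then the loop exits.
    System.out.println("guarded: " + trimGuarded(segments(10, 20, 30), -1, 0L));

    // Negative limit without the guard: the empty list is indexed and the call throws.
    try {
      trimUnguarded(segments(), -1, 0L);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("unguarded: " + e);
    }
  }
}
```

      For non-negative limits the extra isEmpty() check never changes behavior, since size() > limit already implies a non-empty list; it only matters for negative values.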

      How to reproduce:

      (1) Set dfs.namenode.max.extra.edits.segments.retained to -1974676133
      (2) Run test: org.apache.hadoop.hdfs.server.namenode.TestNNStorageRetentionManager#testNoLogs
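
      Because the reproduction depends only on a negative configuration value, another option would be to reject such values when the setting is read. The sketch below assumes a plain Map as a stand-in for the Hadoop configuration object; the class RetentionConfigCheckSketch, the method readNonNegativeInt, and the default of 0 used in main are hypothetical placeholders for illustration, not Hadoop APIs or Hadoop defaults.

```java
import java.util.Map;

class RetentionConfigCheckSketch {

  static final String KEY = "dfs.namenode.max.extra.edits.segments.retained";

  /**
   * Reads an integer setting and rejects negative values.
   * The Map is an assumed stand-in for the real configuration source.
   */
  static int readNonNegativeInt(Map<String, String> settings, String key, int defaultValue) {
    String raw = settings.get(key);
    int value = (raw == null) ? defaultValue : Integer.parseInt(raw.trim());
    if (value < 0) {
      throw new IllegalArgumentException(key + " must be >= 0, but was " + value);
    }
    return value;
  }

  public static void main(String[] args) {
    // The value from the reproduction steps; the default of 0 is a placeholder.
    Map<String, String> settings = Map.of(KEY, "-1974676133");
    try {
      readNonNegativeInt(settings, KEY, 0);
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

      Failing fast at this point would surface the misconfiguration with a descriptive message instead of an IndexOutOfBoundsException during purging.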

      Stacktrace:

      java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
          at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
          at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
          at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
          at java.base/java.util.Objects.checkIndex(Objects.java:372)
          at java.base/java.util.ArrayList.get(ArrayList.java:459)
          at org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:157)
          at org.apache.hadoop.hdfs.server.namenode.TestNNStorageRetentionManager.runTest(TestNNStorageRetentionManager.java:299)
          at org.apache.hadoop.hdfs.server.namenode.TestNNStorageRetentionManager.testNoLogs(TestNNStorageRetentionManager.java:143)

      For an easy reproduction, run the reproduce.sh in the attachment.

      We are happy to provide a patch if this issue is confirmed.

      Attachments

        1. reproduce.sh
          0.7 kB
          ConfX

            People

              FuzzingTeam ConfX