Hadoop HDFS / HDFS-16550

[SBN read] Improper cache-size for journal node may cause cluster crash


Details

    • Reviewed

    Description

      When we introduced SBN Read, we ran into a problem while upgrading the JournalNodes.

      Cluster Info: 
      Active: nn0
      Standby: nn1

      1. Rolling restart of the JournalNodes. (related config: dfs.journalnode.edit-cache-size.bytes=1G, -Xms1G, -Xmx1G)

      2. After the cluster had been running for a while, the edits cache kept growing until the JournalNode heap was used up.

      3. The active NameNode (nn0) shut down because of “Timed out waiting 120000ms for a quorum of nodes to respond”.

      4. nn1 was transitioned to the active state.

      5. The new active NameNode (nn1) then shut down for the same reason: “Timed out waiting 120000ms for a quorum of nodes to respond”.

      6. The cluster crashed.

       

      Related code:

      JournaledEditsCache(Configuration conf) {
        capacity = conf.getInt(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_KEY,
            DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_DEFAULT);
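  // Note: if the configured capacity exceeds 90% of the JVM max heap, the
  // constructor only logs a warning and continues with the oversized cache.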
        if (capacity > 0.9 * Runtime.getRuntime().maxMemory()) {
          Journal.LOG.warn(String.format("Cache capacity is set at %d bytes but " +
              "maximum JVM memory is only %d bytes. It is recommended that you " +
              "decrease the cache size or increase the heap size.",
              capacity, Runtime.getRuntime().maxMemory()));
        }
        Journal.LOG.info("Enabling the journaled edits cache with a capacity " +
            "of bytes: " + capacity);
        ReadWriteLock lock = new ReentrantReadWriteLock(true);
        readLock = new AutoCloseableLock(lock.readLock());
        writeLock = new AutoCloseableLock(lock.writeLock());
        initialize(INVALID_TXN_ID);
      } 

      Currently, dfs.journalnode.edit-cache-size.bytes can be set to a larger size than the heap requested by the JournalNode process. If dfs.journalnode.edit-cache-size.bytes > 0.9 * Runtime.getRuntime().maxMemory(), only a warning log is printed during JournalNode startup, which is easy for users to overlook. However, once the cluster has been running for a certain period of time, the cache fills the heap and is likely to cause the cluster to crash.
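
      As a rough illustration of that check with the values from this incident (the exact figure reported by maxMemory() depends on the JVM, so the numbers are approximate):

// Illustrative only: the configured cache is as large as the whole heap,
// yet the constructor merely warns and keeps going.
long capacity = 1024L * 1024 * 1024;               // dfs.journalnode.edit-cache-size.bytes = 1G
long maxMemory = Runtime.getRuntime().maxMemory(); // roughly 1 GiB with -Xms1G -Xmx1G
if (capacity > 0.9 * maxMemory) {
  // This branch is taken at startup: a warning is logged and the JournalNode
  // continues, so the cache can later exhaust the heap.
}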

       

      NN log:

      IMO, we should not set the cache size to a fixed value, but as a ratio of the maximum JVM memory, with a default of 0.2.
      This avoids the problem of an over-large cache size. In addition, users can actively increase the heap size when they need a larger cache.
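
      A minimal sketch of what the ratio-based constructor could look like (the key name dfs.journalnode.edit-cache-size.fraction and the 0.2 default below are assumptions for illustration, not a committed API):

JournaledEditsCache(Configuration conf) {
  // Size the cache as a fraction of the JVM max heap so it can never be
  // configured larger than the heap itself.
  float fraction = conf.getFloat(
      "dfs.journalnode.edit-cache-size.fraction", 0.2f);
  capacity = (int) Math.min(Integer.MAX_VALUE,
      (long) (fraction * Runtime.getRuntime().maxMemory()));
  Journal.LOG.info("Enabling the journaled edits cache with a capacity " +
      "of bytes: " + capacity);
  ReadWriteLock lock = new ReentrantReadWriteLock(true);
  readLock = new AutoCloseableLock(lock.readLock());
  writeLock = new AutoCloseableLock(lock.writeLock());
  initialize(INVALID_TXN_ID);
}

      With this approach, a JournalNode started with -Xmx1G would get a cache of roughly 200 MB by default, and enlarging the cache always goes hand in hand with enlarging the heap.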

      Attachments

        Activity


          People

            Assignee: Tao Li (tomscut)
            Reporter: Tao Li (tomscut)
            Votes: 0
            Watchers: 4


              Time Tracking

              Original Estimate: Not Specified
              Remaining Estimate: 0h
              Time Spent: 1h
