  Hadoop HDFS / HDFS-17163

ERROR Log Message when upgrading from 2.10.2 to 3.3.6


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.10.2
    • Fix Version/s: None
    • Component/s: namenode
    • Labels: None

    Description

      When I performed a full-stop upgrade from 2.10.2 to 3.3.6, I noticed the following error message:

      2023-08-17 10:43:11,665 ERROR org.apache.hadoop.hdfs.server.common.Storage: Error reported on storage directory Storage Directory /tmp/hadoop-root/dfs/namesecondary

      2023-08-19 05:21:41,544 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: Image file /tmp/hadoop-root/dfs/namesecondary/current/fsimage.ckpt_0000000000000000188 of size 2881 bytes saved in 0 seconds .
      2023-08-19 05:21:41,646 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: RECEIVED SIGNAL 15: SIGTERM
      2023-08-19 05:21:41,649 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: FSImageSaver clean checkpoint: txid = 188 when meet shutdown.
      2023-08-19 05:21:41,650 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG:
      /************************************************************
      SHUTDOWN_MSG: Shutting down SecondaryNameNode at 555840e97c97/192.168.239.3
      ************************************************************/
      2023-08-19 05:21:41,714 WARN org.apache.hadoop.hdfs.server.namenode.FSImage: Unable to rename checkpoint in Storage Directory /tmp/hadoop-root/dfs/namesecondary
      java.io.IOException: renaming  /tmp/hadoop-root/dfs/namesecondary/current/fsimage.ckpt_0000000000000000188 to /tmp/hadoop-root/dfs/namesecondary/current/fsimage_0000000000000000188 FAILED
              at org.apache.hadoop.hdfs.server.namenode.FSImage.renameImageFileInDir(FSImage.java:1329)
              at org.apache.hadoop.hdfs.server.namenode.FSImage.renameCheckpoint(FSImage.java:1263)
              at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImageInAllDirs(FSImage.java:1224)
              at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImageInAllDirs(FSImage.java:1172)
              at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:1105)
              at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:563)
              at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:360)
              at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:325)
              at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:481)
              at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:321)
              at java.lang.Thread.run(Thread.java:750)
      2023-08-19 05:21:41,716 ERROR org.apache.hadoop.hdfs.server.common.Storage: Error reported on storage directory Storage Directory /tmp/hadoop-root/dfs/namesecondary
      2023-08-19 05:21:41,716 WARN org.apache.hadoop.hdfs.server.common.Storage: About to remove corresponding storage: /tmp/hadoop-root/dfs/namesecondary
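The log timestamps suggest one possible interleaving: fsimage.ckpt_0000000000000000188 is saved at 05:21:41,544, SIGTERM arrives at 41,646, the shutdown path reports cleaning the checkpoint for txid 188 at 41,649, and the rename of that same file fails at 41,714, after which the storage directory is marked for removal. If that reading is right, the rename lost a race with the shutdown cleanup, which would also explain why the error is hard to reproduce. The sketch below models only that hypothesis with plain java.nio files; CheckpointRaceSketch and its methods are hypothetical and this is not Hadoop's FSImage code. A latch forces the cleanup to finish before the rename, so the losing interleaving happens deterministically.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the interleaving suggested by the SNN log; not Hadoop's implementation.
final class CheckpointRaceSketch {

    // File names mirror the log: txid zero-padded to 19 digits.
    static Path ckpt(Path dir, long txid) {
        return dir.resolve(String.format("fsimage.ckpt_%019d", txid));
    }

    static Path image(Path dir, long txid) {
        return dir.resolve(String.format("fsimage_%019d", txid));
    }

    // Promote ckpt -> image; fails with a "renaming ... FAILED" message if the source is gone.
    static void renameCheckpoint(Path dir, long txid) throws IOException {
        Path from = ckpt(dir, txid);
        Path to = image(dir, txid);
        if (!from.toFile().renameTo(to.toFile())) {
            throw new IOException("renaming " + from + " to " + to + " FAILED");
        }
    }

    /**
     * Saves a checkpoint file and then renames it. If cleanupBeforeRename is true, a separate
     * "shutdown" thread deletes the .ckpt file first, as the log suggests happened.
     * Returns true if the checkpoint was promoted to fsimage_<txid>.
     */
    static boolean runCheckpoint(Path dir, long txid, boolean cleanupBeforeRename)
            throws IOException, InterruptedException {
        Files.write(ckpt(dir, txid), new byte[] {1, 2, 3}); // stands in for "Image file ... saved"
        if (cleanupBeforeRename) {
            CountDownLatch cleaned = new CountDownLatch(1);
            Thread shutdownCleanup = new Thread(() -> {
                try {
                    Files.deleteIfExists(ckpt(dir, txid)); // stands in for "clean checkpoint ... when meet shutdown"
                } catch (IOException ignored) {
                    // best-effort cleanup, like a shutdown path would do
                } finally {
                    cleaned.countDown();
                }
            });
            shutdownCleanup.start();
            cleaned.await(); // make the cleanup win the race before the rename runs
        }
        try {
            renameCheckpoint(dir, txid);
            return true;
        } catch (IOException e) {
            System.out.println("WARN Unable to rename checkpoint: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("namesecondary-current");
        System.out.println("normal order:  " + runCheckpoint(dir, 187, false));
        System.out.println("cleanup first: " + runCheckpoint(dir, 188, true));
    }
}
```

If the hypothesis holds, a fix would need the shutdown cleanup and the rename to agree on who owns the checkpoint file; the sketch only demonstrates the symptom.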

      The cluster has four nodes: 1 NameNode (NN), 1 SecondaryNameNode (SNN), and 2 DataNodes (DN). For the upgrade I stopped them in this order: (1) the SNN, (2) the NN, (3) DN1 and DN2. The error message appears on the SNN while it is shutting down.

      The command sequence I executed is listed below, and the configuration files are attached. I tried to reproduce the error by running the same command sequence followed by the upgrade two thousand times, but it never recurred, so it may depend on a specific timing window. I am not sure whether this could affect data integrity.
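Because the failure did not recur across two thousand runs, checking each run's SecondaryNameNode log automatically may help catch a rare hit. A minimal sketch, assuming each run's SNN log can be read as a list of lines; SnnLogScanner and its methods are hypothetical, and the matched substrings are copied from the log excerpt in this report.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for batch repro runs: flags SNN log lines matching this report's failure signature.
final class SnnLogScanner {
    // Substrings taken verbatim from the reported log.
    private static final String RENAME_FAILED = "Unable to rename checkpoint in Storage Directory";
    private static final String STORAGE_ERROR =
            "ERROR org.apache.hadoop.hdfs.server.common.Storage: Error reported on storage directory";
    private static final Pattern CKPT_TXID = Pattern.compile("fsimage\\.ckpt_(\\d+)");

    /** Returns the log lines that match either part of the failure signature. */
    static List<String> findSignature(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            if (line.contains(RENAME_FAILED) || line.contains(STORAGE_ERROR)) {
                hits.add(line);
            }
        }
        return hits;
    }

    /** Returns the txid of the checkpoint whose rename failed, or -1 if no such line exists. */
    static long failedCheckpointTxid(List<String> logLines) {
        for (String line : logLines) {
            if (line.contains("renaming") && line.contains("FAILED")) {
                Matcher m = CKPT_TXID.matcher(line);
                if (m.find()) {
                    return Long.parseLong(m.group(1)); // leading zeros are fine for parseLong
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: SnnLogScanner <path-to-snn-log>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(findSignature(lines).isEmpty()
                ? "no signature found"
                : "signature found, failed txid = " + failedCheckpointTxid(lines));
    }
}
```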

      == Command Sequence ==

      // Start up cluster (2.10.2), 4 nodes
      bin/hdfs dfsadmin -safemode enter
      bin/hdfs dfsadmin -rollingUpgrade prepare
      bin/hdfs dfsadmin -safemode leave
      
      // Execute commands
      dfs -mkdir /fHPXyTkv
      dfs -put -f -p  /tmp/XPkJEWYY/kPCH /fHPXyTkv/
      dfs -put  -p -d /tmp/XPkJEWYY/HdM /fHPXyTkv/kPCH/xoflDHK/lJ
      dfsadmin -report -live  -decommissioning
      dfsadmin -setSpaceQuota 1 -storageType ARCHIVE /fHPXyTkv/kPCH/xoflDHK/Ykc/AP
      dfs -mkdir /fHPXyTkv/kPCH/xoflDHK/lJ/ozidF
      dfs -mv /fHPXyTkv/kPCH/xoflDHK/Ykc /fHPXyTkv/kPCH/xoflDHK/lJ
      dfs -mv /fHPXyTkv/kPCH/xoflDHK/lJ/AP /fHPXyTkv/kPCH/xoflDHK/eaSvvJyzZT/lL
      dfsadmin -report  -dead -decommissioning -enteringmaintenance
      dfsadmin -refreshNodes
      dfs -mkdir /fHPXyTkv/kPCH/xoflDHK/lJ/ozidF/SpdyMzpNXmVEL
      dfs -setacl  -k -m acl /kPCH/xoflDHK/lJ/ozidF --set acl2 /kPCH/xoflDHK/eaSvvJyzZT/lL
      dfsadmin -refreshNodes
      dfsadmin -setSpaceQuota 85 -storageType PROVIDED /fHPXyTkv/kPCH/mduNyG
      dfsadmin -saveNamespace
      dfs -put -f -p -d /tmp/XPkJEWYY/kPCH /fHPXyTkv/kPCH
      dfsadmin -saveNamespace
      dfs -mv /fHPXyTkv/kPCH/mduNyG/VZc /fHPXyTkv/kPCH/xoflDHK/Ykc/AP
      dfsadmin -setSpaceQuota 85 -storageType PROVIDED /fHPXyTkv/kPCH/xoflDHK/eaSvvJyzZT/lL
      dfs -put -f -p -d /tmp/XPkJEWYY/kPCH /fHPXyTkv/kPCH/kPCH/xoflDHK/Ykc
      dfsadmin -report  -dead  -enteringmaintenance -inmaintenance
      dfsadmin -setSpaceQuota 1 -storageType SSD /fHPXyTkv/kPCH/xoflDHK/JgKqDE
      dfs -put -f   /tmp/XPkJEWYY/HdM /fHPXyTkv/kPCH/kPCH/xoflDHK/Ykc/kPCH/mduNyG/VZc
      dfsadmin -rollEdits
      dfs -cat  /fHPXyTkv/kPCH/kPCH/mduNyG/YPZ
      dfs -ls  -d  -q  -S -r  /fHPXyTkv/kPCH
      dfs -ls  -d  -q -t -S   /fHPXyTkv/kPCH/kPCH/xoflDHK/Ykc/kPCH/xoflDHK/Ykc/AP
      dfs -cat  /fHPXyTkv/kPCH/xoflDHK/lJ/HdM
      dfs -cat -ignoreCrc /fHPXyTkv/kPCH/mduNyG/YPZ
      dfs -cat -ignoreCrc /fHPXyTkv/kPCH/kPCH/xoflDHK/Ykc/kPCH/mduNyG/YPZ
      dfs -ls -C  -h -q   -r  /fHPXyTkv/kPCH/kPCH/xoflDHK/Ykc/AP
      dfs -cat -ignoreCrc /fHPXyTkv/kPCH/kPCH/xoflDHK/Ykc/eJBcmWE
      dfs -count -h -v -t DISK /fHPXyTkv/kPCH/kPCH/xoflDHK
      dfs -count -q -h -x -u /fHPXyTkv/kPCH/xoflDHK/lJ
      dfs -count -q /fHPXyTkv/kPCH/xoflDHK
      dfs -cat  /fHPXyTkv/kPCH/kPCH/xoflDHK/Ykc/eJBcmWE
      dfs -ls    -q -t    /fHPXyTkv/kPCH/kPCH
      dfs -cat  /fHPXyTkv/kPCH/mduNyG/YPZ
      dfs -cat -ignoreCrc /fHPXyTkv/kPCH/kPCH/xoflDHK/Ykc/kPCH/mduNyG/VZc/HdM    
      // stop SNN
      // stop NN
      // stop DN1&DN2
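The lines under "Execute commands" omit the bin/hdfs prefix used in the startup block. A minimal sketch for replaying the listed sequence, assuming it runs from the Hadoop installation directory so that bin/hdfs resolves; ReplayCommands is a hypothetical helper. It skips // comment lines, adds the bin/hdfs prefix only where it is missing, and counts non-zero exits instead of aborting, since not every generated command is guaranteed to succeed. Stopping the daemons and performing the upgrade are left to your own tooling.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical replay helper; assumes the working directory contains bin/hdfs.
final class ReplayCommands {

    /** Splits a command line on whitespace and prepends "bin/hdfs" unless it is already there. */
    static List<String> toArgv(String line) {
        List<String> tokens = new ArrayList<>(Arrays.asList(line.trim().split("\\s+")));
        if (!tokens.get(0).equals("bin/hdfs")) {
            tokens.add(0, "bin/hdfs");
        }
        return tokens;
    }

    /** Runs each command in order, sharing this process's stdout/stderr; returns the count of non-zero exits. */
    static int replay(List<String> lines) throws IOException, InterruptedException {
        int failures = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("//")) {
                continue; // skip blank lines and comments such as "// stop SNN"
            }
            Process p = new ProcessBuilder(toArgv(trimmed)).inheritIO().start();
            if (p.waitFor() != 0) {
                failures++; // record the failure but keep replaying, as a fuzzed sequence may contain invalid commands
            }
        }
        return failures;
    }
}
```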

      Attachments

        1. XPkJEWYY.tar.gz (67 kB, Ke Han)
        2. log.tar.gz (61 kB, Ke Han)
        3. hdfs-site.xml (0.6 kB, Ke Han)
        4. core-site.xml (0.9 kB, Ke Han)


          People

            Assignee: Unassigned
            Reporter: Ke Han (kehan5800)
            Votes: 0
            Watchers: 1
