  1. Hadoop HDFS
  2. HDFS-17178

BootstrapStandby needs to handle RollingUpgrade

Details

    • Reviewed

    Description

      During a rolling upgrade, BootstrapStandby fails with an exception because the NameNodeLayoutVersions differ. This mismatch can safely be ignored during a rolling upgrade, because differing NameNodeLayoutVersions are expected.

      • Without this change, NameNodes that go through destructive repair before the rolling upgrade has been finalized cannot be recovered with BootstrapStandby.

      Error during BootstrapStandby before change:

      =====================================================
      About to bootstrap Standby ID nn2 from:
                 Nameservice ID: MTPrime-MWHE01-0
              Other Namenode ID: nn1
        Other NN's HTTP address: https://MWHEEEAP002D9A2:81
        Other NN's IPC  address: MWHEEEAP002D9A2.ap.gbl/10.59.208.18:8020
                   Namespace ID: 895912530
                  Block pool ID: BP-1556042256-10.99.154.61-1663325602669
                     Cluster ID: MWHE01
                 Layout version: -64
             isUpgradeFinalized: true
      =====================================================
      2023-08-28T19:35:06,940 ERROR [main] namenode.NameNode: Failed to start namenode.
      java.io.IOException: java.lang.RuntimeException: org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpGetFailedException: Image transfer servlet at https://MWHEEEAP002D9A2:81/imagetransfer?getimage=1&txid=25683470&storageInfo=-64:895912530:1663325602669:MWHE01&bootstrapstandby=true failed with status code 403
      Response message:
      This namenode has storage info -63:895912530:1663325602669:MWHE01 but the secondary expected -64:895912530:1663325602669:MWHE01
              at org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.run(BootstrapStandby.java:583) ~[hadoop-hdfs-2.9.2-MT-SNAPSHOT.jar:?]
              at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1717) ~[hadoop-hdfs-2.9.2-MT-SNAPSHOT.jar:?]
              at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1819) [hadoop-hdfs-2.9.2-MT-SNAPSHOT.jar:?]
      Caused by: java.lang.RuntimeException: org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpGetFailedException: Image transfer servlet at https://MWHEEEAP002D9A2:81

      This happens because the namespaceInfo sent from the proxy node does not include the effective layout version, so BootstrapStandby sends a request whose storageInfo param uses the service layout version. The proxy node refuses the request because it compares the storageInfo param against its own storage info, which uses the effective layout version rather than the service layout version.
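The mismatch can be illustrated with a small, self-contained sketch. It does not use Hadoop classes: `StorageInfoCheckSketch`, `storageInfo`, and `statusFor` are hypothetical names, and the field order (layout version, namespace ID, a creation-time field, cluster ID) is only inferred from the storageInfo value in the log above. The values are taken from that log, with -64 as the service layout version and -63 as the effective one.

```java
// Illustrative only: models the comparison described above, not the actual servlet code.
class StorageInfoCheckSketch {

    // Builds a colon-separated storage info string in the shape seen in the log.
    static String storageInfo(int layoutVersion, long namespaceId, long cTime, String clusterId) {
        return layoutVersion + ":" + namespaceId + ":" + cTime + ":" + clusterId;
    }

    // The serving NameNode accepts the request only if the strings match exactly.
    static int statusFor(String serverStorageInfo, String requestedStorageInfo) {
        return serverStorageInfo.equals(requestedStorageInfo) ? 200 : 403;
    }

    public static void main(String[] args) {
        // Proxy node's on-disk storage still carries the effective layout version (-63).
        String server = storageInfo(-63, 895912530L, 1663325602669L, "MWHE01");
        // BootstrapStandby built its request from the service layout version (-64).
        String requested = storageInfo(-64, 895912530L, 1663325602669L, "MWHE01");

        System.out.println("server    = " + server);
        System.out.println("requested = " + requested);
        System.out.println("status    = " + statusFor(server, requested));
    }
}
```

Only the first field differs between the two strings, which is enough for the request to be rejected.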

      To fix this, we can modify the proxy.versionRequest() call stack so that it sets the layout version from the effective layout version on the proxy node. We can then add logic to BootstrapStandby to properly handle the case where the proxy node is in a rolling upgrade.
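A minimal sketch of the proposed behaviour, under stated assumptions, is shown below. It is not the actual patch: `RemoteState`, `layoutVersionAcceptable`, and `storageInfoParam` are hypothetical names, and the rule "accept a layout version mismatch only while the proxy node is in a rolling upgrade" is one reading of the description above.

```java
// Illustrative only: a stand-in for the decision BootstrapStandby would make after the fix.
class BootstrapLayoutCheckSketch {

    // Hypothetical summary of what the proxy NameNode reports about itself.
    static final class RemoteState {
        final int effectiveLayoutVersion;
        final boolean inRollingUpgrade;

        RemoteState(int effectiveLayoutVersion, boolean inRollingUpgrade) {
            this.effectiveLayoutVersion = effectiveLayoutVersion;
            this.inRollingUpgrade = inRollingUpgrade;
        }
    }

    // Equal versions are always fine; a mismatch is tolerated only during a rolling upgrade.
    static boolean layoutVersionAcceptable(int localLayoutVersion, RemoteState remote) {
        if (remote.effectiveLayoutVersion == localLayoutVersion) {
            return true;
        }
        return remote.inRollingUpgrade;
    }

    // Builds the request parameter from the proxy node's effective layout version,
    // so that it matches what the proxy node compares against.
    static String storageInfoParam(RemoteState remote, long namespaceId, long cTime, String clusterId) {
        return remote.effectiveLayoutVersion + ":" + namespaceId + ":" + cTime + ":" + clusterId;
    }

    public static void main(String[] args) {
        RemoteState proxy = new RemoteState(-63, true); // values from the log above
        System.out.println("acceptable = " + layoutVersionAcceptable(-64, proxy));
        System.out.println("storageInfo = " + storageInfoParam(proxy, 895912530L, 1663325602669L, "MWHE01"));
    }
}
```

With the parameter built from the effective layout version, the request carries the same storage info that the proxy node holds, so the comparison that produced the 403 above would succeed.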

          People

            dannytbecker Danny Becker