Hadoop HDFS / HDFS-17307

docker-compose.yaml sets namenode directory wrong causing datanode failures on restart


Details

    Description

Restarting existing services with docker-compose.yaml causes the datanode to crash after a few seconds.

      How to reproduce:

      $ docker-compose up -d # everything starts ok
      $ docker-compose stop  # stop services without removing containers
      $ docker-compose up -d # everything starts, but datanode crashes after a few seconds

The log produced by the datanode suggests the issue is a mismatch between the clusterIDs of the namenode and the datanode:

      datanode_1         | 2023-12-28 11:17:15 WARN  Storage:420 - Failed to add storage directory [DISK]file:/tmp/hadoop-hadoop/dfs/data
      datanode_1         | java.io.IOException: Incompatible clusterIDs in /tmp/hadoop-hadoop/dfs/data: namenode clusterID = CID-250bae07-6a8a-45ce-84bb-8828b37b10b7; datanode clusterID = CID-2c1c7105-7fdf-4a19-8ef8-7cb763e5b701 

After some troubleshooting, I found that the namenode does not reuse the clusterID from the previous run because it cannot find it in the directory set by ENSURE_NAMENODE_DIR=/tmp/hadoop-root/dfs/name. This is due to a change of the namenode's default user, which is now "hadoop", so the namenode actually writes this information to /tmp/hadoop-hadoop/dfs/name.
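The description above suggests the fix lives in the compose file. A minimal sketch, assuming the namenode container runs as the "hadoop" user; the service name, image tag, and surrounding keys here are illustrative and may not match the actual docker-compose.yaml:

```yaml
# Sketch of the relevant fragment of docker-compose.yaml.
# ENSURE_NAMENODE_DIR must point at the directory the namenode actually
# writes to: with the default user "hadoop" that is /tmp/hadoop-hadoop,
# not /tmp/hadoop-root.
services:
  namenode:
    image: apache/hadoop:3  # illustrative tag
    environment:
      ENSURE_NAMENODE_DIR: /tmp/hadoop-hadoop/dfs/name
```

With the directory matching, the namenode finds the existing metadata on restart and keeps the same clusterID, so the datanode no longer fails the clusterID check. Mounting a volume at that path would additionally preserve the metadata across container removal, not just stop/start.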

      Attachments

        Activity


          People

Assignee: Unassigned
Reporter: mattlectic Matthew Rossi

            Dates

              Created:
              Updated:
