Hadoop HDFS / HDFS-16989

Large scale block transfer causes too many excess blocks


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.4.0, 3.3.5
    • Fix Version/s: None
    • Component/s: namenode

Description

      Recently, we changed the replication factor of a directory holding 1.6 PB of data from 2 to 3. The directory contains 76 million blocks. After we executed the setrep command, the active NameNode printed a large number of log lines like the one below:

      PendingReconstructionMonitor timed out blk_xxxx_260285131
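
      For reference, the replication change can be issued with "hdfs dfs -setrep 3 /path" or through the FileSystem API. Below is a minimal Java sketch; /data/dir is an illustrative path, not the real directory:

      // A minimal sketch, assuming a default-configured client.
      // FileSystem#setReplication applies per file, so the tree is walked
      // recursively, which is what "hdfs dfs -setrep 3 /data/dir" does
      // when pointed at a directory.
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.LocatedFileStatus;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.fs.RemoteIterator;

      public class SetRepExample {
        public static void main(String[] args) throws Exception {
          FileSystem fs = FileSystem.get(new Configuration());
          RemoteIterator<LocatedFileStatus> it =
              fs.listFiles(new Path("/data/dir"), true /* recursive */);
          while (it.hasNext()) {
            // Each call only updates NameNode metadata; the actual block
            // copies are scheduled later by the RedundancyMonitor.
            fs.setReplication(it.next().getPath(), (short) 3);
          }
        }
      }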

      Many DataNodes also printed duplicated logs like the ones below:

      2023-04-21 13:58:17,627 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(1.1.1.1:50010, datanodeUuid=f3081eac-983f-4c3f-99c8-e4830640ee90, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-57;cid=yj-hdfs2;nsid=1882889931;c=1667291826362) Starting thread to transfer BP-578784987-x.x.x.x-1667291826362:blk_1333463885_260285131 to 2.2.2.2:50010 
      2023-04-21 14:21:21,296 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at 1.1.1.1:50010: Transmitted BP-578784987-x.x.x.x-1667291826362:blk_1333463885_260285131 (numBytes=524384907) to /2.2.2.2:50010
      2023-04-21 14:34:19,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(1.1.1.1:50010, datanodeUuid=f3081eac-983f-4c3f-99c8-e4830640ee90, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-57;cid=yj-hdfs2;nsid=1882889931;c=1667291826362) Starting thread to transfer BP-578784987-x.x.x.x-1667291826362:blk_1333463885_260285131 to 2.2.2.2:50010
      2023-04-21 14:37:58,207 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(1.1.1.1:50010, datanodeUuid=f3081eac-983f-4c3f-99c8-e4830640ee90, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-57;cid=yj-hdfs2;nsid=1882889931;c=1667291826362) Starting thread to transfer BP-578784987-x.x.x.x-1667291826362:blk_1333463885_260285131 to 2.2.2.2:50010
      2023-04-21 14:40:46,817 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(1.1.1.1:50010, datanodeUuid=f3081eac-983f-4c3f-99c8-e4830640ee90, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-57;cid=yj-hdfs2;nsid=1882889931;c=1667291826362) Starting thread to transfer BP-578784987-x.x.x.x-1667291826362:blk_1333463885_260285131 to 2.2.2.2:50010

      Oddly, the same block blk_1333463885_260285131 was transferred multiple times even though it had already been transmitted successfully. The excess transfer requests trigger many ReplicaAlreadyExistsException errors on the target DataNode, because the replica has already been transmitted and is in the FINALIZED state.
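
      The rejection on the DataNode side can be pictured with the simplified sketch below. This is not the actual FsDatasetImpl code; ReplicaMap and State here are hypothetical stand-ins, but the guard mirrors the kind of check that raises ReplicaAlreadyExistsException once a FINALIZED replica is already on disk:

      import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
      import org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException;

      class ReplicaCheckSketch {
        // Hypothetical stand-ins for the dataset's replica bookkeeping.
        enum State { RBW, FINALIZED }
        interface ReplicaMap { State lookup(String bpid, long blockId); }

        // Simplified version of the guard that fires on every repeated
        // transfer of blk_1333463885: the first transfer finalized the
        // replica, so all later attempts fail immediately.
        void createReplica(ReplicaMap replicas, ExtendedBlock b)
            throws ReplicaAlreadyExistsException {
          State s = replicas.lookup(b.getBlockPoolId(), b.getBlockId());
          if (s == State.FINALIZED) {
            // This is the exception flooding the target DataNode's log.
            throw new ReplicaAlreadyExistsException(
                "Block " + b + " already exists in state " + s);
          }
          // ...otherwise create the temporary/RBW replica and receive data...
        }
      }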


      The reason lies in RedundancyMonitor#processPendingReconstructions and BlockManager#validateReconstructionWork:

      1. RedundancyMonitor#computeDatanodeWork() generates transfer tasks from neededReconstruction via addTaskToDatanode, and puts the tasks into pendingReconstruction.

      2. We set dfs.namenode.replication.work.multiplier.per.iteration = 200, and this cluster has 400 DataNodes, so RedundancyMonitor may generate up to 400 × 200 = 80,000 block transfer tasks per iteration. After dfs.namenode.reconstruction.pending.timeout-sec (5 min), the requests in pendingReconstruction time out, and the PendingReconstructionMonitor thread moves the timed-out requests into timedOutItems.

      3. RedundancyMonitor#processPendingReconstructions() puts the requests from timedOutItems back into neededReconstruction.

      4. The RedundancyMonitor then sleeps for one recheck interval: TimeUnit.MILLISECONDS.sleep(redundancyRecheckIntervalMs);

      5. In the next iteration of the while loop, RedundancyMonitor#computeDatanodeWork() generates transfer tasks from neededReconstruction again, producing repeated tasks, or tasks with different target nodes due to the chooseTarget method; see the sketch after this list.
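
      Putting these steps together, the feedback loop can be sketched as below. The names follow the real BlockManager/RedundancyMonitor members, but the method bodies are simplified placeholders; the constants reflect this cluster's configuration:

      class RedundancyMonitorSketch {
        static final int LIVE_DATANODES = 400;     // this cluster
        static final int WORK_MULTIPLIER = 200;    // dfs.namenode.replication.work.multiplier.per.iteration
        static final long RECHECK_MS = 3_000L;     // redundancyRecheckIntervalMs (3 s by default)

        void run() throws InterruptedException {
          while (true) {
            // Step 1: schedule up to 400 * 200 = 80,000 transfers in one
            // pass; every task is also recorded in pendingReconstruction.
            computeDatanodeWork(LIVE_DATANODES * WORK_MULTIPLIER);

            // Step 3: tasks older than
            // dfs.namenode.reconstruction.pending.timeout-sec (5 min) are
            // moved from timedOutItems back into neededReconstruction --
            // even when the transfer already succeeded but the DataNode
            // could not work through its 80K-task backlog in time.
            processPendingReconstructions();

            // Step 4: sleep, then the next iteration (step 5) regenerates
            // tasks for those blocks, possibly with a different target from
            // chooseTarget, producing the duplicate transfers shown above.
            Thread.sleep(RECHECK_MS);
          }
        }

        void computeDatanodeWork(int blocksToProcess) { /* simplified away */ }
        void processPendingReconstructions()          { /* simplified away */ }
      }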


People

    Assignee: Unassigned
    Reporter: zhanghaobo (farmmamba)
    Votes: 0
    Watchers: 2

Dates

    Created:
    Updated: