Hadoop HDFS / HDFS-17048

FSNamesystem.delete() may cause data residue when the active NameNode crashes or shuts down


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: hdfs
    • Labels: None
    • Environment: hdfs3.3

    Description

Consider the following scenario:

      (1) A user deletes an HDFS directory containing many blocks.

      (2) Then the active NameNode crashes, shuts down, or is failed over to the standby NameNode by an administrator.

      (3) This may result in residual block data on the DataNodes.

       

FSNamesystem.delete() will

      (1) delete the directory first,

      (2) add toRemovedBlocks into the markedDeleteQueue,

      (3) and the MarkedDeleteBlockScrubber thread will then consume the markedDeleteQueue and delete the blocks.

If the active NameNode crashes, the blocks still in markedDeleteQueue are lost and will never be deleted. These blocks cannot be found via the hdfs fsck command, but they are still alive on the DataNode disks.
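The failure window described above can be sketched as a minimal, self-contained Java simulation. This is not Hadoop's actual implementation; the class, fields, and methods are illustrative. The point it shows is that the queue between step (2) and step (3) lives only in memory (it is not recorded in the edit log), so a crash between deleting the namespace entry and draining the queue orphans the queued blocks:

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Illustrative sketch of the asynchronous-delete pattern, NOT Hadoop code.
public class MarkedDeleteSketch {
    // In-memory only: its contents vanish if the process dies before the
    // scrubber thread drains it.
    static final Queue<List<Long>> markedDeleteQueue = new ArrayDeque<>();

    static void delete(String dir, List<Long> blockIds) {
        // (1) The namespace entry is gone as soon as delete() returns.
        System.out.println("removed from namespace: " + dir);
        // (2) Block invalidation is deferred by queueing the block IDs.
        markedDeleteQueue.add(blockIds);
    }

    static int scrubberDrain() {
        // (3) A MarkedDeleteBlockScrubber-style consumer drains the queue.
        int deleted = 0;
        List<Long> batch;
        while ((batch = markedDeleteQueue.poll()) != null) {
            deleted += batch.size(); // would schedule DataNode invalidations
        }
        return deleted;
    }

    public static void main(String[] args) {
        delete("/user/foo", List.of(1001L, 1002L, 1003L));
        // If the NameNode crashed at this point, the queue contents would be
        // lost and the three blocks would remain on DataNode disks with no
        // namespace reference.
        System.out.println("blocks deleted by scrubber: " + scrubberDrain());
    }
}
```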

       

Thus:

      SummaryA = hdfs dfs -du -s /

      SummaryB = sum(DataNode-reported dfsUsed)

      SummaryA < SummaryB

       

This may be unavoidable. But is there any way to find the blocks that should have been deleted and clean them up?
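One conceivable approach is offline reconciliation: compare the set of block IDs the NameNode still references (e.g. parsed from `hdfs fsck / -files -blocks` output) against the set physically present on DataNodes (e.g. the blk_* files under each block pool's finalized directories); anything on disk but unknown to the NameNode is a candidate for cleanup. A hypothetical sketch of that set difference, with all names and inputs illustrative rather than a Hadoop API:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical reconciliation sketch: inputs would come from fsck output and
// from scanning DataNode block pool directories; here they are hard-coded.
public class OrphanBlockFinder {
    static Set<Long> orphans(Set<Long> namenodeKnown, Set<Long> onDisk) {
        Set<Long> residue = new TreeSet<>(onDisk);
        residue.removeAll(namenodeKnown); // on disk but unknown to the NameNode
        return residue;
    }

    public static void main(String[] args) {
        Set<Long> nn = Set.of(1001L, 1002L);               // referenced by NN
        Set<Long> dn = Set.of(1001L, 1002L, 1003L, 1004L); // found on disk
        System.out.println("candidate residual blocks: " + orphans(nn, dn));
    }
}
```

Any block flagged this way should be double-checked (e.g. with `hdfs fsck -blockId`) before deletion, since a block that is merely under-replicated or mid-write is not residue.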

       

People

    Assignee: Unassigned
    Reporter: liuguanghua