  1. Hadoop HDFS
  2. HDFS-17191

Delete operation adds a thread to collect blocks asynchronously

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.4.0
    • Fix Version/s: None
    • Component/s: hdfs
    • Labels: None

    Description

      Deleting a large directory requires collecting every block in the deleted subtree, which is time-consuming. Block collection currently runs while the write lock is held, so deleting a large directory can block other RPCs for a period of time. Asynchronous deletion of the collected blocks has already been implemented; see https://issues.apache.org/jira/browse/HDFS-16043.

      In fact, collecting blocks does not require the lock: once the subtree has been deleted, other RPCs can no longer access it. We can therefore collect the blocks of the deleted subtree asynchronously, without holding the lock.
      However, this raises some quota-related problems:
      1. If a parent directory of the subtree has a quota configured, its quota usage is no longer updated synchronously, so there will be a small delay.
      2. The root directory always has the DirectoryWithQuotaFeature attribute, so its quotaUsage must be updated in any case. However, the root directory has no upper limit on its quota, so I think the delayed quota update for the root directory can be ignored.

      To address these problems, we can check whether any parent directory of the subtree (other than the root) has a quota configured. If none does, the blocks are collected asynchronously; otherwise collection would presumably stay synchronous, as it is today. A configuration option could also let users decide whether this quota check is enabled.
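
      The proposed flow could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual NameNode code: Node, hasQuotaOnNonRootAncestor, collectBlocks, and the quotaCheckEnabled flag are all hypothetical names, and the sketch assumes that disabling the quota check means blocks are always collected asynchronously. The subtree is detached while the write lock is held, and the blocks are collected under the lock only when a non-root ancestor has a quota.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class AsyncBlockCollectionSketch {

  /** Hypothetical stand-in for an inode; NOT the real HDFS INode class. */
  static class Node {
    final String name;
    final Node parent;        // null for the root directory
    final boolean quotaSet;   // true if a quota is configured on this directory
    final List<Node> children = new ArrayList<>();
    final List<Long> blockIds = new ArrayList<>();

    Node(String name, Node parent, boolean quotaSet) {
      this.name = name;
      this.parent = parent;
      this.quotaSet = quotaSet;
      if (parent != null) {
        parent.children.add(this);
      }
    }
  }

  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
  // Single daemon thread that collects blocks outside the lock.
  private final ExecutorService collector = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "block-collector");
    t.setDaemon(true);
    return t;
  });
  private final boolean quotaCheckEnabled; // hypothetical configuration switch

  AsyncBlockCollectionSketch(boolean quotaCheckEnabled) {
    this.quotaCheckEnabled = quotaCheckEnabled;
  }

  /** True if any ancestor other than the root has a quota; the root is ignored. */
  static boolean hasQuotaOnNonRootAncestor(Node subtreeRoot) {
    for (Node p = subtreeRoot.parent; p != null && p.parent != null; p = p.parent) {
      if (p.quotaSet) {
        return true;
      }
    }
    return false;
  }

  /** Iteratively gathers the block IDs of every node in the subtree. */
  static List<Long> collectBlocks(Node subtreeRoot) {
    List<Long> blocks = new ArrayList<>();
    Deque<Node> stack = new ArrayDeque<>();
    stack.push(subtreeRoot);
    while (!stack.isEmpty()) {
      Node cur = stack.pop();
      blocks.addAll(cur.blockIds);
      for (Node child : cur.children) {
        stack.push(child);
      }
    }
    return blocks;
  }

  /** Detaches the subtree under the write lock, then collects sync or async. */
  CompletableFuture<List<Long>> delete(Node target) {
    fsLock.writeLock().lock();
    try {
      boolean collectUnderLock = quotaCheckEnabled && hasQuotaOnNonRootAncestor(target);
      if (target.parent != null) {
        target.parent.children.remove(target); // subtree is now unreachable
      }
      if (collectUnderLock) {
        // A quota is configured above the subtree: keep the synchronous path.
        return CompletableFuture.completedFuture(collectBlocks(target));
      }
    } finally {
      fsLock.writeLock().unlock();
    }
    // No quota above the subtree (or check disabled): collect without the lock.
    return CompletableFuture.supplyAsync(() -> collectBlocks(target), collector);
  }

  public static void main(String[] args) {
    Node root = new Node("/", null, true);
    Node big = new Node("big", new Node("a", root, false), false);
    big.blockIds.add(1L);
    AsyncBlockCollectionSketch fs = new AsyncBlockCollectionSketch(true);
    System.out.println("collected " + fs.delete(big).join().size() + " block(s)");
  }
}
```

      In this sketch the caller receives a future either way, so the code that later deletes the collected blocks does not need to know which path was taken. A real implementation would also have to apply the quota usage update once the asynchronous collection finishes, which the sketch leaves out.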


          People

            Assignee: Xiangyi Zhu (zhuxiangyi)
            Reporter: Xiangyi Zhu (zhuxiangyi)
            Votes: 0
            Watchers: 2
