  1. Hadoop HDFS
  2. HDFS-14854

Create improved decommission monitor implementation

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: namenode
    • Labels:
      None

      Description

      In HDFS-13157, we discovered a series of problems with the current decommission monitor implementation, such as:

      • Blocks are replicated sequentially, disk by disk and node by node, so the load is not spread well across the cluster.
      • Adding a node for decommission can cause the namenode write lock to be held for a long time.
      • Decommissioning nodes floods the replication queue, so under-replicated blocks from a future node or disk failure may wait for a long time before they are replicated.
      • Blocks pending replication are checked many times under a write lock before they are sufficiently replicated, wasting resources.

      In this Jira I propose to create a new implementation of the decommission monitor that resolves these issues. As it will be difficult to prove that one implementation is better than the other, the new implementation can be enabled or disabled, giving operators the choice between the existing implementation and the new one.
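
      As a rough illustration of the enable/disable idea, the sketch below selects between two monitor implementations from a configuration value. Every name in it (DecommissionMonitor, LegacyMonitor, ImprovedMonitor, MonitorSelector and the key example.decommission.monitor.impl) is hypothetical and is not taken from the patch or from the HDFS code base. It only shows the switch, with the existing behaviour as the default.

```java
import java.util.Map;

// Hypothetical sketch: none of these names come from the HDFS code base.
interface DecommissionMonitor {
    String name();
}

// Stands in for the existing monitor implementation.
final class LegacyMonitor implements DecommissionMonitor {
    public String name() { return "legacy"; }
}

// Stands in for the proposed monitor implementation.
final class ImprovedMonitor implements DecommissionMonitor {
    public String name() { return "improved"; }
}

final class MonitorSelector {
    // Hypothetical configuration key, used only for this illustration.
    static final String KEY = "example.decommission.monitor.impl";

    // Returns the improved monitor only when explicitly requested;
    // any other value, or a missing key, falls back to the legacy one.
    static DecommissionMonitor select(Map<String, String> conf) {
        String choice = conf.getOrDefault(KEY, "legacy");
        return "improved".equals(choice) ? new ImprovedMonitor() : new LegacyMonitor();
    }
}
```

      Defaulting to the existing monitor means the new one is opt-in, which fits the point above that it is hard to prove in advance which implementation is better.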

      I will attach a pdf with some more details on the design and then a version 1 patch shortly.

        Attachments

        1. 012_to_013_changes.diff
          6 kB
          Stephen O'Donnell
        2. Decommission_Monitor_V2_001.pdf
          81 kB
          Stephen O'Donnell
        3. HDFS-14854.001.patch
          37 kB
          Stephen O'Donnell
        4. HDFS-14854.002.patch
          37 kB
          Stephen O'Donnell
        5. HDFS-14854.003.patch
          53 kB
          Stephen O'Donnell
        6. HDFS-14854.004.patch
          55 kB
          Stephen O'Donnell
        7. HDFS-14854.005.patch
          94 kB
          Stephen O'Donnell
        8. HDFS-14854.006.patch
          94 kB
          Stephen O'Donnell
        9. HDFS-14854.007.patch
          94 kB
          Stephen O'Donnell
        10. HDFS-14854.008.patch
          97 kB
          Stephen O'Donnell
        11. HDFS-14854.009.patch
          96 kB
          Stephen O'Donnell
        12. HDFS-14854.010.patch
          96 kB
          Stephen O'Donnell
        13. HDFS-14854.011.patch
          97 kB
          Stephen O'Donnell
        14. HDFS-14854.012.patch
          100 kB
          Stephen O'Donnell
        15. HDFS-14854.013.patch
          101 kB
          Stephen O'Donnell
        16. HDFS-14854.014.patch
          102 kB
          Stephen O'Donnell

              People

              • Assignee:
                sodonnell Stephen O'Donnell
                Reporter:
                sodonnell Stephen O'Donnell
              • Votes:
                1
                Watchers:
                19