  Hadoop HDFS / HDFS-10967

Add configuration for BlockPlacementPolicy to avoid near-full DataNodes


Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: namenode

    Description

      Large production clusters are likely to have heterogeneous nodes in terms of storage capacity, memory, and CPU cores. It is not always possible to ingest data into DataNodes in proportion to their remaining storage capacity. As a result, a subset of DataNodes can end up much closer to full capacity than the rest.

      This heterogeneity is most likely rack-by-rack, i.e. m whole racks of low-storage nodes and n whole racks of high-storage nodes. It would therefore be very useful to reduce the chance that those near-full DataNodes are chosen as destinations for the 2nd and 3rd replicas.
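      The proposal above can be illustrated with a minimal Java sketch. It is not the code from the attached patches and does not use the real BlockPlacementPolicy API: the class NearFullAwareChooser, its Node record, and the two knobs nearFullThreshold (fraction of capacity used) and rejectProbability are hypothetical names chosen for illustration. The sketch also assumes the candidate list has already been filtered by the usual rack rules for the 2nd and 3rd replicas. It only shows how a near-full node can be made less likely to be picked without being excluded outright.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch (not the HDFS-10967 patch, not the Hadoop API):
 * pick targets for the 2nd and 3rd replicas while lowering the chance
 * that near-full DataNodes are chosen.
 */
public class NearFullAwareChooser {

    /** Minimal stand-in for a DataNode descriptor (hypothetical). */
    record Node(String name, long capacityBytes, long remainingBytes) {
        double usedRatio() {
            return 1.0 - (double) remainingBytes / capacityBytes;
        }
    }

    private final double nearFullThreshold; // e.g. 0.9 means "90% used or more"
    private final double rejectProbability; // chance to skip a near-full pick
    private final Random random;

    NearFullAwareChooser(double nearFullThreshold, double rejectProbability, Random random) {
        this.nearFullThreshold = nearFullThreshold;
        this.rejectProbability = rejectProbability;
        this.random = random;
    }

    boolean isNearFull(Node n) {
        return n.usedRatio() >= nearFullThreshold;
    }

    /** Chooses up to {@code count} distinct nodes from {@code candidates}. */
    List<Node> choose(List<Node> candidates, int count) {
        List<Node> pool = new ArrayList<>(candidates);
        List<Node> chosen = new ArrayList<>();
        List<Node> skipped = new ArrayList<>();
        while (chosen.size() < count && !pool.isEmpty()) {
            // Draw a random candidate, as a plain random placement would.
            Node pick = pool.remove(random.nextInt(pool.size()));
            if (isNearFull(pick) && random.nextDouble() < rejectProbability) {
                skipped.add(pick); // defer: use only if nothing better remains
            } else {
                chosen.add(pick);
            }
        }
        // Fall back to deferred near-full nodes rather than under-replicate.
        while (chosen.size() < count && !skipped.isEmpty()) {
            chosen.add(skipped.remove(0));
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
                new Node("dn1", 100, 5),    // 95% used: near-full
                new Node("dn2", 100, 60),   // 40% used
                new Node("dn3", 400, 300),  // 25% used
                new Node("dn4", 100, 2));   // 98% used: near-full
        NearFullAwareChooser chooser =
                new NearFullAwareChooser(0.9, 1.0, new Random(42));
        for (Node n : chooser.choose(nodes, 2)) {
            System.out.println(n.name());
        }
    }
}
```

      With rejectProbability set to 1.0, as in main, near-full nodes are used only when there are not enough other candidates. Values between 0 and 1 merely lower their chance of selection, which matches the "lower the chance" wording of the description. Keeping a fallback means the sketch never places fewer replicas just because every remaining candidate is near-full.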

    Attachments

      1. HDFS-10967.00.patch (4 kB, Zhe Zhang)
      2. HDFS-10967.01.patch (11 kB, Zhe Zhang)
      3. HDFS-10967.02.patch (27 kB, Zhe Zhang)
      4. HDFS-10967.03.patch (26 kB, Zhe Zhang)

    People

      Assignee: Zhe Zhang (zhz)
      Reporter: Zhe Zhang (zhz)
      Votes: 1
      Watchers: 15
