Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-28216

HDFS erasure coding support for table data dirs

    XMLWordPrintableJSON

Details

    • Hide
      If you use hadoop3, managing the erasure coding policy of a table's data directory is now possible with a new table descriptor setting ERASURE_CODING_POLICY. The policy you set must be available and enabled in hdfs, and hbase will validate that your cluster topology is sufficient to support that policy. After setting the policy, you must major compact the table for the change to take effect. Attempting to use this feature with hadoop2 will fail a validation check prior to making any changes.
      Show
      If you use hadoop3, managing the erasure coding policy of a table's data directory is now possible with a new table descriptor setting ERASURE_CODING_POLICY. The policy you set must be available and enabled in hdfs, and hbase will validate that your cluster topology is sufficient to support that policy. After setting the policy, you must major compact the table for the change to take effect. Attempting to use this feature with hadoop2 will fail a validation check prior to making any changes.

    Description

      Erasure coding (EC) is a hadoop-3 feature which can drastically reduce storage requirements, at the expense of locality. At my company we have a few hbase clusters which are extremely data dense and take mostly write traffic, fewer reads (cold data). We'd like to reduce the cost of these clusters, and EC is a great way to do that since it can reduce replication related storage costs by 50%.

      It's possible to enable EC policies on sub directories of HDFS. One can manually set this with hdfs ec -setPolicy -path /hbase/data/default/usertable -policy xxxx. This can work without any hbase support.

      One problem with that is a lack of visibility by operators into which tables might have EC enabled. I think this is where HBase can help. Here's my proposal:

      • Add a new TableDescriptor and ColumnDescriptor field ERASURE_CODING_POLICY
      • In ModifyTableProcedure preflightChecks, if ERASURE_CODING_POLICY is set, verify that the requested policy is available and enabled via DistributedFileSystem.
        getErasureCodingPolicies().
      • During ModifyTableProcedure, add a new state for MODIFY_TABLE_SYNC_ERASURE_CODING_POLICY.
        • When adding or changing a policy, use DistributedFileSystem.
          setErasureCodingPolicy to sync it for the data and archive dir of that table (or column in table)
        • When removing the property or setting it to empty, use DistributedFileSystem.
          unsetErasureCodingPolicy to remove it from the data and archive dir.

      Since this new API is in hadoop-3 only, we'll need to add a reflection wrapper class for managing the calls and verifying that the API is available. We'll similarly do that API check in preflightChecks.

      Attachments

        Issue Links

          Activity

            People

              bbeaudreault Bryan Beaudreault
              bbeaudreault Bryan Beaudreault
              Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: