HBASE-21879

Read HFile's block to ByteBuffer directly instead of to byte[] for reducing young GC pressure


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha-1, 2.3.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      Before this issue, the read path was 100% off-heap when the block was in the BucketCache. On a cache miss, however, the RegionServer had to read the block via an on-heap API, which caused high young-GC pressure.

      This issue adds reading the block off-heap even when it is read directly from the filesystem. It requires Hadoop >= 2.9.3, but it also works with older Hadoop versions (everything still works; blocks are simply read on-heap as before). It also requires HBASE-21946, which is not yet in place as of this writing (hbase-2.3.0).

      We have written a detailed doc about the implementation, performance, and practice here: https://docs.google.com/document/d/1xSy9axGxafoH-Qc17zbD2Bd--rWjjI00xTWQZ8ZwI_E/edit#heading=h.nch5d72p27ex
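      As an illustration of the fallback the release note describes, here is a minimal sketch (a hypothetical helper, not the actual HBase patch): attempt the ByteBuffer read that newer Hadoop streams expose, and fall back to the on-heap byte[] path when the wrapped stream does not support it.

        import java.io.IOException;
        import java.nio.ByteBuffer;

        import org.apache.hadoop.fs.FSDataInputStream;

        public final class OffheapReadSketch {
          /**
           * Fills {@code buf} (possibly a direct, off-heap buffer) with data
           * starting at {@code offset}. FSDataInputStream#read(ByteBuffer)
           * works when the wrapped stream implements ByteBufferReadable (as
           * HDFS does); otherwise it throws UnsupportedOperationException and
           * we fall back to an on-heap byte[] read plus a copy, as on older
           * Hadoop versions.
           */
          static void readFully(FSDataInputStream in, long offset, ByteBuffer buf)
              throws IOException {
            in.seek(offset); // streaming read; a true ByteBuffer pread is what HBASE-21946 tracks
            try {
              while (buf.hasRemaining()) {
                if (in.read(buf) < 0) {
                  throw new IOException("Premature EOF at offset " + offset);
                }
              }
            } catch (UnsupportedOperationException e) {
              // Older Hadoop: no ByteBuffer read interface, so read on-heap and copy.
              byte[] onHeap = new byte[buf.remaining()];
              in.readFully(onHeap);
              buf.put(onHeap);
            }
          }
        }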

    Description

      In HFileBlock#readBlockDataInternal, we have the following:

      @VisibleForTesting
      protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
          long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, boolean updateMetrics)
          throws IOException {
        // ...
        // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with BBPool (offheap).
        byte[] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
        int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
            onDiskSizeWithHeader - preReadHeaderSize, true, offset + preReadHeaderSize, pread);
        if (headerBuf != null) {
          // ...
        }
        // ...
      }
      

      In the read path, we still read the block from the HFile into an on-heap byte[], then copy that on-heap byte[] into the off-heap bucket cache asynchronously. In my 100% Get performance test, I also observed frequent young GCs; the largest memory footprint in the young gen should be the on-heap block byte[].

      In fact, we can read the HFile's block into a ByteBuffer directly instead of into a byte[] to reduce young GC. We did not implement this before because the older HDFS client had no ByteBuffer read interface, but 2.7+ supports it now, so I think we can fix this now.
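      As a sketch of the "BBPool (offheap)" idea from the TODO in the snippet above (hypothetical names, not HBase's actual allocator), the on-disk block buffer could be taken from a pool of direct ByteBuffers, so a cache miss no longer allocates a young-gen byte[]:

        import java.nio.ByteBuffer;
        import java.util.concurrent.ConcurrentLinkedQueue;

        final class DirectBufferPool {
          private final int bufSize;
          private final ConcurrentLinkedQueue<ByteBuffer> free = new ConcurrentLinkedQueue<>();

          DirectBufferPool(int bufSize) {
            this.bufSize = bufSize;
          }

          /** Reuse a pooled off-heap buffer, or allocate one on first use. */
          ByteBuffer take() {
            ByteBuffer b = free.poll();
            if (b == null) {
              // Direct buffers live off-heap, so they add no young-gen pressure.
              b = ByteBuffer.allocateDirect(bufSize);
            }
            b.clear();
            return b;
          }

          /** Return the buffer for reuse once the block is cached or consumed. */
          void give(ByteBuffer b) {
            free.offer(b);
          }
        }

      Combined with a ByteBuffer read such as the sketch under the release note, a cache miss would then read straight into a pooled off-heap buffer and hand it to the BucketCache without an intermediate on-heap copy.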

      Will provide a patch and some perf comparisons for this.

      Attachments

        1. gc-data-before-HBASE-21879.png
          385 kB
          Zheng Hu
        2. QPS-latencies-before-HBASE-21879.png
          279 kB
          Zheng Hu
        3. HBASE-21879.v1.patch
          76 kB
          Zheng Hu
        4. HBASE-21879.v1.patch
          76 kB
          Zheng Hu


People

    • Assignee: openinx (Zheng Hu)
    • Reporter: openinx (Zheng Hu)
    • Votes: 0
    • Watchers: 24
