HBase / HBASE-27241

Add metrics for evaluating cost and effectiveness of bloom filters

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 2.5.0, 3.0.0-alpha-4
    • Release Note:
      Adds 3 new metrics, which are available per-table and per-regionserver:
      - bloomFilterRequestsCount -- how many checks against a bloom filter occurred. Note that a single read might increment this multiple times, once for each storefile it needs to check.
      - bloomFilterNegativeResultsCount -- how many of those requests came back negative, indicating the bloom filter helped to avoid opening a storefile which did not include the row. If this value is low relative to bloomFilterRequestsCount, it might be worth disabling bloom filters on the table.
      - bloomFilterEligibleRequestsCount -- increments in the same way as bloomFilterRequestsCount, but only when bloom filters are _not_ enabled for a table. If this value is high, it might be worth enabling bloom filters on the table.

      Additionally makes 2 existing metrics available on a per-table basis:
      - staticBloomSize -- uncompressed size of the bloom filters for a table
      - staticIndexSize -- uncompressed size of the storefile indexes for a table

      You can combine the bloom filter request and negative-result counts with staticBloomSize to determine whether a bloom filter's cost (its size) is justified by its effectiveness (negative results per request).
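
      One way to combine these values is sketched below. It is a minimal illustration, not part of HBase: the class and method names are hypothetical, the sample numbers are made up, and it assumes the two counts are taken as deltas over the same sampling interval while staticBloomSize is the current size in bytes.

```java
// Sketch only: class and method names are hypothetical, not part of HBase.
final class BloomFilterEffectiveness {

    // Fraction of bloom filter checks that came back negative, i.e. the share
    // of storefile checks where the bloom filter let the read skip the file.
    static double negativeRatio(long requests, long negativeResults) {
        if (requests <= 0) {
            return 0.0; // no checks in the interval, so nothing to measure
        }
        return (double) negativeResults / requests;
    }

    // Bytes of bloom filter held per negative result in the interval.
    // Lower is better; positive infinity means the bloom never helped.
    static double bytesPerNegativeResult(long staticBloomSize, long negativeResults) {
        if (negativeResults <= 0) {
            return Double.POSITIVE_INFINITY;
        }
        return (double) staticBloomSize / negativeResults;
    }

    public static void main(String[] args) {
        // Made-up sample values: counter deltas for one interval and the current size.
        long requests = 1_000_000L;
        long negatives = 850_000L;
        long bloomBytes = 512L * 1024 * 1024;

        System.out.printf("negative ratio: %.2f%n", negativeRatio(requests, negatives));
        System.out.printf("bloom bytes per negative result: %.1f%n",
            bytesPerNegativeResult(bloomBytes, negatives));
    }
}
```

      What counts as a "low" ratio or a "high" cost depends on the workload, so any threshold applied to these two numbers is a judgment call for the operator.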

    Description

      Bloom filters can be costly for some tables, easily resulting in an aggregate memory footprint of many GBs. It's currently hard to monitor that cost on a per-table basis. You can view staticBloomSize in JMX, but that value covers the whole server. Otherwise you must manually sum the per-region values shown in the regionserver UI. We can add this (as well as staticIndexSize) to the per-table metrics.
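
      The manual summing described above amounts to grouping per-region bloom sizes by table. The sketch below illustrates that aggregation only; the class, method, and input layout are hypothetical and do not reflect how HBase computes the per-table metric internally.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch only: the class, method, and input layout are hypothetical.
final class PerTableBloomSize {

    // Sums per-region bloom sizes into one total per table.
    // tables[i] names the table that owns region i;
    // bloomBytes[i] is that region's bloom filter size in bytes.
    static Map<String, Long> sumByTable(String[] tables, long[] bloomBytes) {
        if (tables.length != bloomBytes.length) {
            throw new IllegalArgumentException("inputs must have the same length");
        }
        Map<String, Long> totals = new TreeMap<>();
        for (int i = 0; i < tables.length; i++) {
            // Add this region's size to the running total for its table.
            totals.merge(tables[i], bloomBytes[i], Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up regions: two belong to "orders", one to "users".
        String[] tables = {"orders", "users", "orders"};
        long[] bloomBytes = {1_000L, 2_000L, 3_000L};
        System.out.println(sumByTable(tables, bloomBytes));
    }
}
```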

      Additionally, it can be hard to know how effective those bloom filters are. I think the easiest way to measure that is to count bloomFilterRequests and bloomFilterNegativeResults. With these metrics in hand, it becomes easier to decide how much memory to give the L1 cache and whether to disable blooms on a table.

            People

              Assignee: Bryan Beaudreault (bbeaudreault)
              Reporter: Bryan Beaudreault (bbeaudreault)