[HBASE-20636] Introduce two bloom filter type : ROWPREFIX_FIXED_LENGTH and ROWPREFIX_DELIMITED - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0.0-alpha-1, 2.2.0
Component/s: HFile, regionserver, Scanners
Labels:
None

Hadoop Flags:

Reviewed
Release Note:

Adds two Bloom filter types: ROWPREFIX_FIXED_LENGTH and ROWPREFIX_DELIMITED.
1. ROWPREFIX_FIXED_LENGTH: specify the length of the prefix.
2. ROWPREFIX_DELIMITED: specify the delimiter of the prefix.
Each type requires its parameter to be set; otherwise, table creation fails.
Example:
create 't1', {NAME => 'f1', BLOOMFILTER => 'ROWPREFIX_FIXED_LENGTH', CONFIGURATION => {'RowPrefixBloomFilter.prefix_length' => '10'}}
create 't1', {NAME => 'f1', BLOOMFILTER => 'ROWPREFIX_DELIMITED', CONFIGURATION => {'RowPrefixDelimitedBloomFilter.delimiter' => '#'}}
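
To illustrate what the two parameters select, here is a minimal Python sketch of how a prefix could be derived from a rowkey in each mode. It is not the HBase implementation: the function names are hypothetical, and the handling of rows that are too short or contain no delimiter (returning None) is an assumption made for this sketch only.

```python
from typing import Optional


def fixed_length_prefix(row: bytes, prefix_length: int) -> Optional[bytes]:
    """Return the first prefix_length bytes of the row.

    Assumption for this sketch: rows shorter than prefix_length have no prefix.
    """
    if prefix_length <= 0:
        raise ValueError("prefix_length must be positive")
    if len(row) < prefix_length:
        return None
    return row[:prefix_length]


def delimited_prefix(row: bytes, delimiter: bytes) -> Optional[bytes]:
    """Return the bytes before the first occurrence of the delimiter.

    Assumption for this sketch: rows without a delimiter, or with the
    delimiter at position 0, have no prefix.
    """
    index = row.find(delimiter)
    if index <= 0:
        return None
    return row[:index]
```

With a prefix length of 10, the rowkey `user000001#1527000000` yields the prefix `user000001`; with the delimiter `#`, the rowkey `user42#1527000000` yields `user42`.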


Description

HBase uses Bloom filters (ROW and ROWCOL) to skip files that cannot contain the requested data, which improves read performance. However, these filters support only Get, not Scan.

At our company (Tencent), many users need to scan all rows that share a prefix. Tencent Games is one example: operational records for each game user are written to HBase, each user has many records, and the rowkey is constructed as userid + '#' + timestamp. This lets us scan all records for a given user over a specified period.

For this scenario, we designed a prefix Bloom filter. If the startRow and stopRow of a Scan share a valid common prefix, the scan is allowed to use the Bloom filter to skip files, which improves scan performance.
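
The check described above can be sketched as follows for the fixed-length case. This is a simplified Python illustration, not the actual HBase code: the function names are hypothetical, a plain set of prefixes stands in for a store file's Bloom filter (so it has no false positives), and treating "valid common prefix" as "at least prefix_length bytes long" is an assumption.

```python
from typing import Set


def common_prefix_length(a: bytes, b: bytes) -> int:
    """Count the leading bytes shared by a and b."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def scan_may_touch_file(file_prefixes: Set[bytes], start_row: bytes,
                        stop_row: bytes, prefix_length: int) -> bool:
    """Decide whether a scan over [start_row, stop_row) must read a file.

    file_prefixes is a stand-in for the file's prefix Bloom filter.
    """
    if common_prefix_length(start_row, stop_row) < prefix_length:
        # The scan spans more than one prefix, so the filter cannot help.
        return True
    # Every row in the range shares this prefix; ask the filter about it.
    return start_row[:prefix_length] in file_prefixes


# Example: a file that only holds rows for one user.
file_prefixes = {b"user000001"}
print(scan_may_touch_file(file_prefixes, b"user000001#1000", b"user000001#2000", 10))
print(scan_may_touch_file(file_prefixes, b"user000002#1000", b"user000002#2000", 10))
```

The second scan targets a user whose prefix is absent from the file, so the file can be skipped. A scan whose start and stop rows belong to different users shares fewer than 10 leading bytes and falls back to reading the file.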

This feature has been running on our cluster for over a year, and scan performance for this scenario has improved by more than 100%.