Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Version: connector-1.0.0
- Hadoop Flags: Reviewed
Description
When using the pushdown column filter feature of hbase-spark-connector, a complex query that contains rowkey conditions does not get the expected rowkey pushdown. For example, consider the following catalog:
{ "table":{"namespace":"default", "name":"t1"}, "rowkey":"key", "columns":{ "KEY_FIELD":{"cf":"rowkey", "col":"key", "type":"string"}, "A_FIELD":{"cf":"c", "col":"a", "type":"string"}, "B_FIELD":{"cf":"c", "col":"b", "type":"string"} } }
With this catalog, the query `spark.sql("SELECT * FROM table WHERE KEY_FIELD >= 'get1' AND KEY_FIELD <= 'get3' AND A_FIELD IS NOT NULL")` gets an incomplete rowkey pushdown, ScanRange:(upperBound:get3,isUpperBoundEqualTo:true,lowerBound:,isLowerBoundEqualTo:true), where the lower bound 'get1' has been lost.
If the query is just `spark.sql("SELECT * FROM table WHERE KEY_FIELD >= 'get1' AND KEY_FIELD <= 'get3'")`, we get the expected rowkey pushdown, ScanRange:(upperBound:get3,isUpperBoundEqualTo:true,lowerBound:get1,isLowerBoundEqualTo:true).
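For context, a minimal reproduction sketch along the lines of the description. This is an untested sketch under assumptions: the data source name `org.apache.hadoop.hbase.spark`, the `catalog` option key, and the up-front `HBaseContext` initialization follow the connector's documented usage, and HBase configuration details are omitted.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.spark.sql.SparkSession

object RowkeyPushdownRepro {
  // Catalog from the description: KEY_FIELD maps to the rowkey, A_FIELD/B_FIELD to family 'c'.
  val catalog: String =
    """{
      |  "table":{"namespace":"default", "name":"t1"},
      |  "rowkey":"key",
      |  "columns":{
      |    "KEY_FIELD":{"cf":"rowkey", "col":"key", "type":"string"},
      |    "A_FIELD":{"cf":"c", "col":"a", "type":"string"},
      |    "B_FIELD":{"cf":"c", "col":"b", "type":"string"}
      |  }
      |}""".stripMargin

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rowkey-pushdown-repro").getOrCreate()

    // The connector expects an HBaseContext to exist before the data source is used.
    new HBaseContext(spark.sparkContext, HBaseConfiguration.create())

    val df = spark.read
      .format("org.apache.hadoop.hbase.spark")
      .option("catalog", catalog)
      .load()
    df.createOrReplaceTempView("table")

    // Complex query: the rowkey range should be [get1, get3], but the lower bound is dropped.
    spark.sql("SELECT * FROM table WHERE KEY_FIELD >= 'get1' AND KEY_FIELD <= 'get3' AND A_FIELD IS NOT NULL").show()

    // Rowkey-only query: the range [get1, get3] is pushed down as expected.
    spark.sql("SELECT * FROM table WHERE KEY_FIELD >= 'get1' AND KEY_FIELD <= 'get3'").show()

    spark.stop()
  }
}
```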
I found that ScanRange#getOverlapScanRange and ScanRange#mergeIntersect return incorrect results when the range passed as the argument is wider than the range of the instance (i.e., scanRange1.getOverlapScanRange(scanRange2) where scanRange1 ⊂ scanRange2). Depending on the order of the Filters produced by Spark's query optimization, the scan ranges can reach these methods in exactly that order, which triggers the problem above.
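To make the expected behavior concrete: intersecting two rowkey ranges must be symmetric, so when one range is contained in the other, the result has to be the smaller range regardless of which object is the instance and which is the argument. The following is a hypothetical, simplified model with this property; it is not the connector's ScanRange class and it ignores the inclusive/exclusive bound flags.

```scala
// Hypothetical, simplified rowkey range; None means the bound is open on that side.
// Illustrative only; not the connector's ScanRange implementation.
case class SimpleRange(lower: Option[String], upper: Option[String]) {

  // Intersection: keep the larger lower bound and the smaller upper bound.
  def intersect(other: SimpleRange): SimpleRange = {
    val lo = (lower, other.lower) match {
      case (Some(a), Some(b)) => Some(Ordering[String].max(a, b))
      case (a, b)             => a.orElse(b)
    }
    val hi = (upper, other.upper) match {
      case (Some(a), Some(b)) => Some(Ordering[String].min(a, b))
      case (a, b)             => a.orElse(b)
    }
    SimpleRange(lo, hi)
  }
}

object IntersectDemo extends App {
  val narrow = SimpleRange(Some("get1"), Some("get3")) // KEY_FIELD >= 'get1' AND KEY_FIELD <= 'get3'
  val wide   = SimpleRange(None, None)                 // unbounded range, e.g. from a non-rowkey condition

  // Both orders must keep the lower bound; losing it in one order is the reported bug.
  println(narrow.intersect(wide)) // SimpleRange(Some(get1),Some(get3))
  println(wide.intersect(narrow)) // SimpleRange(Some(get1),Some(get3))
}
```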
I will create a PR later.