HBASE-26863: Rowkey pushdown does not work with complex conditions



    Description

      When using the pushdown column filter feature of the hbase-spark connector, a complex query that contains rowkey conditions does not get the expected rowkey pushdown. Consider the following table catalog:

      {
        "table":{"namespace":"default", "name":"t1"},
        "rowkey":"key",
        "columns":{
          "KEY_FIELD":{"cf":"rowkey", "col":"key", "type":"string"},
          "A_FIELD":{"cf":"c", "col":"a", "type":"string"},
          "B_FIELD":{"cf":"c", "col":"b", "type":"string"}
        }
      }
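
      For reference, this is roughly how such a catalog is wired up with the connector; a minimal sketch, where the view name "table" and the session setup are placeholders rather than details from this issue:

        import org.apache.spark.sql.SparkSession
        import org.apache.hadoop.hbase.spark.datasources.HBaseTableCatalog

        val spark = SparkSession.builder().getOrCreate()
        val catalog = """{ ...the JSON catalog above... }"""

        // Load the HBase table through the connector and register it as a
        // temporary view so the SQL queries below can run against it.
        val df = spark.read
          .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
          .format("org.apache.hadoop.hbase.spark")
          .load()
        df.createOrReplaceTempView("table")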
      

      For example, given the catalog above, the query `spark.sql("SELECT * FROM table WHERE KEY_FIELD >= 'get1' AND KEY_FIELD <= 'get3' AND A_FIELD IS NOT NULL")` gets an incomplete rowkey pushdown: the lower bound 'get1' is dropped (ScanRange:(upperBound:get3,isUpperBoundEqualTo:true,lowerBound:,isLowerBoundEqualTo:true)).

      If the query is `spark.sql("SELECT * FROM table WHERE KEY_FIELD >= 'get1' AND KEY_FIELD <= 'get3'")`, we get the expected rowkey pushdown (ScanRange:(upperBound:get3,isUpperBoundEqualTo:true,lowerBound:get1,isLowerBoundEqualTo:true)).
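
      Both queries reach the connector as an array of pushed-down Spark filters (via PrunedFilteredScan.buildScan). A sketch of what the first query produces; the element order is decided by Spark's optimizer and is not guaranteed:

        import org.apache.spark.sql.sources._

        // Pushed-down filters for the first query; the two rowkey bounds
        // must be intersected into a single scan range by the connector.
        val filters: Array[Filter] = Array(
          GreaterThanOrEqual("KEY_FIELD", "get1"),
          LessThanOrEqual("KEY_FIELD", "get3"),
          IsNotNull("A_FIELD")
        )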

      I found that ScanRange#getOverlapScanRange and ScanRange#mergeIntersect return incorrect results when the range passed as the argument is wider than the receiver (i.e. scanRange1.getOverlapScanRange(scanRange2) where scanRange1 ⊂ scanRange2). Depending on the order of the Filters produced by Spark's optimizer, the scan ranges can reach these methods in exactly that order, which triggers the problem; a sketch of the invariant follows.
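
      To illustrate the invariant these methods should satisfy, here is a simplified model over string keys; a sketch only, since the connector's real ScanRange works on byte arrays with separate inclusivity flags:

        // None means an unbounded side; Some(k) is an inclusive bound here.
        case class Range(lower: Option[String], upper: Option[String]) {
          // A correct intersection takes the tighter bound from either side,
          // independent of which range is the receiver and which the argument.
          def intersect(other: Range): Range = Range(
            (lower.toList ++ other.lower.toList).sorted.lastOption, // max of the lower bounds
            (upper.toList ++ other.upper.toList).sorted.headOption  // min of the upper bounds
          )
        }

        val keyRange  = Range(Some("get1"), Some("get3"))
        val unbounded = Range(None, None)
        assert(keyRange.intersect(unbounded) == keyRange)
        assert(unbounded.intersect(keyRange) == keyRange) // the order the bug is sensitive to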

      I will create a PR later.

            People

              Assignee: Unassigned
              Reporter: Yohei Kishimoto (morokosi)
