  1. HBase
  2. HBASE-22833

MultiRowRangeFilter should provide a method for creating a filter which is functionally equivalent to multiple prefix filters


    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Adds a public constructor to the MultiRowRangeFilter class to speed up filtering on multiple row prefixes. It expands the row prefixes into multiple rowkey ranges handled by MultiRowRangeFilter, which is more efficient.
      {code}
      public MultiRowRangeFilter(byte[][] rowKeyPrefixes);
      {code}

      Description

      Hi,

      I think the current standard way to apply multiple prefix filters is to create a FilterList and add PrefixFilter instances to it:

      FilterList allFilters = new FilterList(FilterList.Operator.MUST_PASS_ONE);
      allFilters.addFilter(new PrefixFilter(Bytes.toBytes("123")));
      allFilters.addFilter(new PrefixFilter(Bytes.toBytes("456")));
      allFilters.addFilter(new PrefixFilter(Bytes.toBytes("678")));
      scan.setFilter(allFilters);
      

      (cf. https://stackoverflow.com/questions/41074213/hbase-how-to-specify-multiple-prefix-filters-in-a-single-scan-operation )

      However, for a single prefix, HBase provides the scan.setRowPrefixFilter method.
      This method bounds the scan to a row range by setting a start row and a stop row.
      The stop row is computed by calling calculateTheClosestNextRowKeyForPrefix (cf. https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java#L574-L597 )

      MultiRowRangeFilter works on a list of start row and stop row pairs, so calculateTheClosestNextRowKeyForPrefix could compute the stop row corresponding to each start row (i.e., each prefix).
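
To make the idea concrete, here is a minimal sketch in plain Java (no HBase dependency) of how each prefix could be expanded into a half-open [start, stop) range. The names PrefixRanges, Range, stopRowForPrefix and toRanges are hypothetical and chosen only for illustration. The stop-row logic follows the general approach of dropping trailing 0xFF bytes and incrementing the last remaining byte; treating an empty stop row as "no upper bound" is an assumption modeled on how HBase scans interpret an empty stop row.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PrefixRanges {

    /** Hypothetical holder for a half-open range [start, stop). */
    static final class Range {
        final byte[] start;
        final byte[] stop; // empty array is assumed to mean "no upper bound"

        Range(byte[] start, byte[] stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    /** Returns the smallest key that sorts after every key beginning with the prefix. */
    public static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // A trailing 0xFF byte cannot be incremented, so skip past it.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            // Empty or all-0xFF prefix: nothing sorts after it, so leave the range open.
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // increment the last byte that is not 0xFF
        return stop;
    }

    /** Expands each prefix into one [prefix, stopRow) range. */
    public static List<Range> toRanges(byte[][] prefixes) {
        List<Range> ranges = new ArrayList<>();
        for (byte[] prefix : prefixes) {
            ranges.add(new Range(prefix.clone(), stopRowForPrefix(prefix)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        byte[][] prefixes = {
            "123".getBytes(StandardCharsets.US_ASCII),
            "456".getBytes(StandardCharsets.US_ASCII),
            "678".getBytes(StandardCharsets.US_ASCII),
        };
        for (Range r : toRanges(prefixes)) {
            System.out.println(new String(r.start, StandardCharsets.US_ASCII)
                + " -> " + new String(r.stop, StandardCharsets.US_ASCII));
        }
    }
}
```

In HBase itself, each such pair would presumably map to one MultiRowRangeFilter.RowRange with an inclusive start row and an exclusive stop row, so a single filter could cover all prefixes instead of one PrefixFilter per prefix.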

      I think MultiRowRangeFilter should offer a way to create this kind of filter (one functionally equivalent to multiple prefix filters), since that would be better than the current standard approach.

      Cheers,

              People

              • Assignee:
                titsuki Itsuki Toyota
                Reporter:
                titsuki Itsuki Toyota