HBase / HBASE-25582

Support setting scan ReadType to be STREAM at cluster level

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 3.0.0-alpha-1, 2.5.0, 2.3.5, 2.4.2
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      This issue adds a new meaning for the config 'hbase.storescanner.pread.max.bytes' when it is set to a value < 0.
      In HBase 2.x, a Scan can specify a ReadType (PREAD / STREAM / DEFAULT). When a Scan comes with the DEFAULT read type, we start the scan with preads and later switch to stream read once the total data size scanned exceeds the value of 'hbase.storescanner.pread.max.bytes'. (This is calculated per region:cf.) The config defaults to 4 x the HFile block size, which is 256 KB with the default block size.
      When the config is set to a negative value, all scans with the DEFAULT read type start with STREAM read itself, i.e. the switch happens at the beginning of the scan.
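
The switching rule in the release note can be modeled in a few lines of plain Java. This is a simplified sketch, not the HBase implementation: the class ReadModeModel and its methods are hypothetical names, and the model only tracks a running byte count for a single region:cf. The 64 KB block size is the default implied by the 256 KB figure above.

```java
// Hypothetical model of a DEFAULT-read-type scan; not HBase code.
final class ReadModeModel {
    enum Mode { PREAD, STREAM }

    // Default threshold from the release note: 4 x a 64 KB HFile block = 256 KB.
    static final long DEFAULT_PREAD_MAX_BYTES = 4L * 64 * 1024;

    private final long preadMaxBytes;
    private long bytesScanned = 0;
    private Mode mode;

    ReadModeModel(long preadMaxBytes) {
        this.preadMaxBytes = preadMaxBytes;
        // A negative value means the scan starts with STREAM right away.
        this.mode = preadMaxBytes < 0 ? Mode.STREAM : Mode.PREAD;
    }

    // Record n more bytes scanned and return the mode now in effect.
    Mode onBytesRead(long n) {
        bytesScanned += n;
        // Switch once the scanned size exceeds the threshold; never switch back.
        if (mode == Mode.PREAD && bytesScanned > preadMaxBytes) {
            mode = Mode.STREAM;
        }
        return mode;
    }

    Mode mode() {
        return mode;
    }
}
```

With the default threshold the model stays on PREAD until more than 256 KB has been read. With a threshold of -1 it reports STREAM before any bytes are read.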

    Description

      We have the config 'hbase.storescanner.use.pread' at cluster level, which sets the ReadType to PREAD when it is not explicitly specified in the Scan object.
      In the same way, we can have a way to make scans STREAM type at cluster level (when not specified at the Scan object level).
      We do not need any new configs for this. We already have the config 'hbase.storescanner.pread.max.bytes', which specifies when to switch the read type to stream and defaults to 4 * HFile block size. If one configures this value as <= 0, it means the user wants the switch to happen when the scanner is created itself. With such handling we can support it.
      Then every scan need not set the read type.
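
The precedence this proposal implies at scanner creation can be sketched as follows. This is an illustration with hypothetical names (ReadTypeResolver, resolve), not the StoreScanner code. Checking the 'use.pread' setting before the threshold is an assumption, and the sketch uses the `< 0` check stated in the release note, whereas the text here proposes `<= 0`.

```java
// Hypothetical sketch of resolving the effective read type; not HBase code.
final class ReadTypeResolver {
    enum ReadType { DEFAULT, STREAM, PREAD }

    // requested:     read type set on the Scan object (DEFAULT if unset)
    // usePread:      value of 'hbase.storescanner.use.pread'
    // preadMaxBytes: value of 'hbase.storescanner.pread.max.bytes'
    static ReadType resolve(ReadType requested, boolean usePread, long preadMaxBytes) {
        if (requested != ReadType.DEFAULT) {
            return requested; // an explicit Scan-level choice always wins
        }
        if (usePread) {
            return ReadType.PREAD; // cluster-level PREAD (assumed to be checked first)
        }
        if (preadMaxBytes < 0) {
            return ReadType.STREAM; // cluster-level STREAM via a negative threshold
        }
        return ReadType.DEFAULT; // start with pread, switch to stream later
    }
}
```

Under these assumptions, a scan that sets PREAD explicitly keeps it even when the threshold is negative, so only scans that leave the read type unset are affected by the cluster-level value.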

      The issue is that in cloud-storage-based systems, stream reads might be better. We introduced PRead-based scans based on tests with HDFS-based storage. In my customer's case, Azure storage is in place and the WASB driver is used. It has a read-ahead mechanism (it reads an entire block of a blob in one REST call) and buffers that in the WASB driver. This helps a lot with longer scans. Yes, with the config 'hbase.storescanner.pread.max.bytes' we can make the switch happen early, but it is better to go with the 1.x way, where the scan starts with stream read itself.

          People

            anoop.hbase Anoop Sam John