  1. HBase
  2. HBASE-19486

Periodically ensure records are not buffered too long by BufferedMutator


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0-beta-1, 2.0.0
    • Component/s: Client
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      The BufferedMutator now supports two settings that ensure records do not stay in its write buffer too long. A "Timeout" sets how old the oldest record in the buffer may be before a flush is forced, and a "TimerTick" sets how often the client checks whether that timeout has been exceeded. With these settings, the BufferedMutator automatically flushes the write buffer if no flush has occurred within the specified number of milliseconds.

      This is mainly useful in streaming scenarios (e.g. writing data into HBase using Apache Flink/Beam/Storm), where it is common (especially in a test/development situation) to see small unpredictable bursts of data that need to be written into HBase. Until now, records would remain in the write buffer until the buffer was full or an explicit flush was triggered. In practice this meant that the 'last few records' of a burst would stay in the write buffer until the next burst arrived, filled the buffer to capacity, and thus triggered a flush.
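      To make the interplay of "Timeout" and "TimerTick" concrete, here is a minimal sketch of the mechanism using only the Java standard library. It is not the HBase implementation: the class PeriodicFlushBuffer, its methods, and its parameters are hypothetical names chosen for illustration, and the real BufferedMutator may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical illustration only; this is NOT the HBase BufferedMutator. */
class PeriodicFlushBuffer<T> implements AutoCloseable {
    private final int capacity;            // flush once this many records are buffered
    private final long timeoutMs;          // "Timeout": max age of the oldest buffered record
    private final Consumer<List<T>> sink;  // receives every flushed batch
    private final List<T> buffer = new ArrayList<>();
    private long oldestRecordMs = -1;      // arrival time of the oldest buffered record
    private ScheduledExecutorService timer;

    PeriodicFlushBuffer(int capacity, long timeoutMs, Consumer<List<T>> sink) {
        this.capacity = capacity;
        this.timeoutMs = timeoutMs;
        this.sink = sink;
    }

    /** Starts a background check every timerTickMs ("TimerTick"). */
    synchronized void startTimer(long timerTickMs) {
        timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "periodic-flush");
            t.setDaemon(true); // do not keep the JVM alive just for the timer
            return t;
        });
        timer.scheduleAtFixedRate(() -> tick(System.currentTimeMillis()),
                timerTickMs, timerTickMs, TimeUnit.MILLISECONDS);
    }

    /** Buffers a record; nowMs would normally be System.currentTimeMillis(). */
    synchronized void add(T record, long nowMs) {
        if (buffer.isEmpty()) {
            oldestRecordMs = nowMs; // first record since the last flush
        }
        buffer.add(record);
        if (buffer.size() >= capacity) {
            flush(); // the classic size-based flush
        }
    }

    /** One timer tick: flush only if the oldest record has waited at least timeoutMs. */
    synchronized void tick(long nowMs) {
        if (!buffer.isEmpty() && nowMs - oldestRecordMs >= timeoutMs) {
            flush();
        }
    }

    /** Hands the buffered records to the sink and empties the buffer. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
        oldestRecordMs = -1;
    }

    @Override
    public synchronized void close() {
        if (timer != null) {
            timer.shutdownNow();
        }
        flush(); // do not lose the tail of the last burst
    }
}
```

      In this sketch a record can wait up to roughly Timeout plus one TimerTick before it is flushed, because the age of the oldest record is only examined when the timer fires. A shorter tick tightens that bound at the cost of more frequent checks.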

    Description

      I'm working on several projects where we are doing stream / event type processing instead of batch type processing. We mostly use Apache Flink and Apache Beam for these projects.

      When we ingest a continuous stream of events and feed that into HBase via a BufferedMutator this all works fine. The buffer fills up at a predictable rate and we can make sure it flushes several times per second into HBase by tuning the buffer size.

      We also have situations where the event rate is unpredictable: sometimes because the source is in reality a batch job that puts records into Kafka, sometimes because it is the "predictable in production" application running in our testing environment (where only a developer triggers a handful of events).

      For these kinds of use cases we need a way to 'force' the BufferedMutator to automatically flush any records in the buffer even if the buffer is not full.

      I'll put up a pull request with a proposed implementation for review against master (i.e. 3.0.0).
      Once approved, I would like to backport this to the 1.x and 2.x versions of the client in the same way, or as close to it as possible.

      Attachments

        1. HBASE-19486.20171231-105839-addendum.patch
          6 kB
          Niels Basjes
        2. HBASE-19486.20180102-081903-addendum.patch
          6 kB
          Niels Basjes
        3. HBASE-19486.v0.patch
          31 kB
          Chia-Ping Tsai
        4. HBASE-19486-20171212-2117.patch
          17 kB
          Niels Basjes
        5. HBASE-19486-20171218-1229.patch
          20 kB
          Niels Basjes
        6. HBASE-19486-20171218-1300.patch
          20 kB
          Niels Basjes
        7. HBASE-19486-20171219-0933.patch
          20 kB
          Niels Basjes
        8. HBASE-19486-20171219-1026.patch
          21 kB
          Niels Basjes
        9. HBASE-19486-20171219-1122-trigger-qa-run.patch
          22 kB
          Niels Basjes
        10. HBASE-19486-20171220-1612-trigger-qa-run.patch
          22 kB
          Niels Basjes
        11. HBASE-19486-20171220-2228-trigger-qa-run.patch
          22 kB
          Niels Basjes
        12. HBASE-19486-20171223-1438-trigger-qa-run.patch
          30 kB
          Niels Basjes
        13. HBASE-19486-20171223-1728-trigger-qa-run.patch
          29 kB
          Niels Basjes
        14. HBASE-19486-20171223-2222-trigger-qa-run.patch
          31 kB
          Niels Basjes
        15. HBASE-19486-20171224-1101-trigger-qa-run.patch
          31 kB
          Niels Basjes
        16. HBASE-19486-20171224-1602.patch
          30 kB
          Niels Basjes
        17. HBASE-19486-branch-1.v0.patch
          32 kB
          Niels Basjes
        18. HBASE-19486-branch-1.v1.patch
          32 kB
          Niels Basjes

            People

              Assignee: Niels Basjes (nielsbasjes)
              Reporter: Niels Basjes (nielsbasjes)
              Votes: 0
              Watchers: 10