Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-21775

The BufferedMutator doesn't ever refresh region location cache

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0, 2.2.0, 2.1.3, 2.0.5
    • Component/s: Client
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I noticed in some of my writing jobs that the BufferedMutator would get stuck retrying writes against a dead server.

      19/01/18 15:15:47 INFO [Executor task launch worker for task 0] client.AsyncRequestFutureImpl: #2, waiting for 1  actions to finish on table: dummy_table
      19/01/18 15:15:54 WARN [htable-pool3-t56] client.AsyncRequestFutureImpl: id=2, table=dummy_table, attempt=15/21, failureCount=1ops, last exception=org.apache.hadoop.hbase.DoNotRetryIOException: Operation rpcTimeout on <SERVER>,17020,1547848193782, tracking started Fri Jan 18 14:55:37 PST 2019; NOT retrying, failed=1 -- final attempt!
      19/01/18 15:15:54 ERROR [Executor task launch worker for task 0] IngestRawData.map(): [B@258bc2c7: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: Operation rpcTimeout: 1 time, servers with issues: <SERVER>,17020,1547848193782
      

       

      After the single remaining action permanently failed, it would resume progress only to get stuck again retrying against the same dead server:

      19/01/18 15:21:18 INFO [Executor task launch worker for task 0] client.AsyncRequestFutureImpl: #2, waiting for 1  actions to finish on table: dummy_table
      19/01/18 15:21:18 INFO [Executor task launch worker for task 0] client.AsyncRequestFutureImpl: #2, waiting for 1  actions to finish on table: dummy_table
      19/01/18 15:21:20 INFO [htable-pool3-t55] client.AsyncRequestFutureImpl: id=2, table=dummy_table, attempt=6/21, failureCount=1ops, last exception=java.net.ConnectException: Call to <SERVER> failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.ConnectTimeoutException: connection timed out: <SERVER> on <SERVER>,17020,1547848193782, tracking started null, retrying after=20089ms, operationsToReplay=1
      

       

      Only restarting the client process to generate a new BufferedMutator instance would fix the issue, at least until the next regionserver crash

       The logs I've pasted show the issue happening with a ConnectionTimeoutException, but we've also seen it with NotServingRegionException and some others

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                tommyzli Tommy Li
                Reporter:
                tommyzli Tommy Li
              • Votes:
                0 Vote for this issue
                Watchers:
                9 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: