  1. HBase
  2. HBASE-830

Debugging HCM.locateRegionInMeta is painful

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.2.1, 0.18.0
    • Component/s: Client
    • Labels: None

    Description

      I've been debugging a case where a bunch of reduces hung for no apparent reason and then got killed because they did nothing for 600 seconds. I figured out that it's because we are stuck waiting for a very long time due to retry backoffs.

      public static int RETRY_BACKOFF[] = { 1, 1, 1, 1, 2, 4, 8, 16, 32, 64 };
      

      That means we wait 10 sec, 10 sec, 10, 10, 20, 40, ... then 640 sec on the last retry. That's a long time. Do we really need that long to finally be warned that there's a bug in HBase?
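
      To make the arithmetic concrete, here is a minimal sketch that sums the waits implied by the table above. It assumes a base pause of 10 seconds (inferred from the "10 sec" figure in this description) and uses the 600-second limit mentioned earlier. The class and method names are hypothetical and are not part of the HBase client.

```java
class RetryBackoffSketch {
    // Multipliers copied from the issue description.
    static final int[] RETRY_BACKOFF = { 1, 1, 1, 1, 2, 4, 8, 16, 32, 64 };

    // Assumption: base pause of 10 seconds, inferred from the description.
    static final int ASSUMED_PAUSE_SECONDS = 10;

    // Total time spent sleeping if every retry in the table is used.
    static long totalWaitSeconds(int[] backoff, int pauseSeconds) {
        long total = 0;
        for (int multiplier : backoff) {
            total += (long) multiplier * pauseSeconds;
        }
        return total;
    }

    // 1-based index of the retry whose wait pushes the cumulative sleep
    // time past limitSeconds, or -1 if the limit is never exceeded.
    static int retryExceeding(int[] backoff, int pauseSeconds, long limitSeconds) {
        long total = 0;
        for (int i = 0; i < backoff.length; i++) {
            total += (long) backoff[i] * pauseSeconds;
            if (total > limitSeconds) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("total wait (sec): "
            + totalWaitSeconds(RETRY_BACKOFF, ASSUMED_PAUSE_SECONDS));
        System.out.println("retry that crosses 600 sec: "
            + retryExceeding(RETRY_BACKOFF, ASSUMED_PAUSE_SECONDS, 600));
    }
}
```

      Under these assumptions the waits add up to 1300 seconds, and the cumulative sleep passes 600 seconds during the ninth wait, so a task with a 600-second inactivity limit would be killed before the retries are exhausted.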

      Also, the places where we get this:

      LOG.debug("reloading table servers because: " + t.getMessage());
      

      should be more verbose. In my logs these are caused by a table not being found, but the only thing I see is "reloading table servers because: tableName".
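
      As an illustration of why the message is so terse, the sketch below compares `t.getMessage()` alone with a message that also names the exception type. The `TableNotFoundException` class here is a hypothetical stand-in whose message is just the table name, which is what the log output above suggests. This is one possible way to make the line more informative, not necessarily what the attached patches do.

```java
import java.io.IOException;

class VerboseReloadMessageSketch {
    // Hypothetical stand-in: an exception whose message is only the table name.
    static class TableNotFoundException extends IOException {
        TableNotFoundException(String tableName) {
            super(tableName);
        }
    }

    // Mirrors the current log line: only the exception message is shown.
    static String terse(Throwable t) {
        return "reloading table servers because: " + t.getMessage();
    }

    // Adds the exception type so the cause is visible in the log.
    static String verbose(Throwable t) {
        return "reloading table servers because: "
            + t.getClass().getSimpleName() + ": " + t.getMessage();
    }

    public static void main(String[] args) {
        Throwable t = new TableNotFoundException("tableName");
        System.out.println(terse(t));
        System.out.println(verbose(t));
    }
}
```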

      Attachments

        1. hbase-826-v1.patch
          2 kB
          Jean-Daniel Cryans
        2. 830-v2-shortertimeouts.patch
          2 kB
          Michael Stack

          People

            Unassigned Unassigned
            jdcryans Jean-Daniel Cryans