  1. HBase
  2. HBASE-21788

OpenRegionProcedure (after recovery?) is unreliable and needs to be improved


Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Incomplete
    • 3.0.0-alpha-1
    • None
    • None
    • None

    Description

      Not much for this one yet.
      I repeatedly see cases where a region is stuck in OPENING. After a master restart the RIT is recovered and stays in WAITING, while its OpenRegionProcedure (also recovered) is stuck in Runnable and does nothing for hours. I cannot find any logs on the target server indicating that it tried to do anything after the master restart.

      At the very least, this procedure needs to log what it is trying to do, and perhaps needs a timeout so that it fails unconditionally after a configurable period (1 hour?).
      I may also investigate why it does nothing and file a separate bug. I wonder whether it is somehow related to the region status check, but this is just a hunch.
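
      To make the logging-plus-timeout idea concrete, here is a rough sketch of what I have in mind. It is not HBase code: OpenAttemptWatchdog, its fields, and checkTimedOut are hypothetical names made up for illustration, and the 1 hour value is only the placeholder mentioned above. The real change would have to hook into however the procedure is actually scheduled.

```java
import java.time.Duration;
import java.util.logging.Logger;

/**
 * Hypothetical helper (not part of HBase): records when an open attempt
 * started, logs what it is waiting for, and reports when a configurable
 * timeout has elapsed so the caller can fail the attempt unconditionally.
 */
final class OpenAttemptWatchdog {
  private static final Logger LOG =
      Logger.getLogger(OpenAttemptWatchdog.class.getName());

  private final String regionName;
  private final String targetServer;
  private final long startMillis;
  private final long timeoutMillis;

  OpenAttemptWatchdog(String regionName, String targetServer,
                      long startMillis, Duration timeout) {
    this.regionName = regionName;
    this.targetServer = targetServer;
    this.startMillis = startMillis;
    this.timeoutMillis = timeout.toMillis();
  }

  /**
   * Logs the current state of the attempt and returns true once the
   * elapsed time has reached the configured timeout.
   */
  boolean checkTimedOut(long nowMillis) {
    long elapsedMillis = nowMillis - startMillis;
    if (elapsedMillis >= timeoutMillis) {
      LOG.warning("Open of " + regionName + " on " + targetServer
          + " made no progress for " + elapsedMillis + " ms; failing it");
      return true;
    }
    LOG.info("Still waiting to open " + regionName + " on " + targetServer
        + " (" + elapsedMillis + " of " + timeoutMillis + " ms elapsed)");
    return false;
  }
}
```

      The clock value is passed in rather than read inside the method so that the timeout decision is easy to test; a caller would typically pass System.currentTimeMillis() and a timeout read from configuration.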

      Attachments

        1. WAL-Orphan.log
          9 kB
          Bahram Chehrazy


            People

              stack Michael Stack
              sershe Sergey Shelukhin
              Votes: 0
              Watchers: 5
