HBASE-3362

If .META. offline between OPENING and OPENED, then wrong server location in .META. is possible


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • None
    • 0.90.0
    • None
    • None
    • Reviewed

    Description

      This is a good one. It happened to me while testing OOME in split logging.

      • Balancer moves region to new location, regionserver X.
      • New location regionserver X successfully opens the region and then goes to update .META.
      • At this point, the server carrying .META. crashes.
      • Regionserver X is stuck waiting on .META. to come back online. It takes so long that the master times out the region-in-transition.
      • Master assigns the region elsewhere, to regionserver Y.
      • It opens successfully on regionserver Y, which then also parks waiting on .META. to come online.
      • .META. comes online.
      • The two servers X and Y race to update .META.

      I saw a case where server X's edit went in after server Y's edit, which means that lookups in .META. return the wrong server: X, which no longer holds the region. HBCK can detect this situation.
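      The interleaving above can be replayed with a toy model. This is only a sketch: `MetaRaceSketch`, `updateMeta`, and `replay` are hypothetical names, the map merely stands in for .META., and none of it is HBase API. It assumes the post-open update is an unconditional put, so the last writer wins.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these names are HBase APIs.
class MetaRaceSketch {
    // Stand-in for the .META. table: region name -> hosting server.
    static final Map<String, String> meta = new HashMap<>();

    // Stand-in for a server's post-open .META. update (assumed to be an unconditional put).
    static void updateMeta(String region, String server) {
        meta.put(region, server);
    }

    // Replays the interleaving from the report and returns what .META. ends up saying.
    static String replay(String region) {
        // The master has already timed out X and reassigned the region to Y.
        updateMeta(region, "Y"); // Y's edit lands first once .META. is back online
        updateMeta(region, "X"); // X's stale edit lands last and overwrites it
        return meta.get(region);
    }

    public static void main(String[] args) {
        System.out.println(".META. location: " + replay("region-1") + ", actual owner: Y");
    }
}
```

      In this model nothing ties the write to current ownership of the region, which is why the stale edit from X is accepted.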

      When regionserver X wakes up, it correctly notices that it has lost control of the region, but the damage is done, the damage being the .META. edit.

      Jon suggested that regionserver X should 'rollback' its .META. edit by explicitly deleting what it added. I think this would work, but after chatting more, I'll make a fix that keeps updating the ZooKeeper OPENING state while the .META. edit goes on in a separate thread. Continuously re-setting OPENING means the region-in-transition does not time out.
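      A minimal sketch of the keep-alive idea, not the actual patch: the .META. edit runs in a separate thread while the caller periodically re-sets OPENING until the edit finishes. `OpeningKeepAliveSketch`, `refreshOpening`, and `runWithKeepAlive` are hypothetical names, and the counter is only a placeholder for the real ZooKeeper update.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: none of these names are HBase or ZooKeeper APIs.
class OpeningKeepAliveSketch {
    // Counts how often the (hypothetical) OPENING state was re-set.
    static final AtomicInteger openingRefreshes = new AtomicInteger();

    // Placeholder for re-setting the region's OPENING state in ZooKeeper.
    static void refreshOpening() {
        openingRefreshes.incrementAndGet();
    }

    // Runs metaEdit in a separate thread and refreshes OPENING once per period until it completes.
    static <T> T runWithKeepAlive(Callable<T> metaEdit, long periodMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> pending = executor.submit(metaEdit);
            while (true) {
                try {
                    // Wait one period for the edit to finish.
                    return pending.get(periodMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException stillWaiting) {
                    // Edit still blocked (e.g. .META. offline): tick OPENING and wait again.
                    refreshOpening();
                }
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate a .META. edit that blocks for a while because .META. is offline.
        String result = runWithKeepAlive(() -> {
            Thread.sleep(300);
            return "meta-updated";
        }, 50);
        System.out.println(result + " after " + openingRefreshes.get() + " OPENING refreshes");
    }
}
```

      In a real regionserver, a failed refresh would presumably be the signal that the master has taken the region away, at which point the pending edit should be abandoned; the sketch leaves that out.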

          People

            stack Michael Stack