  1. HBase
  2. HBASE-26596

region_mover should gracefully ignore null response from RSGroupAdmin#getRSGroupOfServer

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.7.1
    • Fix Version/s: None
    • Component/s: mover, rsgroup
    • Labels: None

    Description

      If a regionserver still has a non-daemon thread running after its own shutdown, that thread can prevent a clean JVM exit and leave the regionserver stuck in a zombie state. HBASE-26468 recently added a workaround: the regionserver exit hook waits 30s for all non-daemon threads to stop before terminating the JVM abnormally.

      However, if a regionserver is stuck in such a state, region_mover unload fails with:

      NoMethodError: undefined method `getName` for nil:NilClass
        getSameRSGroupServers at /bin/region_mover.rb:503
                   __ensure__ at /bin/region_mover.rb:313 
                unloadRegions at /bin/region_mover.rb:310               
                       (root) at /bin/region_mover.rb:572               
       

      This happens when the cluster has RSGroup enabled and the given server is already stopped: RSGroupAdmin#getRSGroupOfServer then returns null, because a server that is no longer running is not part of any RSGroup. region_mover should tolerate this null response and exit gracefully from the unloadRegions() call.
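
      The kind of guard described above could look roughly like the following plain-Ruby sketch. It is not the actual code in region_mover.rb: FakeGroup, FakeRSGroupAdmin, same_rsgroup_servers and unload_regions are hypothetical stand-ins, and the getServers accessor on the group object is an assumption. Only getRSGroupOfServer and getName are taken from this report.

```ruby
# Hypothetical stand-in for the group object returned by the admin API.
# getServers is an assumed accessor; getName appears in the stack trace.
FakeGroup = Struct.new(:name, :servers) do
  def getName
    name
  end

  def getServers
    servers
  end
end

# Hypothetical stand-in for RSGroupAdmin. As described in the issue, the
# lookup returns nil for a server that is not part of any RSGroup.
class FakeRSGroupAdmin
  def initialize(groups)
    @groups = groups
  end

  def getRSGroupOfServer(hostport)
    @groups.find { |g| g.getServers.include?(hostport) }
  end
end

# Returns the other servers in the same group, or nil when the server
# belongs to no group (for example because it has already stopped).
def same_rsgroup_servers(admin, hostport)
  group = admin.getRSGroupOfServer(hostport)
  return nil if group.nil? # guard: never call getName/getServers on nil

  group.getServers.reject { |s| s == hostport }
end

# Sketch of an unload entry point that exits gracefully on a nil group.
def unload_regions(admin, hostport)
  targets = same_rsgroup_servers(admin, hostport)
  if targets.nil?
    warn "No RSGroup found for #{hostport}; skipping unload"
    return :skipped
  end

  # Moving regions to the target servers would happen here (omitted).
  :unloaded
end
```

      With this shape, a stopped server produces a warning and an early return instead of a NoMethodError on nil.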

       

      We should also check if the fix is applicable to branch-2 and above.

            People

              Assignee: Unassigned
              Reporter: Viraj Jasani (vjasani)
              Votes: 0
              Watchers: 4
