HBase / HBASE-28159

Unable to get table state error when table is being initialized


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.17
    • Fix Version/s: None
    • Component/s: master
    • Labels: None

    Description

      While executing commands to create a table, I noticed the following ERROR in the HMaster log:

      2023-10-17 06:41:47,118 ERROR [master/hmaster:16000.Chore.1] master.TableStateManager: Unable to get table uuidf68fb89ec7f4435597d69fb7b099d8e7 state
      org.apache.hadoop.hbase.TableNotFoundException: No state found for uuidf68fb89ec7f4435597d69fb7b099d8e7
              at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:155)
              at org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:92)
              at org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:419)
              at org.apache.hadoop.hbase.master.assignment.AssignmentManager.getRegionStatesCount(AssignmentManager.java:2341)
              at org.apache.hadoop.hbase.master.HMaster.getClusterMetricsWithoutCoprocessor(HMaster.java:2616)
              at org.apache.hadoop.hbase.master.HMaster.getClusterMetricsWithoutCoprocessor(HMaster.java:2537)
              at org.apache.hadoop.hbase.master.balancer.ClusterStatusChore.chore(ClusterStatusChore.java:47)
              at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
              at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:750)

      Reproduce

      Because the error depends on thread interleaving, the following command sequence may need to be run several times to reproduce it.

      Cluster: 1 HMaster, 2 RegionServers, HDFS 2.10.2.

      create 'uuid49bb410e0a0c40ffb070d17787b4cad7', {NAME => 'uuid66e57e5195e04956a78f789b2a25ec01', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 'uuid119181eed72a43ccb66fabe37f84d2c0', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 'uuidc2d4931eaf4c429db0e55514fb12e767', VERSIONS => 3, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 'uuidc9802bbfbe434411ae68bb8388d499b6', VERSIONS => 3, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 'uuidc85e117d0ca144719fc53d30b189a343', VERSIONS => 3, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}
      create 'uuid094dd5bf47eb47d69148b63e73ce0e7c', {NAME => 'uuid76ccbd96fbdc418b95ed9971ff423b2d', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}, {NAME => 'uuid36835d3faff04838bd02d6226557d7c8', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 'uuid37752598d1bb405eb39a3e17c04d7e60', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}
      create 'uuidf68fb89ec7f4435597d69fb7b099d8e7', {NAME => 'uuidb235288b1d304fe1a62adb63968d9eee', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 'uuidf348f8849e724b3fa231fc2bb459be2d', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 'uuid81341a87083e49d7a0d8aff7b1ccf16a', VERSIONS => 3, COMPRESSION => 'GZ', BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 'uuid24db0d3c67c347d3a4c18af90facec2d', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}, {NAME => 'uuid7ecf10315f444cfd9c5698695f9054d9', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
      enable 'uuid094dd5bf47eb47d69148b63e73ce0e7c'
      create_namespace 'uuidc1066f82d7834f698d335dd04fa7ad3e'
      alter 'uuid094dd5bf47eb47d69148b63e73ce0e7c', {NAME => 'enaJvIGYBk', BLOOMFILTER => 'ROWCOL', IN_MEMORY => false}
      disable 'uuidf68fb89ec7f4435597d69fb7b099d8e7' 

      I have attached the full logs.

      Root Cause

      The ERROR message is logged because of a race between (1) T1, the thread creating the table, and (2) T2, the chore thread calculating TABLE_TO_REGIONS_COUNT.

      Here is how it happens in detail:

      1. The user issues a create table request, which adds the new table to tableDescriptors.
      2. The chore thread calculates TABLE_TO_REGIONS_COUNT by iterating over all tables returned by getTableDescriptors().getAll(). This includes the table that is still being created, whose table state has not been created yet.
      3. When the chore fetches that table's state, TableStateManager.getTableState() throws TableNotFoundException, and TableStateManager logs it at ERROR level.
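The three steps above can be reproduced outside HBase with a small, deterministic Java model. This is a sketch under stated assumptions, not HBase code: the class TableStateRace, its descriptors and states maps, and NoStateException are hypothetical stand-ins for tableDescriptors, the table state store, and TableNotFoundException. Two latches force the bad interleaving, so the chore always runs after the descriptor write and before the state write.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the HBASE-28159 race; none of these names are HBase APIs.
public class TableStateRace {

  // Stand-in for TableNotFoundException("No state found for ...").
  static class NoStateException extends Exception {
    NoStateException(String table) {
      super("No state found for " + table);
    }
  }

  // Stand-in for tableDescriptors: a new table becomes visible here first.
  final Map<String, String> descriptors = new ConcurrentHashMap<>();
  // Stand-in for the table state store read by TableStateManager.
  final Map<String, String> states = new ConcurrentHashMap<>();

  String getTableState(String table) throws NoStateException {
    String state = states.get(table);
    if (state == null) {
      throw new NoStateException(table);
    }
    return state;
  }

  // Chore: look up the state of every table that has a descriptor.
  // Returns how many lookups failed (each would be an ERROR line in HMaster).
  int choreRun() {
    int failures = 0;
    for (String table : descriptors.keySet()) {
      try {
        getTableState(table);
      } catch (NoStateException e) {
        System.out.println("ERROR Unable to get table " + table + " state: " + e.getMessage());
        failures++;
      }
    }
    return failures;
  }

  // Forces the interleaving: the chore runs after the descriptor write but
  // before the state write. Returns {failures during create, failures after}.
  static int[] simulate() {
    TableStateRace master = new TableStateRace();
    CountDownLatch descriptorWritten = new CountDownLatch(1);
    CountDownLatch choreDone = new CountDownLatch(1);
    int[] duringCreate = new int[1];

    Thread creator = new Thread(() -> {
      master.descriptors.put("t1", "descriptor"); // step 1
      descriptorWritten.countDown();
      try {
        choreDone.await(); // hold the create half-way through
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      master.states.put("t1", "ENABLED"); // placeholder state, written last
    });
    Thread chore = new Thread(() -> {
      try {
        descriptorWritten.await();
        duringCreate[0] = master.choreRun(); // steps 2 and 3
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        choreDone.countDown(); // always release the creator
      }
    });
    creator.start();
    chore.start();
    try {
      creator.join();
      chore.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    // Once the create has finished, the same chore finds the state.
    int afterCreate = master.choreRun();
    return new int[] {duringCreate[0], afterCreate};
  }

  public static void main(String[] args) {
    int[] result = simulate();
    System.out.println("failed lookups during create: " + result[0] + ", after: " + result[1]);
  }
}
```

The chore run that overlaps the create reports one failed lookup, and the run after the create reports none, which matches the point that the condition is transient rather than a real inconsistency.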

      IMO, this is a normal and correct sequence of events that should not produce an ERROR-level message. It could be avoided by properly handling the interleaving between table creation and the chore threads.
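One possible direction, sketched here under stated assumptions, is to let the chore tolerate a table whose descriptor is visible but whose state has not been created yet: look the state up without throwing, skip that table for the current round, and log at DEBUG rather than ERROR. This is not necessarily how HBase will resolve the issue; TolerantChore, findTableState, countRegionsPerTable, and the three maps are hypothetical stand-ins for tableDescriptors, the table state store, and the region states.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not HBase code: a chore that treats "descriptor exists
// but no state yet" as an expected, transient condition.
public class TolerantChore {
  // Stand-in for tableDescriptors.
  final Set<String> descriptors = ConcurrentHashMap.newKeySet();
  // Stand-in for the table state store.
  final Map<String, String> states = new ConcurrentHashMap<>();
  // Stand-in for per-table region counts taken from the region states.
  final Map<String, Integer> regionCounts = new ConcurrentHashMap<>();

  // Returns null instead of throwing when the state has not been written yet.
  String findTableState(String table) {
    return states.get(table);
  }

  // Builds a table -> region count map, skipping tables still being created.
  Map<String, Integer> countRegionsPerTable() {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String table : descriptors) {
      if (findTableState(table) == null) {
        // Expected while a create is in flight: log quietly and skip this round.
        System.out.println("DEBUG table " + table + " has no state yet, skipping");
        continue;
      }
      counts.put(table, regionCounts.getOrDefault(table, 0));
    }
    return counts;
  }

  public static void main(String[] args) {
    TolerantChore chore = new TolerantChore();
    chore.descriptors.add("t1");
    chore.descriptors.add("t2"); // t2 is "being created": no state yet
    chore.states.put("t1", "ENABLED");
    chore.regionCounts.put("t1", 3);
    System.out.println(chore.countRegionsPerTable());
  }
}
```

Skipping is reasonable because the chore runs periodically, so a table created mid-round is picked up on a later run. A separate existence check followed by the throwing getter would be a check-then-act sequence and could still race with a concurrent table operation; doing a single non-throwing lookup avoids that.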

      I am trying to fix it. Any help would be appreciated! 

      Attachments

        1. persistent.tar.gz (103 kB, uploaded by Ke Han)
        2. hbase--master-37bbb9b6f05a.log (231 kB, uploaded by Ke Han)


          People

            Assignee: Unassigned
            Reporter: Ke Han (kehan5800)
            Votes: 0
            Watchers: 2
