Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.4.17
Fix Version/s: None
Component/s: None
Description
While executing commands to create a table, I noticed the following ERROR in the HMaster log:
2023-10-17 06:41:47,118 ERROR [master/hmaster:16000.Chore.1] master.TableStateManager: Unable to get table uuidf68fb89ec7f4435597d69fb7b099d8e7 state
org.apache.hadoop.hbase.TableNotFoundException: No state found for uuidf68fb89ec7f4435597d69fb7b099d8e7
    at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:155)
    at org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:92)
    at org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:419)
    at org.apache.hadoop.hbase.master.assignment.AssignmentManager.getRegionStatesCount(AssignmentManager.java:2341)
    at org.apache.hadoop.hbase.master.HMaster.getClusterMetricsWithoutCoprocessor(HMaster.java:2616)
    at org.apache.hadoop.hbase.master.HMaster.getClusterMetricsWithoutCoprocessor(HMaster.java:2537)
    at org.apache.hadoop.hbase.master.balancer.ClusterStatusChore.chore(ClusterStatusChore.java:47)
    at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Reproduce
Because the failure depends on thread interleaving, the following command sequence may need to be run several times to reproduce it.
Cluster: 1 HMaster (HM), 2 RegionServers (RS), HDFS 2.10.2
create 'uuid49bb410e0a0c40ffb070d17787b4cad7', {NAME => 'uuid66e57e5195e04956a78f789b2a25ec01', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 'uuid119181eed72a43ccb66fabe37f84d2c0', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 'uuidc2d4931eaf4c429db0e55514fb12e767', VERSIONS => 3, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 'uuidc9802bbfbe434411ae68bb8388d499b6', VERSIONS => 3, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 'uuidc85e117d0ca144719fc53d30b189a343', VERSIONS => 3, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}
create 'uuid094dd5bf47eb47d69148b63e73ce0e7c', {NAME => 'uuid76ccbd96fbdc418b95ed9971ff423b2d', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}, {NAME => 'uuid36835d3faff04838bd02d6226557d7c8', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 'uuid37752598d1bb405eb39a3e17c04d7e60', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}
create 'uuidf68fb89ec7f4435597d69fb7b099d8e7', {NAME => 'uuidb235288b1d304fe1a62adb63968d9eee', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 'uuidf348f8849e724b3fa231fc2bb459be2d', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 'uuid81341a87083e49d7a0d8aff7b1ccf16a', VERSIONS => 3, COMPRESSION => 'GZ', BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 'uuid24db0d3c67c347d3a4c18af90facec2d', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}, {NAME => 'uuid7ecf10315f444cfd9c5698695f9054d9', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
enable 'uuid094dd5bf47eb47d69148b63e73ce0e7c'
create_namespace 'uuidc1066f82d7834f698d335dd04fa7ad3e'
alter 'uuid094dd5bf47eb47d69148b63e73ce0e7c', {NAME => 'enaJvIGYBk', BLOOMFILTER => 'ROWCOL', IN_MEMORY => false}
disable 'uuidf68fb89ec7f4435597d69fb7b099d8e7'
I have attached the full logs.
Root Cause
The ERROR is logged because of an interleaving between two threads: (1) T1, which is creating the table, and (2) T2, the Chore thread that calculates TABLE_TO_REGIONS_COUNT.
Here is how it happens in detail:
- T1: The user issues a create table request, which puts the table name into tableDescriptors.
- T2: The Chore thread calculates TABLE_TO_REGIONS_COUNT by iterating over all tables returned by getTableDescriptors().getAll(). This includes the table that is being created, whose table state has not been created yet.
- T2: It tries to fetch the state of that table, gets a TableNotFoundException, and the failure is logged at ERROR level.
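The steps above can be modeled with a small, self-contained sketch. This is not HBase code: the class `TableCreateRaceSketch`, its two maps, and its method names are hypothetical stand-ins for tableDescriptors and the table state store, and the interleaving is forced by calling the steps in order rather than by real threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified model of the race; all names here are hypothetical, not HBase APIs. */
public class TableCreateRaceSketch {
    // Stand-in for tableDescriptors: table name -> descriptor.
    public final Map<String, String> descriptors = new ConcurrentHashMap<>();
    // Stand-in for the table state store: table name -> state.
    public final Map<String, String> states = new ConcurrentHashMap<>();

    /** T1, first step: the descriptor becomes visible to other threads. */
    public void publishDescriptor(String table) {
        descriptors.put(table, "descriptor");
    }

    /** T1, later step: the table state is written ("ENABLED" is a placeholder value). */
    public void publishState(String table) {
        states.put(table, "ENABLED");
    }

    /** T2: one chore pass; returns the tables that have a descriptor but no state. */
    public List<String> chorePass() {
        List<String> missingState = new ArrayList<>();
        for (String table : descriptors.keySet()) {
            if (!states.containsKey(table)) {
                missingState.add(table); // the real chore logs an ERROR at this point
            }
        }
        return missingState;
    }

    public static void main(String[] args) {
        TableCreateRaceSketch master = new TableCreateRaceSketch();
        master.publishDescriptor("t1");                 // T1 starts creating the table
        System.out.println("chore between steps: " + master.chorePass());
        master.publishState("t1");                      // T1 finishes
        System.out.println("chore after create:  " + master.chorePass());
    }
}
```

A chore pass that runs between the two T1 steps sees the table without a state; a pass that runs after the second step does not, which is why the error only shows up on some runs.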
IMO, this is a normal and correct sequence of events that should not produce an ERROR-level message. It could be avoided by properly handling the interleaving between table updates and the chore thread.
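One possible way to handle the interleaving, sketched under assumptions, is for the chore to treat a missing state as "table still being created" and skip that table for the current pass instead of reporting an error. The class `SkipMissingStateSketch`, its `State` enum, and `disabledFlags` are hypothetical and do not correspond to the actual HBase classes or to a committed fix.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: tolerate tables whose state is not visible yet. */
public class SkipMissingStateSketch {
    public enum State { ENABLED, DISABLED }

    /** Tables skipped in the most recent pass because no state was found. */
    public final List<String> skipped = new ArrayList<>();

    /**
     * For every table that has a descriptor, report whether it is disabled.
     * A table without a state is skipped (it would be logged at DEBUG, not ERROR)
     * and picked up by a later pass once its state exists.
     */
    public Map<String, Boolean> disabledFlags(Collection<String> describedTables,
                                              Map<String, State> states) {
        skipped.clear();
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String table : describedTables) {
            State state = states.get(table);
            if (state == null) {
                skipped.add(table); // creation in progress: not an error
                continue;
            }
            result.put(table, state == State.DISABLED);
        }
        return result;
    }
}
```

The trade-off in this sketch is that a table whose state is genuinely lost would also be skipped silently, so a real fix would likely need to distinguish "being created" from "state missing".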
I am trying to fix it. Any help would be appreciated!