Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.4.17
Fix Version/s: None
Component/s: master
Labels: None

Description

While executing commands to create a table, I noticed the following ERROR in the HMaster log:

2023-10-17 06:41:47,118 ERROR [master/hmaster:16000.Chore.1] master.TableStateManager: Unable to get table uuidf68fb89ec7f4435597d69fb7b099d8e7 state
org.apache.hadoop.hbase.TableNotFoundException: No state found for uuidf68fb89ec7f4435597d69fb7b099d8e7
        at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:155)
        at org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:92)
        at org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:419)
        at org.apache.hadoop.hbase.master.assignment.AssignmentManager.getRegionStatesCount(AssignmentManager.java:2341)
        at org.apache.hadoop.hbase.master.HMaster.getClusterMetricsWithoutCoprocessor(HMaster.java:2616)
        at org.apache.hadoop.hbase.master.HMaster.getClusterMetricsWithoutCoprocessor(HMaster.java:2537)
        at org.apache.hadoop.hbase.master.balancer.ClusterStatusChore.chore(ClusterStatusChore.java:47)
        at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

Reproduce

Because the error depends on thread interleaving, the following command sequence may need to be run multiple times to reproduce it.

Cluster: 1 HMaster (HM), 2 RegionServers (RS), HDFS 2.10.2

create 'uuid49bb410e0a0c40ffb070d17787b4cad7', {NAME => 'uuid66e57e5195e04956a78f789b2a25ec01', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 'uuid119181eed72a43ccb66fabe37f84d2c0', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 'uuidc2d4931eaf4c429db0e55514fb12e767', VERSIONS => 3, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 'uuidc9802bbfbe434411ae68bb8388d499b6', VERSIONS => 3, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 'uuidc85e117d0ca144719fc53d30b189a343', VERSIONS => 3, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}
create 'uuid094dd5bf47eb47d69148b63e73ce0e7c', {NAME => 'uuid76ccbd96fbdc418b95ed9971ff423b2d', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}, {NAME => 'uuid36835d3faff04838bd02d6226557d7c8', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 'uuid37752598d1bb405eb39a3e17c04d7e60', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}
create 'uuidf68fb89ec7f4435597d69fb7b099d8e7', {NAME => 'uuidb235288b1d304fe1a62adb63968d9eee', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 'uuidf348f8849e724b3fa231fc2bb459be2d', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 'uuid81341a87083e49d7a0d8aff7b1ccf16a', VERSIONS => 3, COMPRESSION => 'GZ', BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 'uuid24db0d3c67c347d3a4c18af90facec2d', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}, {NAME => 'uuid7ecf10315f444cfd9c5698695f9054d9', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
enable 'uuid094dd5bf47eb47d69148b63e73ce0e7c'
create_namespace 'uuidc1066f82d7834f698d335dd04fa7ad3e'
alter 'uuid094dd5bf47eb47d69148b63e73ce0e7c', {NAME => 'enaJvIGYBk', BLOOMFILTER => 'ROWCOL', IN_MEMORY => false}
disable 'uuidf68fb89ec7f4435597d69fb7b099d8e7'

I have attached the full logs.

Root Cause

The ERROR message is logged because of a thread interleaving between (1) T1, the thread creating the table, and (2) T2, the Chore thread calculating TABLE_TO_REGIONS_COUNT.

Here's how it happens in detail:

1. The user issues a create table request, which puts the table name into tableDescriptors.
2. The Chore thread calculates TABLE_TO_REGIONS_COUNT by iterating over all tables returned by getTableDescriptors().getAll(). This includes the table that is still being created, whose table state has not been created yet.
3. The Chore thread tries to fetch that table's state, fails with TableNotFoundException, and an ERROR is logged.
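The interleaving above can be replayed deterministically with a small model. This is only a sketch: the class, its two maps, and the method names are hypothetical stand-ins rather than HBase code, and IllegalStateException stands in for TableNotFoundException.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of the race; none of these names are HBase APIs.
class CreateTableRaceDemo {
    private final Map<String, String> tableDescriptors = new ConcurrentHashMap<>();
    private final Map<String, String> tableStates = new ConcurrentHashMap<>();

    // T1, first step: the create request makes the descriptor visible.
    void publishDescriptor(String table) {
        tableDescriptors.put(table, "descriptor");
    }

    // T1, later step: the table state is written.
    void publishState(String table) {
        tableStates.put(table, "ENABLED");
    }

    // Stand-in for the state lookup: fails when no state exists yet.
    String getTableState(String table) {
        String state = tableStates.get(table);
        if (state == null) {
            throw new IllegalStateException("No state found for " + table);
        }
        return state;
    }

    // T2: the chore visits every table that has a descriptor.
    int choreVisitAllTables() {
        int visited = 0;
        for (String table : tableDescriptors.keySet()) {
            getTableState(table);
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        CreateTableRaceDemo master = new CreateTableRaceDemo();
        master.publishDescriptor("t1");
        try {
            // The chore runs between the two steps of T1.
            master.choreVisitAllTables();
        } catch (IllegalStateException e) {
            System.out.println("chore ran inside the window: " + e.getMessage());
        }
        master.publishState("t1");
        System.out.println("tables visited once state exists: " + master.choreVisitAllTables());
    }
}
```

The failure only appears when the chore pass lands between the two steps of table creation, which would explain why the reproduction is not deterministic on a real cluster.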

IMO, this is a normal and correct sequence of events that shouldn't produce an ERROR-level message. It could be avoided by properly handling the thread interleaving between table updates and chore threads.
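One possible way to handle the interleaving, sketched under the same simplified model, is for the chore to treat a missing state as "table still being created" and skip that table for the current pass. This is an assumption about what a fix could look like, not the actual HBase change; all names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a chore that tolerates tables without a state.
class TolerantChoreSketch {
    private final Map<String, String> tableDescriptors = new ConcurrentHashMap<>();
    private final Map<String, String> tableStates = new ConcurrentHashMap<>();
    private final List<String> skipped = new ArrayList<>();

    void publishDescriptor(String table) {
        tableDescriptors.put(table, "descriptor");
    }

    void publishState(String table, String state) {
        tableStates.put(table, state);
    }

    // Non-throwing lookup: an empty Optional means "no state yet".
    Optional<String> findTableState(String table) {
        return Optional.ofNullable(tableStates.get(table));
    }

    // Counts tables that have a state; tables without one are recorded
    // as skipped instead of being reported as errors.
    int countTablesWithState() {
        skipped.clear();
        int counted = 0;
        for (String table : tableDescriptors.keySet()) {
            if (findTableState(table).isPresent()) {
                counted++;
            } else {
                skipped.add(table);
            }
        }
        return counted;
    }

    // Copy of the tables skipped by the most recent pass.
    List<String> skippedTables() {
        return new ArrayList<>(skipped);
    }

    public static void main(String[] args) {
        TolerantChoreSketch master = new TolerantChoreSketch();
        master.publishDescriptor("t1");
        master.publishState("t1", "ENABLED");
        master.publishDescriptor("t2"); // descriptor only, state not written yet
        System.out.println("counted=" + master.countTablesWithState()
            + " skipped=" + master.skippedTables());
    }
}
```

A skipped table is picked up on a later pass once its state exists, so the count is only briefly stale. Whether a real fix should skip, log at a lower level, or reorder the create steps is left open here.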

I am trying to fix it. Any help would be appreciated!

Attachments

- persistent.tar.gz (103 kB), 17/Oct/23 15:57, Ke Han
- hbase--master-37bbb9b6f05a.log (231 kB), 17/Oct/23 15:57, Ke Han

Unable to get table state error when table is being initialized
