Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
hbase-filesystem-1.0.0-alpha1
None
Description
This came up when running bulk loads on HBase deployments using HBOSS. The fixes introduced by HBASE-23679 use FileSystem.closeAllForUGI(ugi) to make sure FileSystem instances get cleared for the specific running UGI. The problem is that FileSystem.closeAllForUGI does not explicitly remove the instances from FileSystem.CACHE. It only calls FileSystem.close() on each of them, and it is FileSystem.close() that removes the instance from FileSystem.CACHE. In this case, though, our FileSystem implementation is HBaseObjectStoreSemantics, whose close() does not remove it from FileSystem.CACHE. As a result, FileSystem.closeAllForUGI closes the HBaseObjectStoreSemantics instance but leaves it cached. Every subsequent FileSystem.get call by the same UGI then retrieves the closed instance and ultimately fails, as below:
2020-08-26 12:43:57,528 ERROR org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager: Failed to complete bulk load
java.io.IOException: Exception while testing a lock
	at org.apache.hadoop.hbase.oss.sync.ZKTreeLockManager.isLocked(ZKTreeLockManager.java:312)
	at org.apache.hadoop.hbase.oss.sync.ZKTreeLockManager.writeLockAbove(ZKTreeLockManager.java:183)
	at org.apache.hadoop.hbase.oss.sync.TreeLockManager.treeReadLock(TreeLockManager.java:282)
	at org.apache.hadoop.hbase.oss.sync.TreeLockManager.lock(TreeLockManager.java:449)
	at org.apache.hadoop.hbase.oss.HBaseObjectStoreSemantics.exists(HBaseObjectStoreSemantics.java:498)
	at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:281)
	at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:266)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:360)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1856)
	at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager.secureBulkLoadHFiles(SecureBulkLoadManager.java:266)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:2445)
	at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42280)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:418)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
Caused by: java.lang.IllegalStateException: Expected state [STARTED] was [STOPPED]
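The eviction problem described above can be reproduced in a small, self-contained Java model. This is a sketch under simplifying assumptions, not Hadoop or HBOSS code. MiniCache, MiniFs, BuggyWrapperFs and FixedWrapperFs are hypothetical stand-ins for FileSystem.CACHE, FileSystem and a wrapper like HBaseObjectStoreSemantics. The cache is keyed by a plain user name instead of a real UGI-based key. As with FileSystem.closeAllForUGI, closeAll here only calls close(), so eviction depends entirely on each close() implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model of the bug; none of these classes are Hadoop/HBOSS APIs.
class CacheBugSketch {

    // Stand-in for FileSystem.CACHE, keyed by user name instead of a UGI.
    static final class MiniCache {
        private final Map<String, MiniFs> map = new HashMap<>();

        // Like FileSystem.get: return the cached instance, or create and cache one.
        MiniFs get(String user, Supplier<MiniFs> factory) {
            return map.computeIfAbsent(user, u -> {
                MiniFs fs = factory.get();
                fs.init(this, u);
                return fs;
            });
        }

        // Like closeAllForUGI: only calls close(); it never evicts by itself.
        void closeAll(String user) {
            MiniFs fs = map.get(user);
            if (fs != null) {
                fs.close();
            }
        }

        // Evict only if the key still maps to this exact instance.
        void remove(String user, MiniFs fs) {
            map.remove(user, fs);
        }
    }

    // Stand-in for FileSystem: the base close() marks closed AND evicts.
    static class MiniFs {
        private MiniCache cache;
        private String user;
        private boolean closed;

        void init(MiniCache cache, String user) {
            this.cache = cache;
            this.user = user;
        }

        protected void markClosed() { closed = true; }

        boolean isClosed() { return closed; }

        void close() {
            markClosed();
            cache.remove(user, this);
        }
    }

    // Overrides close() without calling super.close(): stays in the cache.
    static class BuggyWrapperFs extends MiniFs {
        @Override
        void close() { markClosed(); }
    }

    // Delegates to the base close(), so the instance is evicted.
    static class FixedWrapperFs extends MiniFs {
        @Override
        void close() { super.close(); }
    }

    // True if a get() after closeAll() hands back the same, already closed instance.
    static boolean reproducesStaleEntry() {
        MiniCache cache = new MiniCache();
        MiniFs first = cache.get("alice", BuggyWrapperFs::new);
        cache.closeAll("alice");
        MiniFs second = cache.get("alice", BuggyWrapperFs::new);
        return second == first && second.isClosed();
    }

    // True if a get() after closeAll() creates a fresh, open instance.
    static boolean fixedVersionReturnsFreshInstance() {
        MiniCache cache = new MiniCache();
        MiniFs first = cache.get("alice", FixedWrapperFs::new);
        cache.closeAll("alice");
        MiniFs second = cache.get("alice", FixedWrapperFs::new);
        return second != first && !second.isClosed();
    }

    public static void main(String[] args) {
        System.out.println("buggy: stale closed instance returned = " + reproducesStaleEntry());
        System.out.println("fixed: fresh open instance returned = " + fixedVersionReturnsFreshInstance());
    }
}
```

Running main prints true on both lines. In this model, the point is that a close() override which skips the base-class eviction leaves a closed object in the cache, and every later get for that key reuses it. Delegating to the base close() is one way to restore eviction. This illustrates the mechanism in the report and is not necessarily the change that was committed.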