HBASE-27349

HBase FileNotFound Exception After Region Transitioned


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      We hit exactly the same issue as https://issues.apache.org/jira/browse/HBASE-13651:

      • A SCAN gets an FNFE after the RegionServer goes through a full GC and the region is transitioned to and opened on another RegionServer.
      • While the region is in this state, taking a snapshot also reports an FNFE.
      • The issue can be resolved by moving the problem region manually (a sketch of this workaround follows this list).

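      Below is a minimal sketch of that manual workaround using the public Admin API, assuming the encoded region name from this report; the exact move(...) overload available depends on the HBase version (in 2.1 the two-argument form accepts a null destination so the master picks a new RegionServer). Everything besides the region name is illustrative and should be adapted to the actual cluster.

      // Hedged sketch: force the problem region to be re-assigned so its store
      // files are re-resolved on the new RegionServer. This only clears the
      // symptom described in this report; it does not address the root cause.
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.client.Admin;
      import org.apache.hadoop.hbase.client.Connection;
      import org.apache.hadoop.hbase.client.ConnectionFactory;
      import org.apache.hadoop.hbase.util.Bytes;

      public class MoveProblemRegion {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          try (Connection connection = ConnectionFactory.createConnection(conf);
               Admin admin = connection.getAdmin()) {
            // Encoded name of the affected region from the timeline below.
            byte[] encodedRegionName = Bytes.toBytes("fafb8f91bd20b1adfe15e2a64a39557e");
            // A null destination lets the master choose a new RegionServer,
            // matching what the HBase shell `move` command does by default.
            admin.move(encodedRegionName, (byte[]) null);
          }
        }
      }

      The same effect can be achieved from the HBase shell with the `move` command.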
      We found that HBASE-13651 was later reverted by https://issues.apache.org/jira/browse/HBASE-18786, because the discussion in HBASE-18786 concluded it was no longer a problem.

      Basic timeline of the issue:

      2022-08-27 05:26:35    Snapshot TestSnapshot is taken successfully.
      2022-08-27 15:21:51    The target hfile fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16 is generated by a compaction on regionserver-67.
      2022-08-27 17:26:36    041e9aeb8cdb46f991459c92f8581e16 is compacted into fd53b8e6b4874eb38712ad2d04389fff successfully.
      2022-08-27 17:34:53    A full GC starts on regionserver-67.
      2022-08-27 17:35:50    Region fafb8f91bd20b1adfe15e2a64a39557e is re-opened on regionserver-11, as scheduled by the HMaster.
      2022-08-27 17:35:56    regionserver-67 wakes up from the full GC.
      2022-08-27 17:35:57    File fafb8f91bd20b1adfe15e2a64a39557e is archived by lashadoop-regionserver-67; afterwards, regionserver-67 finds it has been kicked out and exits.
      2022-08-27 18:00:00    The archived hfile is removed by the HMaster's CleanerChore.
      2022-08-27 19:48:10    The user's job reports errors that the file is missing.
      2022-08-27 20:26:04    Re-taking snapshot TaggingSegmentationSnapshot fails because 041e9aeb8cdb46f991459c92f8581e16 is missing.
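
      For reference, a hedged sketch of checking whether the compacted hfile still survives under the archive directory before the CleanerChore deletes it (in this timeline the archived copy was removed roughly 25 minutes after the archive step). The path below assumes the standard <rootdir>/archive/data/<namespace>/<table>/<region>/<family>/<hfile> layout and reuses the root directory and file names from the scan exception; adjust both to the actual deployment.

      // Hedged sketch: probe the archive location of the missing hfile.
      // Assumes the HDFS client configuration is on the classpath so the
      // default filesystem resolves to the cluster from this report.
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class CheckArchivedHFile {
        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // Archive path derived from the data path in the scan exception below.
          Path archived = new Path(
              "/hbase/prod/hbase-prod/archive/data/default/mdm/"
              + "fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16");
          try (FileSystem fs = FileSystem.get(conf)) {
            System.out.println(archived
                + (fs.exists(archived) ? " is still archived" : " has already been deleted"));
          }
        }
      }

      How long the archived copy is retained is governed by the master-side hfile cleaner configuration (for example hbase.master.hfilecleaner.ttl for the time-to-live cleaner), so restoring the archived file is only possible inside that window.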

      The exception from scanning after the region is transitioned:

       

      java.io.FileNotFoundException: File does not exist: /hbase/prod/hbase-prod/data/default/mdm/fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16
              at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:85)
              at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
              at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:152)
              at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
              at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:735)
              at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:415)
              at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
              at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
              at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:422)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
              at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
              at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
              at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
              at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
              at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
              at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
              at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:861)
              at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:848)
              at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:837)
              at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1005)
              at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:317)
              at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:313)
              at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
              at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:325)
              at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:163)
              at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:898)
              at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:125)
              at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:102)
              at org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:269)
              at org.apache.hadoop.hbase.regionserver.HStoreFile.createStreamReader(HStoreFile.java:491)
              at org.apache.hadoop.hbase.regionserver.HStoreFile.getStreamScanner(HStoreFile.java:516)
              at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getScannersForStoreFiles(StoreFileScanner.java:149)
              at org.apache.hadoop.hbase.regionserver.HStore.getScanners(HStore.java:1309)
              at org.apache.hadoop.hbase.regionserver.HStore.recreateScanners(HStore.java:2042)
              at org.apache.hadoop.hbase.regionserver.StoreScanner.trySwitchToStreamRead(StoreScanner.java:1064)
              at org.apache.hadoop.hbase.regionserver.StoreScanner.shipped(StoreScanner.java:1198)
              at org.apache.hadoop.hbase.regionserver.KeyValueHeap.shipped(KeyValueHeap.java:437)
              at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.shipped(HRegion.java:6959)
              at org.apache.hadoop.hbase.regionserver.RSRpcServices$RegionScannerShippedCallBack.run(RSRpcServices.java:388)
              at org.apache.hadoop.hbase.ipc.ServerCall.setResponse(ServerCall.java:289)
              at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:161)
              at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
              at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
      

       

      The exception from taking a snapshot after the region is transitioned:

      2022-08-27 20:26:03,794 ERROR org.apache.hadoop.hbase.procedure.Subprocedure: Subprocedure 'TaggingSegmentationSnapshot' aborting due to a ForeignException!
      java.io.FileNotFoundException via regionserver-11.**,60020,1653373878295:java.io.FileNotFoundException: File does not exist: hdfs://test-hbase/hbase/prod/hbase-prod/data/default/mdm/fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16
              at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:349)
              at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:173)
              at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:193)
              at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:189)
              at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:53)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: java.io.FileNotFoundException: File does not exist: hdfs://beaconstore/hbase/prod/hbase-prod/data/ap/mdm_user_segments/fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16
              at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1500)
              at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1493)
              at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
              at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1508)
              at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
              at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:368)
              at org.apache.hadoop.hbase.snapshot.SnapshotManifestV2$ManifestBuilder.storeFile(SnapshotManifestV2.java:129)
              at org.apache.hadoop.hbase.snapshot.SnapshotManifestV2$ManifestBuilder.storeFile(SnapshotManifestV2.java:68)
              at org.apache.hadoop.hbase.snapshot.SnapshotManifest.addRegion(SnapshotManifest.java:249)
              at org.apache.hadoop.hbase.snapshot.SnapshotManifest.addRegion(SnapshotManifest.java:218)
              at org.apache.hadoop.hbase.regionserver.HRegion.addRegionToSnapshot(HRegion.java:4285)
              at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:134)
              at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:77)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              ... 4 more
      

       

      cc mbertozzi  apurtell  

            People

              Assignee: Duo Zhang (zhangduo)
              Reporter: wuchang (wuchang1989)