Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-21077

MR job launched by hbase incremental backup command failed with FileNotFoundException

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 3.0.0-alpha-1
    • None
    • None
    • Reviewed

    Description

      Discovered during internal testing by Romil Choksi.

      MR job launched by hbase incremental backup command failed with FileNotFoundException

      from test console log

      2018-06-12 04:27:31,160|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:31,160 INFO  [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1528766389356_0044
      2018-06-12 04:27:31,186|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:31,184 INFO  [main] mapreduce.JobSubmitter: Executing with tokens: [Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ns1, Ident: (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase@EXAMPLE.COM, renewer=yarn, realUser=, issueDate=1528777648605, maxDate=1529382448605, sequenceNumber=175, masterKeyId=2), Kind: kms-dt, Service: 172.27.68.203:9393, Ident: (kms-dt owner=hbase, renewer=yarn, realUser=, issueDate=1528777649149, maxDate=1529382449149, sequenceNumber=49, masterKeyId=2), Kind: HBASE_AUTH_TOKEN, Service: bc71e347-78ff-4f95-af44-006f9b549a84, Ident: (org.apache.hadoop.hbase.security.token.AuthenticationTokenIdentifier@5), Kind: kms-dt, Service: 172.27.52.14:9393, Ident: (kms-dt owner=hbase, renewer=yarn, realUser=, issueDate=1528777648918, maxDate=1529382448918, sequenceNumber=50, masterKeyId=2)]
      2018-06-12 04:27:31,477|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:31,477 INFO  [main] conf.Configuration: found resource resource-types.xml at file:/etc/hadoop/3.0.0.0-1476/0/resource-types.xml
      2018-06-12 04:27:31,527|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:31,527 INFO  [main] impl.TimelineClientImpl: Timeline service address: ctr-e138-1518143905142-359429-01-000004.hwx.site:8190
      2018-06-12 04:27:32,563|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:32,562 INFO  [main] impl.YarnClientImpl: Submitted application application_1528766389356_0044
      2018-06-12 04:27:32,634|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:32,634 INFO  [main] mapreduce.Job: The url to track the job: https://ctr-e138-1518143905142-359429-01-000003.hwx.site:8090/proxy/application_1528766389356_0044/
      2018-06-12 04:27:32,635|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:32,635 INFO  [main] mapreduce.Job: Running job: job_1528766389356_0044
      2018-06-12 04:27:44,807|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:44,806 INFO  [main] mapreduce.Job: Job job_1528766389356_0044 running in uber mode : false
      2018-06-12 04:27:44,809|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:44,809 INFO  [main] mapreduce.Job:  map 0% reduce 0%
      2018-06-12 04:27:54,926|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:54,925 INFO  [main] mapreduce.Job:  map 5% reduce 0%
      2018-06-12 04:27:56,950|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:56,950 INFO  [main] mapreduce.Job: Task Id : attempt_1528766389356_0044_m_000002_0, Status : FAILED
      2018-06-12 04:27:56,979|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|Error: java.io.FileNotFoundException: File does not exist: hdfs://ns1/apps/hbase/data/oldWALs/ctr-e138-1518143905142-359429-01-000004.hwx.site%2C16020%2C1528776085205.1528776160915
      2018-06-12 04:27:56,979|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
      2018-06-12 04:27:56,980|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
      2018-06-12 04:27:56,980|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      2018-06-12 04:27:56,980|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591)
      2018-06-12 04:27:56,980|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:64)
      2018-06-12 04:27:56,981|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:165)
      2018-06-12 04:27:56,981|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:289)
      2018-06-12 04:27:56,981|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:271)
      2018-06-12 04:27:56,981|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:259)
      2018-06-12 04:27:56,981|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:395)
      2018-06-12 04:27:56,982|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.openReader(AbstractFSWALProvider.java:449)
      2018-06-12 04:27:56,982|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at org.apache.hadoop.hbase.mapreduce.WALInputFormat$WALRecordReader.openReader(WALInputFormat.java:166)
      2018-06-12 04:27:56,982|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at org.apache.hadoop.hbase.mapreduce.WALInputFormat$WALRecordReader.initialize(WALInputFormat.java:158)
      2018-06-12 04:27:56,983|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:560)
      2018-06-12 04:27:56,983|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:798)
      2018-06-12 04:27:56,983|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
      2018-06-12 04:27:56,983|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
      2018-06-12 04:27:56,983|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at java.security.AccessController.doPrivileged(Native Method)
      2018-06-12 04:27:56,984|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at javax.security.auth.Subject.doAs(Subject.java:422)
      2018-06-12 04:27:56,984|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
      2018-06-12 04:27:56,984|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
      2018-06-12 04:27:56,984|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|
      

      But looks like it did find the file on hdfs, test script runs incremental backup command as HBase user.

      2018-06-12 04:27:30,756|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:30,755 DEBUG [main] mapreduce.WALInputFormat: Scanning hdfs://ns1/apps/hbase/data/oldWALs/ctr-e138-1518143905142-359429-01-000004.hwx.site%2C16020%2C1528776085205.1528776160915 for WAL files
      2018-06-12 04:27:30,758|INFO|MainThread|machine.py:167 - run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:30,758 INFO  [main] mapreduce.WALInputFormat: Found: HdfsLocatedFileStatus{path=hdfs://ns1/apps/hbase/data/oldWALs/ctr-e138-1518143905142-359429-01-000004.hwx.site%2C16020%2C1528776085205.1528776160915; isDirectory=false; length=18031; replication=3; blocksize=268435456; modification_time=1528776689363; access_time=1528776160921; owner=hbase; group=hdfs; permission=rwx--x--x; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false}
      

      Attachments

        1. HBASE-21077-v3.patch
          4 kB
          Vladimir Rodionov
        2. HBASE-21077-v2.patch
          1 kB
          Vladimir Rodionov
        3. HBASE-21077-v1.patch
          1 kB
          Vladimir Rodionov

        Issue Links

          Activity

            People

              vrodionov Vladimir Rodionov
              romil.choksi Romil Choksi
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: