BEAM-3272

ParDoTranslatorTest: Error creating local cluster while creating checkpoint file

Details

    Description

      Failed build: https://builds.apache.org/job/beam_PostCommit_Java_MavenInstall/org.apache.beam$beam-runners-apex/5330/console

      Key output:

      2017-11-29T01:21:26.956 [ERROR] testAssertionFailure(org.apache.beam.runners.apex.translation.ParDoTranslatorTest)  Time elapsed: 2.007 s  <<< ERROR!
      java.lang.RuntimeException: Error creating local cluster
      	at org.apache.apex.engine.EmbeddedAppLauncherImpl.getController(EmbeddedAppLauncherImpl.java:122)
      	at org.apache.apex.engine.EmbeddedAppLauncherImpl.launchApp(EmbeddedAppLauncherImpl.java:71)
      	at org.apache.apex.engine.EmbeddedAppLauncherImpl.launchApp(EmbeddedAppLauncherImpl.java:46)
      	at org.apache.beam.runners.apex.ApexRunner.run(ApexRunner.java:197)
      	at org.apache.beam.runners.apex.TestApexRunner.run(TestApexRunner.java:57)
      	at org.apache.beam.runners.apex.TestApexRunner.run(TestApexRunner.java:31)
      	at org.apache.beam.sdk.Pipeline.run(Pipeline.java:304)
      	at org.apache.beam.sdk.Pipeline.run(Pipeline.java:290)
      	at org.apache.beam.runners.apex.translation.ParDoTranslatorTest.runExpectingAssertionFailure(ParDoTranslatorTest.java:156)
      

      ...

      Caused by: ExitCodeException exitCode=1: chmod: cannot access ‘/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_MavenInstall/src/runners/apex/target/com.datatorrent.stram.StramLocalCluster/checkpoints/2/_tmp’: No such file or directory
      
      	at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
      	at org.apache.hadoop.util.Shell.run(Shell.java:479)
      	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
      	at org.apache.hadoop.util.Shell.execCommand(Shell.java:866)
      	at org.apache.hadoop.util.Shell.execCommand(Shell.java:849)
      	at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:733)
      	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:225)
      	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
      	at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:307)
      	at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:296)
      	at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328)
      	at org.apache.hadoop.fs.FileSystem.primitiveCreate(FileSystem.java:1017)
      	at org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:99)
      	at org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.<init>(ChecksumFs.java:352)
      	at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:399)
      	at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:584)
      	at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:686)
      	at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:682)
      	at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
      	at org.apache.hadoop.fs.FileContext.create(FileContext.java:688)
      	at com.datatorrent.common.util.AsyncFSStorageAgent.copyToHDFS(AsyncFSStorageAgent.java:119)
      	... 50 more
      

      Judging from the code at these stack frames, the runner is copying an operator's checkpoint "to HDFS" (which in this case is the local disk) and fails while creating the target file of the copy: the create call makes the file (successfully) and then chmods it writable (unsuccessfully). Barring something subtle (e.g. chmod not being allowed immediately after creating a FileOutputStream), it looks like the whole directory was deleted from under the process. I don't know why that would happen, or how to debug it.
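      To illustrate the suspected failure mode, here is a minimal stand-alone sketch using only the JDK. It is not code from Apex or Hadoop: the class and method names are hypothetical, and it only assumes that a permission change on a path whose directory has been removed cannot succeed, which is what the chmod error in the trace reports.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical reproduction of "chmod after the directory vanished"; not Apex or Hadoop code. */
class CheckpointRaceSketch {

  /**
   * Creates checkpoints/2/_tmp under a scratch directory, removes the whole
   * tree (simulating an external cleanup), then tries to make the file writable.
   * Returns the result of the permission change.
   */
  static boolean setWritableAfterDelete() throws IOException {
    Path root = Files.createTempDirectory("checkpoint-race");
    Path checkpoints = root.resolve("checkpoints");
    Path dir = Files.createDirectories(checkpoints.resolve("2"));
    Path tmp = Files.createFile(dir.resolve("_tmp"));

    // Simulate another test or process wiping the directory between
    // file creation and the permission change.
    Files.delete(tmp);
    Files.delete(dir);
    Files.delete(checkpoints);
    Files.delete(root);

    // The path no longer exists, so the permission change cannot succeed.
    return tmp.toFile().setWritable(true);
  }

  public static void main(String[] args) throws IOException {
    System.out.println("setWritable succeeded: " + setWritableAfterDelete());
  }
}
```

      Running it prints "setWritable succeeded: false". Hadoop shells out to chmod instead, so the same condition surfaces there as an ExitCodeException rather than a boolean result.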

      Either way, the path being accessed is suspicious: /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_MavenInstall/src/runners/apex/target/... - it sits under the module's target directory in the Jenkins workspace rather than in a per-test location. I think it would be better if this test used a "@Rule TemporaryFolder" to store Apex checkpoints. I don't know whether the Apex runner allows that, but it could reduce interference between tests and potentially resolve this issue.
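      The isolation idea can be sketched without JUnit. The class below is a hypothetical JDK-only stand-in for what "@Rule TemporaryFolder" would provide: a fresh root per test and cleanup afterwards. How to point the Apex runner's checkpoint location at such a directory is not shown, since it is unclear whether the runner exposes that setting.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical per-test checkpoint root; a JDK-only analogue of JUnit's TemporaryFolder. */
class IsolatedCheckpointDirSketch implements AutoCloseable {
  private final Path root;

  IsolatedCheckpointDirSketch() throws IOException {
    // Each instance gets its own uniquely named directory.
    root = Files.createTempDirectory("apex-checkpoints");
  }

  Path root() {
    return root;
  }

  /** Returns (creating if needed) the checkpoint directory for one operator id. */
  Path checkpointDir(int operatorId) throws IOException {
    return Files.createDirectories(
        root.resolve("checkpoints").resolve(Integer.toString(operatorId)));
  }

  /** Deletes the whole tree, files first, then directories bottom-up. */
  @Override
  public void close() throws IOException {
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
        Files.delete(file);
        return FileVisitResult.CONTINUE;
      }

      @Override
      public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
        if (exc != null) {
          throw exc;
        }
        Files.delete(dir);
        return FileVisitResult.CONTINUE;
      }
    });
  }

  public static void main(String[] args) throws IOException {
    try (IsolatedCheckpointDirSketch a = new IsolatedCheckpointDirSketch();
         IsolatedCheckpointDirSketch b = new IsolatedCheckpointDirSketch()) {
      // Two "tests" write the same relative path without touching each other.
      Files.createFile(a.checkpointDir(2).resolve("_tmp"));
      Files.createFile(b.checkpointDir(2).resolve("_tmp"));
      System.out.println("distinct roots: " + !a.root().equals(b.root()));
    }
  }
}
```

      With JUnit 4 the same effect would come from a TemporaryFolder rule field in the test class, with the rule handling creation and deletion around each test method.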

            People

              Assignee: Unassigned
              Reporter: Eugene Kirpichov (jkff)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h