Hive › HIVE-15058 [Umbrella] Current Flaky Tests › HIVE-19562

Flaky test: TestMiniSparkOnYarn FileNotFoundException in spark-submit


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1.0, 4.0.0-alpha-1
    • Component/s: Spark
    • Labels: None

    Description

      Seeing sporadic failures during test setup. Specifically, when spark-submit runs, this error (or a similar one) gets thrown:

      2018-05-15T10:55:02,112  INFO [RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] client.SparkSubmitSparkClient: Exception in thread "main" java.io.FileNotFoundException: File file:/tmp/spark-56e217f7-b8a5-4c63-9a6b-d737a64f2820/__spark_libs__7371510645900072447.zip does not exist
      2018-05-15T10:55:02,113  INFO [RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] client.SparkSubmitSparkClient:      at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
      2018-05-15T10:55:02,113  INFO [RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] client.SparkSubmitSparkClient:      at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:867)
      2018-05-15T10:55:02,113  INFO [RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] client.SparkSubmitSparkClient:      at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
      2018-05-15T10:55:02,113  INFO [RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] client.SparkSubmitSparkClient:      at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
      2018-05-15T10:55:02,113  INFO [RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] client.SparkSubmitSparkClient:      at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:365)
      2018-05-15T10:55:02,113  INFO [RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] client.SparkSubmitSparkClient:      at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:316)
      2018-05-15T10:55:02,113  INFO [RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] client.SparkSubmitSparkClient:      at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:356)
      2018-05-15T10:55:02,113  INFO [RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] client.SparkSubmitSparkClient:      at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:478)
      2018-05-15T10:55:02,113  INFO [RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] client.SparkSubmitSparkClient:      at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:565)
      2018-05-15T10:55:02,113  INFO [RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] client.SparkSubmitSparkClient:      at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:863)
      2018-05-15T10:55:02,113  INFO [RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] client.SparkSubmitSparkClient:      at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:169)
      2018-05-15T10:55:02,113  INFO [RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] client.SparkSubmitSparkClient:      at org.apache.spark.deploy.yarn.Client.run(Client.scala:1146)
      2018-05-15T10:55:02,113  INFO [RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] client.SparkSubmitSparkClient:      at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1518)
      2018-05-15T10:55:02,113  INFO [RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] client.SparkSubmitSparkClient:      at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
      2018-05-15T10:55:02,113  INFO [RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] client.SparkSubmitSparkClient:      at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
      2018-05-15T10:55:02,113  INFO [RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] client.SparkSubmitSparkClient:      at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
      2018-05-15T10:55:02,113  INFO [RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] client.SparkSubmitSparkClient:      at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
      2018-05-15T10:55:02,113  INFO [RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] client.SparkSubmitSparkClient:      at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      

      Essentially, Spark writes some files for container localization to a tmp directory, and that tmp directory is getting deleted out from under it. We have seen many issues with writing files to /tmp/ in the past, so it's probably best to write these files to a test-specific directory.
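      One way to avoid the shared-/tmp race is to point Spark's scratch directory (spark.local.dir, which defaults to java.io.tmpdir) at a directory owned by the test run. The sketch below is a hypothetical illustration, not the actual HIVE-19562 patch: the SparkTestDirs class, its method names, and the plain properties map standing in for the real Spark/Hive configuration plumbing are all invented for this example.

      ```java
      import java.io.IOException;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.nio.file.Paths;
      import java.util.HashMap;
      import java.util.Map;

      public class SparkTestDirs {

          /**
           * Creates a test-specific scratch directory under the build tree and
           * returns Spark properties that redirect spark-submit's staged files
           * (e.g. the __spark_libs__*.zip archive from the stack trace above)
           * away from the shared /tmp.
           */
          static Map<String, String> sparkConfForTest(String testName) throws IOException {
              // The build directory survives for the whole test run, unlike
              // /tmp, which external cleanup jobs can empty mid-test.
              Path base = Paths.get("target", "spark-local", testName);
              Files.createDirectories(base);

              Map<String, String> conf = new HashMap<>();
              // spark.local.dir overrides the java.io.tmpdir default that
              // spark-submit otherwise uses when staging local resources.
              conf.put("spark.local.dir", base.toAbsolutePath().toString());
              return conf;
          }

          public static void main(String[] args) throws IOException {
              Map<String, String> conf = sparkConfForTest("TestMiniSparkOnYarn");
              System.out.println(conf.get("spark.local.dir"));
          }
      }
      ```

      In a real test harness the returned property would be merged into the configuration handed to spark-submit, so every run stages its archives under its own directory.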

      Attachments

        1. HIVE-19562.4.patch
          2 kB
          Sahil Takiar
        2. HIVE-19562.3.patch
          2 kB
          Sahil Takiar
        3. HIVE-19562.1.patch
          2 kB
          Sahil Takiar

        Activity

          People

            Assignee: stakiar Sahil Takiar
            Reporter: stakiar Sahil Takiar
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: