  1. Spark
  2. SPARK-23843

Deploy yarn meets incorrect LOCALIZED_CONF_DIR


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: Deploy
    • Labels: None
    • Environment: spark-2.3.0-bin-hadoop2.7

    • Flags: Important

    Description

      We have implemented a new Hadoop-compatible filesystem and are running Spark on it. The command is:

      ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --executor-memory 1G --num-executors 1 /home/hadoop/app/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar 10

      The result is:

      Exception in thread "main" org.apache.spark.SparkException: Application application_1522399820301_0020 finished with failed status
      at org.apache.spark.deploy.yarn.Client.run(Client.scala:1159)

      We set the log level to DEBUG and found:

      2018-04-02 09:36:09,603 DEBUG org.apache.spark.deploy.yarn.Client: _app_.jar -> resource { scheme: "dfs" host: "f-63a47d43wh98.cn-neimeng-env10-d01.dfs.aliyuncs.com" port: 10290 file: "/user/hadoop/.sparkStaging/application_1522399820301_0006/spark-examples_2.11-2.3.0.jar" } size: 1997548 timestamp: 1522632978000 type: FILE visibility: PRIVATE
      2018-04-02 09:36:09,603 DEBUG org.apache.spark.deploy.yarn.Client: _spark_libs_ -> resource { scheme: "dfs" host: "f-63a47d43wh98.cn-neimeng-env10-d01.dfs.aliyuncs.com" port: 10290 file: "/user/hadoop/.sparkStaging/application_1522399820301_0006/_spark_libs_924155631753698276.zip" } size: 242801307 timestamp: 1522632977000 type: ARCHIVE visibility: PRIVATE
      2018-04-02 09:36:09,603 DEBUG org.apache.spark.deploy.yarn.Client: _spark_conf_ -> resource { port: -1 file: "/user/hadoop/.sparkStaging/application_1522399820301_0006/_spark_conf_.zip" } size: 185531 timestamp: 1522632978000 type: ARCHIVE visibility: PRIVATE

      As shown, the entries for _app_.jar and _spark_libs_ are correct, but the _spark_conf_ entry has no scheme or host, and its port is -1.

      We explored the source code; addResource is called in two places in Client.scala:

      val destPath = copyFileToRemote(destDir, localPath, replication, symlinkCache)
      val destFs = FileSystem.get(destPath.toUri(), hadoopConf)
      distCacheMgr.addResource(
        destFs, hadoopConf, destPath, localResources, resType, linkname, statCache,
        appMasterOnly = appMasterOnly)
      
       
      val remoteConfArchivePath = new Path(destDir, LOCALIZED_CONF_ARCHIVE)
      val remoteFs = FileSystem.get(remoteConfArchivePath.toUri(), hadoopConf)
      sparkConf.set(CACHED_CONF_ARCHIVE, remoteConfArchivePath.toString())

      val localConfArchive = new Path(createConfArchive().toURI())
      copyFileToRemote(destDir, localConfArchive, replication, symlinkCache, force = true,
        destName = Some(LOCALIZED_CONF_ARCHIVE))

      // Manually add the config archive to the cache manager so that the AM is launched with
      // the proper files set up.
      distCacheMgr.addResource(
        remoteFs, hadoopConf, remoteConfArchivePath, localResources, LocalResourceType.ARCHIVE,
        LOCALIZED_CONF_DIR, statCache, appMasterOnly = false)
      

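      As the source code shows, the two destination paths are constructed differently: in the first call, destPath is the value returned by copyFileToRemote, while in the second call, remoteConfArchivePath is built directly with new Path(destDir, LOCALIZED_CONF_ARCHIVE). This is confirmed by a debug log that we added ourselves: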

      2018-04-02 15:18:46,357 ERROR org.apache.hadoop.yarn.util.ConverterUtils: getYarnUrlFromURI URI:/user/root/.sparkStaging/application_1522399820301_0020/_spark_conf_.zip
      2018-04-02 15:18:46,357 ERROR org.apache.hadoop.yarn.util.ConverterUtils: getYarnUrlFromURI URL:null; null;-1;null;/user/root/.sparkStaging/application_1522399820301_0020/_spark_conf_.zip
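      The difference between a bare path and a qualified one can be illustrated without Hadoop. The following is a minimal sketch using only java.net.URI, not the Spark or Hadoop code path: the class name, the describe and qualify helpers, and the host namenode.example.com are hypothetical and chosen for illustration. It shows that a URI built from a bare absolute path has a null scheme, a null host, and port -1, much like the debug line above, while resolving the same path against a filesystem root URI fills all three in.

```java
import java.net.URI;

public class QualifyPathSketch {
    // Summarize a URI as "scheme; host; port; path" (hypothetical helper).
    static String describe(URI uri) {
        return uri.getScheme() + "; " + uri.getHost() + "; " + uri.getPort() + "; " + uri.getPath();
    }

    // Qualify a bare absolute path against a filesystem root URI (hypothetical helper).
    static URI qualify(URI fsRoot, String path) {
        return fsRoot.resolve(path);
    }

    public static void main(String[] args) {
        String path = "/user/root/.sparkStaging/application_1522399820301_0020/_spark_conf_.zip";

        // A bare path carries no scheme, host, or port.
        URI bare = URI.create(path);
        System.out.println(describe(bare));

        // Assumption: a filesystem root with the "dfs" scheme and a made-up host.
        URI fsRoot = URI.create("dfs://namenode.example.com:10290/");
        System.out.println(describe(qualify(fsRoot, path)));
    }
}
```

      In Hadoop itself the equivalent step is normally performed by qualifying the Path against the target FileSystem; whether and where that happens for a custom filesystem is exactly what this report is probing.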

      Log messages on YARN NM:

      2018-04-02 09:36:11,958 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Failed to parse resource-request
      java.net.URISyntaxException: Expected scheme name at index 0: :///user/hadoop/.sparkStaging/application_1522399820301_0006/_spark_conf_.zip
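      The ":///" prefix in this exception is what java.net.URI produces when a URI is rebuilt from an empty scheme and an empty authority. The sketch below reproduces the message with the JDK alone. It assumes, without the NodeManager source being shown here, that the resource request is turned back into a URI from its separate scheme, host, and file fields with empty strings substituted for the missing ones; the class name, the rebuild helper, and the host namenode.example.com are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class EmptySchemeSketch {
    // Rebuild a URI from separate components (hypothetical helper).
    // Returns the exception message if the result is not a valid URI, or null if it parsed.
    static String rebuild(String scheme, String authority, String path) {
        try {
            new URI(scheme, authority, path, null, null);
            return null;
        } catch (URISyntaxException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String path = "/user/hadoop/.sparkStaging/application_1522399820301_0006/_spark_conf_.zip";

        // Empty scheme and authority: the assembled string starts with ":///" and fails to parse.
        System.out.println(rebuild("", "", path));

        // With a scheme and authority present (made-up host), the same path parses cleanly.
        System.out.println(rebuild("dfs", "namenode.example.com:10290", path));
    }
}
```

      If that assumption holds, the NodeManager warning is a downstream symptom: the malformed string is only assembled because the resource was registered without a scheme in the first place.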


          People

            Assignee: Unassigned
            Reporter: zhoutai.zt