SPARK-5152

Let metrics.properties file take an hdfs:// path


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.2.0
    • Fix Version/s: None
    • Component/s: Spark Core

    Description

      From my reading of the code, the spark.metrics.conf property must be a path that is resolvable on the local filesystem of each executor.

      Running a Spark job with --conf spark.metrics.conf=hdfs://host1.domain.com/path/metrics.properties logs many errors (~1 per executor, presumably?) like:

      15/01/08 13:20:57 ERROR metrics.MetricsConfig: Error loading configure file
      java.io.FileNotFoundException: hdfs:/host1.domain.com/path/metrics.properties (No such file or directory)
              at java.io.FileInputStream.open(Native Method)
              at java.io.FileInputStream.<init>(FileInputStream.java:146)
              at java.io.FileInputStream.<init>(FileInputStream.java:101)
              at org.apache.spark.metrics.MetricsConfig.initialize(MetricsConfig.scala:53)
              at org.apache.spark.metrics.MetricsSystem.<init>(MetricsSystem.scala:92)
              at org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:218)
              at org.apache.spark.SparkEnv$.create(SparkEnv.scala:329)
              at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:181)
              at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:131)
              at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
              at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
              at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:60)
              at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
              at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163)
              at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
      

      which seems consistent with the path being opened directly on the local filesystem, without the "scheme" portion of the URL being parsed.

      Letting all executors get their metrics.properties files from one location on HDFS would be an improvement, right?
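
      For illustration, here is a minimal sketch of what scheme-aware loading could look like if the metrics config were opened through Hadoop's FileSystem API instead of a plain FileInputStream. MetricsConfigLoader and loadProperties are hypothetical names for this sketch, not the actual MetricsConfig code:

      import java.io.InputStream
      import java.util.Properties

      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.{FileSystem, Path}

      // Hypothetical helper: resolve spark.metrics.conf through Hadoop's FileSystem API
      // so that hdfs://, file://, and plain local paths are all handled by scheme.
      object MetricsConfigLoader {
        def loadProperties(location: String,
                           hadoopConf: Configuration = new Configuration()): Properties = {
          val props = new Properties()
          val path = new Path(location)
          // getFileSystem picks the filesystem implementation from the URI scheme,
          // falling back to the configured default for scheme-less paths.
          val fs: FileSystem = path.getFileSystem(hadoopConf)
          var in: InputStream = null
          try {
            in = fs.open(path)
            props.load(in)
          } finally {
            if (in != null) in.close()
          }
          props
        }
      }

      Called as MetricsConfigLoader.loadProperties("hdfs://host1.domain.com/path/metrics.properties"), every executor would read the same file from HDFS rather than needing its own local copy.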

    People

      Assignee: Unassigned
      Reporter: Ryan Williams (rdub)
      Votes: 4
      Watchers: 13

    Dates

      Created:
      Updated:
      Resolved: