SPARK-5152

Let metrics.properties file take an hdfs:// path


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.2.0
    • Fix Version/s: None
    • Component/s: Spark Core

    Description

      From my reading of the code, the spark.metrics.conf property must be a path that is resolvable on the local filesystem of each executor.

      Running a Spark job with --conf spark.metrics.conf=hdfs://host1.domain.com/path/metrics.properties logs many errors (~1 per executor, presumably?) like:

      15/01/08 13:20:57 ERROR metrics.MetricsConfig: Error loading configure file
      java.io.FileNotFoundException: hdfs:/host1.domain.com/path/metrics.properties (No such file or directory)
              at java.io.FileInputStream.open(Native Method)
              at java.io.FileInputStream.<init>(FileInputStream.java:146)
              at java.io.FileInputStream.<init>(FileInputStream.java:101)
              at org.apache.spark.metrics.MetricsConfig.initialize(MetricsConfig.scala:53)
              at org.apache.spark.metrics.MetricsSystem.<init>(MetricsSystem.scala:92)
              at org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:218)
              at org.apache.spark.SparkEnv$.create(SparkEnv.scala:329)
              at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:181)
              at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:131)
              at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
              at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
              at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:60)
              at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
              at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163)
              at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
      

      which seems consistent with the path being opened directly on the local filesystem, without the "scheme" portion of the URL being parsed.

      Letting all executors get their metrics.properties files from one location on HDFS would be an improvement, right?
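
      For illustration, here is a minimal sketch of what scheme-aware loading could look like if the metrics config were opened through Hadoop's FileSystem API instead of a plain FileInputStream. MetricsConfigLoader and loadProperties are hypothetical names for this sketch, not the actual MetricsConfig code:

      import java.io.InputStream
      import java.util.Properties

      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.{FileSystem, Path}

      // Hypothetical helper: resolve spark.metrics.conf through Hadoop's FileSystem API
      // so that hdfs://, file://, and plain local paths are all handled by scheme.
      object MetricsConfigLoader {
        def loadProperties(location: String,
                           hadoopConf: Configuration = new Configuration()): Properties = {
          val props = new Properties()
          val path = new Path(location)
          // getFileSystem picks the filesystem implementation from the URI scheme,
          // falling back to the configured default for scheme-less paths.
          val fs: FileSystem = path.getFileSystem(hadoopConf)
          var in: InputStream = null
          try {
            in = fs.open(path)
            props.load(in)
          } finally {
            if (in != null) in.close()
          }
          props
        }
      }

      Called as MetricsConfigLoader.loadProperties("hdfs://host1.domain.com/path/metrics.properties"), every executor would read the same file from HDFS rather than needing its own local copy.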

    People

      Assignee: Unassigned
      Reporter: Ryan Williams (rdub)
      Votes: 4
      Watchers: 13

    Dates

      Created:
      Updated:
      Resolved: