Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 4.2.0, 4.3.0, 5.0.0
- Fix Version/s: None
- Environment:
- Cloudera Quick Start VM 5.12.1
- CDH Oozie 4.1 (it actually includes all of the 4.2, 4.3, and 5.0 patches)
- Spark 1.6, Spark 2.2
- /etc/oozie/oozie-site.xml, added:
  <property>
    <name>oozie.service.SparkConfigurationService.spark.configurations</name>
    <value>*=/etc/spark/conf</value>
  </property>
- /etc/spark/conf/spark-defaults.conf, added:
  spark.hadoop.mapreduce.application.classpath=
  spark.hadoop.yarn.application.classpath=
- workflow.xml:
  <workflow-app name="My Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-0ff5"/>
    <kill name="Kill">
      <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="spark-0ff5">
      <spark xmlns="uri:oozie:spark-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn</master>
        <mode>cluster</mode>
        <name>MySpark</name>
        <class>org.apache.oozie.example.SparkFileCopy</class>
        <jar>oozie-examples-4.3.0.jar</jar>
        <arg>/user/cloudera/spark-oozie/input</arg>
        <arg>/user/cloudera/spark-oozie/output</arg>
        <file>/user/cloudera/spark-oozie-examples/oozie-examples-4.3.0.jar#oozie-examples-4.3.0.jar</file>
      </spark>
      <ok to="End"/>
      <error to="Kill"/>
    </action>
    <end name="End"/>
  </workflow-app>
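For completeness, a hypothetical job.properties that would submit the workflow above — host names, ports, and the HDFS application path are assumptions based on Quick Start VM defaults, not values taken from this report:

```shell
# Write a hypothetical job.properties for the workflow in this report.
# quickstart.cloudera and the ports 8020/8032/11000 are assumed CDH defaults.
cat > /tmp/job.properties <<'EOF'
nameNode=hdfs://quickstart.cloudera:8020
jobTracker=quickstart.cloudera:8032
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/cloudera/spark-oozie-examples
EOF

# Submission command (shown only; it requires a running Oozie server):
echo "oozie job -oozie http://quickstart.cloudera:11000/oozie -config /tmp/job.properties -run"
```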
Description
Currently the CDH distribution of Hadoop configures Spark2 with empty spark.hadoop.mapreduce.application.classpath and spark.hadoop.yarn.application.classpath properties in /etc/spark2/conf/spark-defaults.conf:
spark.hadoop.mapreduce.application.classpath=
spark.hadoop.yarn.application.classpath=
The motivation for this configuration is described in the Spark2 parcel installation scripts (SPARK2_ON_YARN-2.2.0.cloudera1.jar!/scripts/common.sh):
# Override the YARN / MR classpath configs since we already include them when generating
# SPARK_DIST_CLASSPATH. This avoids having the same paths added to the classpath a second
# time and wasting file descriptors.
replace_spark_conf "spark.hadoop.mapreduce.application.classpath" "" "$SPARK_DEFAULTS"
replace_spark_conf "spark.hadoop.yarn.application.classpath" "" "$SPARK_DEFAULTS"
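The `replace_spark_conf` helper lives in Cloudera's parcel scripts; its source is not quoted in this report, so the following is only a sketch of what such a helper plausibly does — drop any existing entry for a key and append the new (here empty) value:

```shell
#!/bin/sh
# Hypothetical reconstruction of a replace_spark_conf-style helper:
# set KEY=VALUE in a spark-defaults file, replacing any existing entry.
replace_spark_conf() {
  key="$1"; value="$2"; file="$3"
  tmp="$file.tmp"
  # Remove any existing line for this key, then append the new setting.
  grep -v "^$key=" "$file" > "$tmp" || true
  echo "$key=$value" >> "$tmp"
  mv "$tmp" "$file"
}

# Demo on a scratch copy of spark-defaults.conf (contents are illustrative).
cat > /tmp/spark-defaults.conf <<'EOF'
spark.master=yarn
spark.hadoop.mapreduce.application.classpath=$HADOOP_MAPRED_HOME/*
EOF

replace_spark_conf "spark.hadoop.mapreduce.application.classpath" "" /tmp/spark-defaults.conf
replace_spark_conf "spark.hadoop.yarn.application.classpath" "" /tmp/spark-defaults.conf
# The file now carries the two classpath keys with empty values.
cat /tmp/spark-defaults.conf
```

The end state matches what the parcel scripts produce: the keys are present but empty, so they override (rather than merely omit) the cluster-wide classpath defaults.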
So when Oozie is configured to use the spark-defaults.conf from Spark2, its Spark actions usually fail with an exception:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
	at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
	at java.lang.Class.getMethod0(Class.java:3018)
	at java.lang.Class.getMethod(Class.java:1784)
	at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
The same exception is also thrown with Spark 1.6 if spark.hadoop.mapreduce.application.classpath and spark.hadoop.yarn.application.classpath are present but empty in /etc/spark/conf/spark-defaults.conf.
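A quick way to check whether a cluster is affected is to look for empty values of these two keys in the spark-defaults file Oozie points at. A sketch of such a check, run here against a demo file (on a real cluster one would point CONF at /etc/spark/conf/spark-defaults.conf or the Spark2 equivalent):

```shell
#!/bin/sh
# Create a demo spark-defaults file reproducing the problematic configuration.
CONF=/tmp/demo-spark-defaults.conf
cat > "$CONF" <<'EOF'
spark.master=yarn
spark.hadoop.mapreduce.application.classpath=
spark.hadoop.yarn.application.classpath=
EOF

# Warn about each classpath key that is set but empty — the condition that
# makes Oozie Spark actions fail with NoClassDefFoundError per this report.
for key in spark.hadoop.mapreduce.application.classpath \
           spark.hadoop.yarn.application.classpath; do
  if grep -q "^$key=[[:space:]]*$" "$CONF"; then
    echo "WARNING: $key is set but empty in $CONF"
  fi
done
```

As a workaround, removing or commenting out the two empty lines restores the default YARN/MR classpath for Oozie-launched jobs, at the cost of the duplicated classpath entries the Cloudera parcel scripts were trying to avoid.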
OOZIE-2547 is probably the cause of this issue.
Oozie Launcher logs produced with the empty spark.hadoop.mapreduce.application.classpath and spark.hadoop.yarn.application.classpath properties (which cause the job to fail) are attached as spark-empty-cp.stderr and spark-empty-cp.stdout.
Oozie Launcher logs produced without the spark.hadoop.mapreduce.application.classpath and spark.hadoop.yarn.application.classpath properties are attached as spark-nonempty-cp.stderr and spark-nonempty-cp.stdout.