Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 4.2.0
None
None
Oozie 4.2.0.2.3.4.0-3485
Spark 1.4.1
Scala 2.10.5
HDP 2.3
Description
The Spark action does not appear to use the jobTracker setting from job.properties (or from the YARN config) when creating the SparkContext. When the jobTracker property is set to myDomain:8050 (to match the yarn.resourcemanager.address setting), the Oozie UI (click on job > action > action configuration) shows that myDomain:8050 is being submitted. However, when I drill down into the Hadoop job history logs, I see an error indicating that the default 0.0.0.0:8032 is being used instead:
job.properties
nameNode=hdfs://myDomain:8020
jobTracker=myOtherDomain:8050
queueName=default
# have also tried yarn-cluster and yarn-client
master=yarn
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/bmp/
# I've added the updated spark libs I need in here
oozie.action.sharelib.for.spark=spark2
workflow
<workflow-app xmlns='uri:oozie:workflow:0.5' name='MyWorkflow'>
    <start to='spark-node' />
    <action name='spark-node'>
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/bmp/output"/>
            </prepare>
            <master>${master}</master>
            <name>My Workflow</name>
            <class>uk.co.bmp.drivers.MyDriver</class>
            <jar>${nameNode}/bmp/lib/bmp.spark-assembly-1.0.jar</jar>
            <spark-opts>--conf spark.yarn.historyServer.address=http://myDomain:18088 --conf spark.eventLog.dir=hdfs://myDomain/user/spark/applicationHistory --conf spark.eventLog.enabled=true</spark-opts>
            <arg>${nameNode}/bmp/input/input_file.csv</arg>
        </spark>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end' />
</workflow-app>
Error
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception,Call From myDomain/ipAddress to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused. For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
...
at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
...
Where is it pulling 8032 from? Why does it not use the port configured in the job.properties?
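For reference, 0.0.0.0:8032 matches the stock default of yarn.resourcemanager.address in yarn-default.xml, which suggests the SparkContext may be building its Hadoop configuration without the cluster's yarn-site.xml. One thing that could be tried is passing the address to Spark explicitly through the spark.hadoop. prefix, which Spark copies into its Hadoop configuration. The fragment below is only a sketch of that idea: it has not been verified on this Oozie 4.2.0 / Spark 1.4.1 setup, and the host and port are simply the values used above.

```xml
<!-- Hypothetical workaround, not verified on this cluster: -->
<!-- forward the ResourceManager address via the spark.hadoop. prefix -->
<spark-opts>--conf spark.hadoop.yarn.resourcemanager.address=myDomain:8050 --conf spark.yarn.historyServer.address=http://myDomain:18088 --conf spark.eventLog.dir=hdfs://myDomain/user/spark/applicationHistory --conf spark.eventLog.enabled=true</spark-opts>
```

If this changes the address in the error, it would point to the launcher not picking up yarn-site.xml rather than to the jobTracker value itself.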