Details
- Type: Bug
- Status: Patch Available
- Priority: Major
- Resolution: Unresolved
- Important
Description
When a Job is retrieved via the Cluster API, its Configuration is not correctly populated.
To reproduce this:
- Submit an MR job, and set some arbitrary parameter in its configuration:

  job.getConfiguration().set("foo", "bar");
  job.setJobName("foo-bug-demo");

- Get the job in a client:

  final Cluster cluster = new Cluster(conf);
  final JobStatus[] statuses = cluster.getAllJobStatuses();
  final JobStatus status = ... // get the status for the job named foo-bug-demo
  final Job job = cluster.getJob(status.getJobID());
  final Configuration jobConf = job.getConfiguration();

- Get its "foo" entry:

  final String foo = jobConf.get("foo");

- Expected: foo is "bar". Actual: foo is null.
The reason is that the job's configuration is stored on HDFS (the Configuration has a resource with an hdfs:// URL), and in loadResource that URL is changed to a path on the local file system (hdfs://host.domain:port/tmp/hadoop-yarn/... becomes /tmp/hadoop-yarn/...). That local path does not exist, so the configuration is not populated.
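The mangling described above can be illustrated with plain java.net.URI, outside of Hadoop: taking only the path component of a resource URL drops the hdfs:// scheme and authority, leaving a path that is then resolved against the local file system. (This is a minimal JDK-only sketch of the effect; the actual code path is inside Hadoop's Configuration resource loading, and the staging path below is hypothetical.)

```java
import java.net.URI;

public class PathManglingDemo {
    public static void main(String[] args) {
        // Hypothetical job-file location, as it would appear in the JobStatus.
        URI jobFile = URI.create("hdfs://host.domain:8020/tmp/hadoop-yarn/staging/job.xml");

        // Keeping only the path component loses the hdfs:// scheme and host...
        String mangled = jobFile.getPath();

        // ...so the resource degenerates into a local-filesystem path,
        // which usually does not exist on the client machine.
        System.out.println(mangled); // prints /tmp/hadoop-yarn/staging/job.xml
    }
}
```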
The bug happens in the Cluster class, where JobConfs are created from status.getJobFile(). A quick fix would be to copy that job file to a temporary file on the local file system and populate the JobConf from the local copy.