Details
- Improvement
- Status: Open
- Major
- Resolution: Unresolved
- 2.6.5
- None
- None
Description
If one of the ResourceManager hosts is missing from DNS for any reason, the HA configuration of the client proxy (ClientRMProxy) is not fault tolerant enough to survive it.
The tokenService call should succeed even in the face of DNS resolution issues, as long as at least one of the RMs can be resolved. Today a single unresolvable RM host makes the whole call fail. The relevant code is at:
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java#L153
We can safely assume that an RM that is missing from DNS cannot be the active one anyway, so client jobs can still be submitted while people fix the DNS issues.
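The proposed behavior could be sketched roughly as follows. This is a minimal illustration and not the actual ClientRMProxy code or a patch: the class name `TolerantTokenService`, the `buildTokenService` signature shown here, the `canResolve` predicate, and the comma-joined return value are all assumptions made for the example. The idea is to skip RM addresses whose host cannot be resolved and to fail only when none of them resolve.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of Hadoop.
final class TolerantTokenService {

    // Builds a comma-joined token service string from the configured RM
    // addresses ("host:port"), skipping hosts that cannot be resolved.
    static String buildTokenService(List<String> rmAddresses,
                                    Predicate<String> canResolve) {
        List<String> services = new ArrayList<>();
        for (String address : rmAddresses) {
            int colon = address.lastIndexOf(':');
            String host = colon < 0 ? address : address.substring(0, colon);
            if (canResolve.test(host)) {
                services.add(address);
            }
            // Unresolvable hosts are skipped: per the reasoning above,
            // such an RM is assumed not to be the active one.
        }
        if (services.isEmpty()) {
            // Keep failing loudly when no RM at all can be resolved.
            throw new IllegalArgumentException(
                "None of the ResourceManager hosts could be resolved: "
                    + rmAddresses);
        }
        return String.join(",", services);
    }

    // A resolver backed by the JDK: InetSocketAddress attempts a lookup
    // in its constructor and reports failure through isUnresolved().
    static Predicate<String> dnsResolver() {
        return host -> !new InetSocketAddress(host, 0).isUnresolved();
    }

    public static void main(String[] args) {
        List<String> rms =
            Arrays.asList("rm1.example.com:8030", "some.dns.entry:8030");
        // A fake resolver keeps the demo deterministic; real code would
        // pass dnsResolver() instead.
        Predicate<String> fake = host -> !host.equals("some.dns.entry");
        System.out.println(buildTokenService(rms, fake));
    }
}
```

Injecting the resolver as a predicate keeps the skip-and-continue logic testable without depending on real DNS state.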
A sample exception when one of the entries is missing:
17/11/02 18:20:35 INFO service.AbstractService: Service org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl failed in state STARTED; cause: java.lang.IllegalArgumentException: java.net.UnknownHostException: some.dns.entry
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
    at org.apache.hadoop.yarn.client.ClientRMProxy.getTokenService(ClientRMProxy.java:153)
    at org.apache.hadoop.yarn.client.ClientRMProxy.getAMRMTokenService(ClientRMProxy.java:138)
    at org.apache.hadoop.yarn.client.ClientRMProxy.setAMRMTokenService(ClientRMProxy.java:80)
    at org.apache.hadoop.yarn.client.ClientRMProxy.getRMAddress(ClientRMProxy.java:99)
    at org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.getProxyInternal(ConfiguredRMFailoverProxyProvider.java:76)
    at org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.getProxy(ConfiguredRMFailoverProxyProvider.java:90)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:75)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:66)
    at org.apache.hadoop.io.retry.RetryProxy.create(RetryProxy.java:58)
    at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:95)
    at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:72)
    at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.serviceStart(AMRMClientImpl.java:186)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.spark.deploy.yarn.YarnRMClient.register(YarnRMClient.scala:65)
    at org.apache.spark.deploy.yarn.ApplicationMaster.registerAM(ApplicationMaster.scala:359)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:435)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:256)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:774)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:772)
    at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:795)
    at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.net.UnknownHostException: some.dns.entry
    ... 28 more
Issue Links
- relates to: ZOOKEEPER-1576 Zookeeper cluster - failed to connect to cluster if one of the provided IPs causes java.net.UnknownHostException (Resolved)