Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version/s: 2.4.4
- Fix Version/s: None
Description
When a job is submitted with the application jar stored in an HA HDFS, and the HDFS configuration is available to both the driver and the executors at $HADOOP_CONF_DIR, the executors can't fetch the application jar.
For example with Kubernetes:
- Create a Spark image with the HA HDFS configuration files available at $HADOOP_CONF_DIR.
- Push the application jar to the HA HDFS.
- Use spark-submit to create the Spark job in the cluster:
spark-submit \
  --master k8s://https://kubernetes.example:6443 \
  --deploy-mode cluster \
  --name spark_hdfs_test \
  --class $CLASS \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=$SPARK_IMAGE \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  hdfs:///jars/application.jar
On the driver everything goes well, but the following error appears in the log of every executor:
...
19/11/20 12:45:43 INFO Executor: Fetching hdfs://hdfs-k8s/jars/application.jar with timestamp 1574253925510
19/11/20 12:45:43 ERROR Executor: Exception in task 0.1 in stage 0.0 (TID 1)
java.lang.IllegalArgumentException: java.net.UnknownHostException: hdfs-k8s
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1866)
    at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:721)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:496)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:811)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:803)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:130)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:803)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:375)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: hdfs-k8s
    ... 28 more
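The stack trace suggests that when an executor fetches the application jar, it does not recognize the path as belonging to an HA HDFS: the client goes through createNonHAProxy and treats hdfs-k8s as a hostname, which cannot be resolved. It should recognize the path, since the HA HDFS configuration is available at $HADOOP_CONF_DIR.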
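To make the suspected failure mode concrete, here is a minimal Python sketch of the kind of check that seems to decide between the HA and non-HA code paths. It rests on an assumption about Hadoop's client behaviour: that the URI authority is treated as a logical nameservice only when a `dfs.client.failover.proxy.provider.<authority>` property is present in the configuration the client actually loaded. The helper names (`load_hadoop_conf`, `looks_like_ha_nameservice`) and the sample XML are hypothetical; only the nameservice name `hdfs-k8s` comes from the log above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Assumption: Hadoop looks up a failover proxy provider under this key prefix.
PROVIDER_PREFIX = "dfs.client.failover.proxy.provider."

# Hypothetical hdfs-site.xml fragment for the "hdfs-k8s" nameservice.
HA_CONF_XML = """<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>hdfs-k8s</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.hdfs-k8s</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>"""


def load_hadoop_conf(xml_text):
    """Parse Hadoop-style <configuration> XML into a dict of name -> value."""
    conf = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        if name is not None:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def looks_like_ha_nameservice(uri, conf):
    """Return True if the URI authority has a failover proxy provider in conf."""
    authority = urlsplit(uri).netloc
    if not authority:
        # hdfs:///path has no authority; fall back to the default filesystem.
        authority = urlsplit(conf.get("fs.defaultFS", "")).netloc
    return bool(authority) and (PROVIDER_PREFIX + authority) in conf


jar = "hdfs://hdfs-k8s/jars/application.jar"
# Configuration loaded: the authority is recognized as a logical nameservice.
print(looks_like_ha_nameservice(jar, load_hadoop_conf(HA_CONF_XML)))  # True
# Empty configuration: the authority would be treated as a plain hostname.
print(looks_like_ha_nameservice(jar, {}))  # False
```

If this assumption holds, the second case matches the executor log: the HA properties exist on disk, but the Hadoop configuration used when fetching the jar behaves as if they were absent.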
However, when the path to the application jar uses the address of the active namenode, everything works, including code in the jar that itself accesses the HA HDFS (hdfs:///some-file.txt):
spark-submit \
  --master k8s://https://kubernetes.example:6443 \
  --deploy-mode cluster \
  --name spark_hdfs_test \
  --class $CLASS \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=$SPARK_IMAGE \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  hdfs://hdfs-namenode-1.hdfs-namenode.default.svc.cluster.local:8020/jars/application.jar
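The workaround amounts to replacing the logical nameservice in the jar URI with an explicit namenode host and port. A small Python sketch of that rewrite follows; the function name `pin_to_namenode` is hypothetical, and the caller is assumed to already know which namenode is active. Note that a URI pinned this way no longer benefits from HA failover, so it is a diagnostic workaround rather than a fix.

```python
from urllib.parse import urlsplit


def pin_to_namenode(jar_uri, namenode_host, port=8020):
    """Rewrite an hdfs:// URI so it targets one explicit namenode.

    The original authority (empty, or a logical nameservice such as
    "hdfs-k8s") is discarded; only the path is kept.
    """
    parts = urlsplit(jar_uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"not an hdfs URI: {jar_uri!r}")
    return f"hdfs://{namenode_host}:{port}{parts.path}"


# Host name taken from the spark-submit command above.
active_nn = "hdfs-namenode-1.hdfs-namenode.default.svc.cluster.local"
print(pin_to_namenode("hdfs:///jars/application.jar", active_nn))
# prints hdfs://hdfs-namenode-1.hdfs-namenode.default.svc.cluster.local:8020/jars/application.jar
```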
Issue Links
- duplicates SPARK-28992: Support update dependencies from hdfs when task run on executor pods (In Progress)