Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Not A Problem
- Affects Version/s: 2.4.5, 3.0.0
- None
- None
Description
I am following the Kubernetes instructions for Spark (https://spark.apache.org/docs/latest/running-on-kubernetes.html). After cloning the master branch of apache/spark from GitHub and trying to build my Docker image, I found several wrong folder references:
Issue 1: The comments in the Dockerfile reference the wrong folder for the Dockerfile:
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
That docker build command simply won't run. The only command I got to run was:
docker build -t spark:latest -f resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile .
which is the actual path to the Dockerfile.
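To avoid hard-coding that long path, the Dockerfile can be located by searching the checkout. The following is only a sketch: `find_spark_dockerfile` is a hypothetical helper name, and it assumes the file still lives under a `dockerfiles/spark/` directory somewhere in the source tree.

```shell
# Hypothetical helper (not part of Spark): print the path of the Spark
# Dockerfile found under the given checkout directory (default: current dir).
find_spark_dockerfile() {
  root="${1:-.}"
  # Match any Dockerfile that sits directly in a dockerfiles/spark/ directory.
  find "$root" -type f -path '*/dockerfiles/spark/Dockerfile' | head -n 1
}

# Example usage from the top of the clone (image tag is just an example):
# docker build -t spark:latest -f "$(find_spark_dockerfile .)" .
```

This only resolves the `-f` argument; the COPY steps inside the Dockerfile are still resolved against the build context, so the later errors are unaffected.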
Issue 2: jars folder does not exist
After reading the tutorial, I of course built Spark first, as per the instructions, with:
./build/mvn -Pkubernetes -DskipTests clean package
Nonetheless, I get this error from the Dockerfile when building the image:
Step 5/18 : COPY jars /opt/spark/jars
COPY failed: stat /var/lib/docker/tmp/docker-builder402673637/jars: no such file or directory
for which I may have found a similar issue here: https://stackoverflow.com/questions/52451538/spark-for-kubernetes-test-on-mac
I am new to Spark, but I assumed that the build step would create this jars folder in the root folder of the project, and the Maven build of the master branch did run successfully with the command I mentioned above. It turns out the folder is here:
spark/assembly/target/scala-2.12/jars
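Since the Scala version in that path may differ between branches, a search can confirm where the build actually placed the jars. This is a sketch under that assumption; `find_assembly_jars` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of Spark): list jars directories produced
# under assembly/target/scala-<version>/ in the given checkout directory.
find_assembly_jars() {
  root="${1:-.}"
  find "$root" -type d -path '*/assembly/target/scala-*/jars'
}

# Example usage from the top of the clone:
# find_assembly_jars .
```

An empty result would suggest the Maven build has not produced the assembly yet.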
Issue 3: missing entrypoint.sh and decom.sh due to wrong reference
While Issue 2 remains unresolved, as I can't wrap my head around the missing jars folder (bin and sbin were copied successfully after I created a dummy jars folder), I then got stuck on these two steps:
COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY kubernetes/dockerfiles/spark/decom.sh /opt/
with:
Step 8/18 : COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY failed: stat /var/lib/docker/tmp/docker-builder638219776/kubernetes/dockerfiles/spark/entrypoint.sh: no such file or directory
which makes sense, since the paths should actually be:
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/decom.sh
Issue 4: /tests/ has been renamed to /integration-tests/
The location is also wrong.
COPY kubernetes/tests /opt/spark/tests
has to be changed to:
COPY resource-managers/kubernetes/integration-tests /opt/spark/tests
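All four issues come down to the same thing: Docker resolves each COPY source relative to the build context (the directory passed as the last argument to docker build), and the paths in the Dockerfile do not exist relative to the root of the cloned repository. A minimal sketch for checking this before running a build is shown below; `check_copy_sources` is a hypothetical helper name, and the paths in the usage comment are the ones quoted in this report.

```shell
# Hypothetical helper (not part of Spark or Docker): report which COPY
# source paths exist relative to a build context directory.
# Returns 0 if all paths exist, 1 if at least one is missing.
check_copy_sources() {
  context="$1"
  shift
  status=0
  for src in "$@"; do
    if [ -e "$context/$src" ]; then
      printf '%-8s %s\n' ok "$src"
    else
      printf '%-8s %s\n' MISSING "$src"
      status=1
    fi
  done
  return "$status"
}

# Example usage from the top of the clone, with the paths quoted above:
# check_copy_sources . jars bin sbin \
#   kubernetes/dockerfiles/spark/entrypoint.sh \
#   kubernetes/dockerfiles/spark/decom.sh \
#   kubernetes/tests
```

Running the same check against a different directory as the first argument shows whether the Dockerfile's paths line up with that directory as the build context instead.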
Remark
I only created one issue since this seems like somebody cleaned up the repo and forgot to change these. Am I missing something here? If I am, I apologise in advance since I am new to the Spark project. I also saw that some of these references were handled through vars in previous branches: https://github.com/apache/spark/blob/branch-2.4/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile (e.g. 2.4) but that also does not run out of the box.
I am also really not sure about the affected versions, since that was not clear enough to me on GitHub. Feel free to edit that field.
Thanks in advance!