  1. Spark
  2. SPARK-40513 SPIP: Support Docker Official Image for Spark
  3. SPARK-43365

Refactor Dockerfile and workflow based on base image

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5.0
    • Fix Version/s: 3.5.0
    • Component/s: Spark Docker
    • Labels: None

    Description

      https://github.com/docker-library/official-images/pull/13089?notification_referrer_id=NT_kwDOABp-orI0MzIwMzMwNzY5OjE3MzYzNTQ#issuecomment-1533540388

      Would it be useful to save space by sharing layers, building one image FROM another? 🤔 Something like the *java11-ubuntu image as the "base", with the r and python variants FROM that, and the r-python variant FROM, probably, the larger of those two?

      Rough example Dockerfiles

      FROM eclipse-temurin:11-jre-focal
      # user stuff, install common deps, etc
      ...
      # download/extract spark (maybe keeping python and R files too? they seem relatively small compared to the rest)
      
      # other images in separate Dockerfiles
      FROM spark:3.3.0-scala2.12-java11-ubuntu
      # get "/opt/spark/{python,R}/" contents if not kept in base
      # install python or R (and things like R_HOME)
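
The second rough Dockerfile above could be fleshed out along these lines for the python variant. This is only a sketch under stated assumptions: the `spark` user name, the return to that user at the end, and the `python3`/`python3-pip` package selection are illustrative and are not taken from this issue. It also assumes the base image kept `/opt/spark/python`, so nothing is copied in here.

```dockerfile
# Sketch of a python variant built on the shared base image.
# Assumption: the base image already contains /opt/spark/python.
FROM spark:3.3.0-scala2.12-java11-ubuntu

# Package installation needs root; the base image is assumed to run as a non-root user.
USER root

# Assumption: python3 and python3-pip are enough for the python variant.
RUN set -ex; \
    apt-get update; \
    apt-get install -y --no-install-recommends python3 python3-pip; \
    rm -rf /var/lib/apt/lists/*

# Assumption: the non-root user created in the base image is named "spark".
USER spark
```

Because every layer from the base image is reused, only the package-install layer is added on top of it. An R variant would follow the same pattern with R packages and R_HOME, and the r-python variant would start FROM whichever of the two is larger.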
      

          People

            Assignee: Yikun Jiang (yikunkero)
            Reporter: Yikun Jiang (yikunkero)