  1. Spark
  2. SPARK-37537

Spark 3.2.0 driver pod does not mount checkpoint filesystem from Kubernetes PVC


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: Spark Submit
    • Labels: None

    Description

      I have a Spark 3.2.0 driver running in a Kubernetes pod in client mode, and the following configs are defined in spark-submit:

      --deploy-mode client
      --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.glustervol.mount.path=/mnt/distributedDisk
      --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.glustervol.readOnly=false
      --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.glustervol.options.claimName=lolastreamingapp
      --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.glustervol.mount.path=/mnt/distributedDisk
      --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.glustervol.readOnly=false
      --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.glustervol.options.claimName=lolastreamingapp
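
      The driver and executor settings above share one key pattern per role, so they can be generated rather than typed by hand, which keeps each setting in its own --conf argument. The following is a minimal sketch, not part of the original report: the helper name volume_confs is hypothetical, and the key suffixes and values are copied verbatim from the flags above without being checked against the Spark documentation.

      ```python
      def volume_confs(role, volume_type, volume_name, settings):
          """Expand one volume's settings into spark-submit --conf arguments.

          Hypothetical helper for illustration; it only concatenates strings.
          """
          prefix = f"spark.kubernetes.{role}.volumes.{volume_type}.{volume_name}"
          args = []
          for suffix, value in settings.items():
              # Each setting becomes its own "--conf key=value" pair.
              args += ["--conf", f"{prefix}.{suffix}={value}"]
          return args


      # Suffixes and values copied verbatim from the issue (assumed, not validated).
      settings = {
          "mount.path": "/mnt/distributedDisk",
          "readOnly": "false",
          "options.claimName": "lolastreamingapp",
      }

      args = []
      for role in ("driver", "executor"):
          args += volume_confs(role, "persistentVolumeClaim", "glustervol", settings)

      # Print one flag per line, ready to paste after "spark-submit".
      for flag, value in zip(args[0::2], args[1::2]):
          print(flag, value)
      ```

      The resulting list can also be passed directly to a process launcher, which avoids shell quoting and line-continuation mistakes.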
        

      When the driver pod starts, it cannot access the filesystem mounted from the GlusterFS PVC. Describing the pod shows that the driver pod has not mounted the PVC, and describing the PVC likewise shows that it is not mounted.

      This worked with Spark 2.4.x, but it does not work with Spark 3.2.0.

      The only notable change between our Spark 2.4.x and 3.2.0 setups is that with 2.4.x we used deploy-mode cluster, whereas with 3.2.0 we use deploy-mode client.

       

      Because the filesystem used for checkpointing is not mounted properly, we get the following kind of error in our application:

      java.io.FileNotFoundException: File /mnt/distributedDisk/SE/LolaStreamingApp/1.0.0/1468589949 does not exist
              at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779) ~[hadoop-client-api-3.3.1.jar:?]
              at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100) ~[hadoop-client-api-3.3.1.jar:?]
              at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769) ~[hadoop-client-api-3.3.1.jar:?]
              at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462) ~[hadoop-client-api-3.3.1.jar:?]
              at org.apache.spark.streaming.StreamingContext.checkpoint(StreamingContext.scala:240) ~[spark-streaming_2.12-3.2.0.jar:3.2.0]
              at org.apache.spark.streaming.api.java.JavaStreamingContext.checkpoint(JavaStreamingContext.scala:509) ~[spark-streaming_2.12-3.2.0.jar:3.2.0] 
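
      The trace shows a local-filesystem lookup failing, which fits a missing mount but would look the same if only a subdirectory were absent. A small preflight check run inside the driver pod could tell the two cases apart. This is a sketch under stated assumptions, not part of the original report: the function name mount_status is hypothetical, and the path is the mount path from the configs above.

      ```python
      import os


      def mount_status(path):
          """Classify a path as 'missing', 'not a mount point', or 'mounted'.

          Hypothetical diagnostic helper; it inspects only the local filesystem.
          """
          if not os.path.exists(path):
              return "missing"
          if not os.path.ismount(path):
              # The directory exists but is part of the parent filesystem,
              # so nothing has been mounted on it.
              return "not a mount point"
          return "mounted"


      if __name__ == "__main__":
          # Mount path taken from the spark-submit configs in this issue.
          print(mount_status("/mnt/distributedDisk"))
      ```

      A result other than "mounted" would point at the pod's volume setup rather than at the checkpoint directory itself.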

       

       

          People

            Assignee: Unassigned
            Reporter: Silen Petri