Spark / SPARK-28784

StreamExecution and StreamingQueryManager should utilize CheckpointFileManager to interact with checkpoint directories

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels:
      None

      Description

      PR https://github.com/apache/spark/pull/21048 introduced the CheckpointFileManager interface to handle all Structured Streaming checkpointing operations and to let users choose how checkpoint files are written atomically.
      StreamExecution and StreamingQueryManager still perform some FileSystem operations directly, without going through CheckpointFileManager.
      For instance,
      https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L137
      https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L392

      Instead, StreamExecution and StreamingQueryManager should use CheckpointFileManager for these operations.
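
      A minimal sketch of what routing these operations through CheckpointFileManager could look like, assuming the linked call sites create and delete the checkpoint root directory. The object CheckpointDirOps and its method names are hypothetical and not part of Spark; only CheckpointFileManager.create, mkdirs and delete are taken from the existing interface. The sketch does not cover path qualification, which the current FileSystem-based code may also perform.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.execution.streaming.CheckpointFileManager

// Hypothetical helper for illustration only; not part of Spark.
object CheckpointDirOps {

  // Create the checkpoint root (and any missing parents) through the
  // configured CheckpointFileManager rather than a raw FileSystem handle.
  def ensureCheckpointRoot(checkpointRoot: String, hadoopConf: Configuration): Path = {
    val path = new Path(checkpointRoot)
    val manager = CheckpointFileManager.create(path, hadoopConf)
    manager.mkdirs(path)
    path
  }

  // Recursively delete the checkpoint root through the same abstraction,
  // so a user-supplied CheckpointFileManager implementation is honored.
  def deleteCheckpointRoot(checkpointRoot: String, hadoopConf: Configuration): Unit = {
    val path = new Path(checkpointRoot)
    CheckpointFileManager.create(path, hadoopConf).delete(path)
  }
}
```

      With helpers of this shape, the checkpoint directory would be created and removed by whichever CheckpointFileManager implementation the Hadoop configuration selects, instead of by whatever FileSystem the path happens to resolve to.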

              People

              • Assignee: Shruti Gumma (shrutig)
              • Reporter: Shruti Gumma (shrutig)
              • Votes: 0
              • Watchers: 1