Spark / SPARK-30692

Mechanism to check that all queries of a Spark Structured Streaming job have started when there are multiple sink actions.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.2
    • Fix Version/s: None
    • Component/s: Structured Streaming

    Description

      Get the status (started/stopped) of a Spark Structured Streaming job that has multiple sink actions.

      We are trying to determine when a Structured Streaming job has fully started; the requirement is as follows.

      We push data to a Kafka topic with the starting offsets set to latest, and we use a Spark listener to detect that the job has started. However, the listener is invoked as soon as the first streaming query starts, while the other queries are still initializing, so the job as a whole is not yet running. Because we then push data to the Kafka topic, and the offsets have already been resolved to latest, the data is lost: once the remaining queries finally start, they do not consume the records that were produced before they began reading. A mechanism to verify that all queries of the job have started (see the sketch below) would prevent this.
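      One possible workaround, sketched below as a minimal Scala example, is to count QueryStartedEvents with a StreamingQueryListener and only signal readiness (for example, before producing the first records to the Kafka topic) once every expected query has started. The expectedQueries value, the application name, and the surrounding structure are assumptions for illustration; they are not part of the reported job.

{code:scala}
import java.util.concurrent.CountDownLatch

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.{QueryProgressEvent, QueryStartedEvent, QueryTerminatedEvent}

object AllQueriesStartedCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("all-queries-started-check") // hypothetical app name
      .getOrCreate()

    // Number of sink actions (streaming queries) this job starts; adjust to the pipeline.
    val expectedQueries = 3
    val started = new CountDownLatch(expectedQueries)

    // Count every QueryStartedEvent instead of treating the first one as "job started".
    spark.streams.addListener(new StreamingQueryListener {
      override def onQueryStarted(event: QueryStartedEvent): Unit = started.countDown()
      override def onQueryProgress(event: QueryProgressEvent): Unit = ()
      override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
    })

    // ... start all writeStream queries here; each .start() registers one query ...

    // Block until every query has fired its start event before signalling that the
    // job is ready to receive data (e.g. before producing to the Kafka topic).
    started.await()
    println(s"All $expectedQueries streaming queries have started")

    spark.streams.awaitAnyTermination()
  }
}
{code}

      A stricter variant could additionally wait for the first QueryProgressEvent from each query, to confirm that every source is actually being polled before any data is produced.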


          People

            Assignee: Unassigned
            Reporter: Amit (amityadav2911@gmail.com)
            Votes: 0
            Watchers: 1
