Spark / SPARK-47362

Enhance the console sink to provide watermark and state information

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0.0
    • Fix Version/s: None
    • Component/s: Structured Streaming
    • Labels: None

    Description

      As discussed on the dev mailing list, we should enhance the console sink for Structured Streaming so that it also reports:

      • The stream's watermark at the end of the batch
      • The rows in state at the end of each batch

      This will be enabled via an `option` on the sink. Since both additions apply only to stateful queries, the option will have no effect on stateless queries. To make the output easier to parse, timestamps will be rendered as durations (e.g. "1 second" instead of the ISO 8601 extended timestamp). For joins, only the KeyWithIndexToValue state will be shown. If there are multiple stateful operators, a state table will be printed for each.

      These considerations are open for discussion, either in this thread or in the PR.
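
      To illustrate the duration rendering proposed above, here is a minimal sketch in plain Python. It is not Spark code and not the proposed implementation: the helper name `render_as_duration` is hypothetical, and it assumes that "duration-rendered" means the elapsed time since the Unix epoch, which the issue does not spell out.

```python
from datetime import datetime, timezone

# Hypothetical helper for illustration only; nothing here is part of Spark.
# Assumption: a timestamp is rendered as the time elapsed since the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
UNITS = [("day", 86400), ("hour", 3600), ("minute", 60), ("second", 1)]


def render_as_duration(ts: datetime) -> str:
    """Render a timestamp as a human-readable duration since the epoch."""
    if ts.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    total = int((ts - EPOCH).total_seconds())  # sub-second precision is dropped
    if total == 0:
        return "0 seconds"
    remaining = abs(total)
    parts = []
    for name, size in UNITS:
        count, remaining = divmod(remaining, size)
        if count:
            parts.append(f"{count} {name}" + ("" if count == 1 else "s"))
    text = " ".join(parts)
    # Timestamps before the epoch get a leading minus sign.
    return text if total > 0 else "-" + text


if __name__ == "__main__":
    watermark = datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)
    print(watermark.isoformat())          # ISO 8601 extended form
    print(render_as_duration(watermark))  # duration form
```

      Under that assumption, a watermark of 1970-01-01T00:00:01+00:00 would be shown as "1 second", which is shorter to read and compare in console output for test-style queries whose event times start near the epoch.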

          People

            Assignee: Unassigned
            Reporter: Neil Ramaswamy (neilramaswamy)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated: 168h
                Remaining: 168h
                Logged: Not Specified