Spark / SPARK-45672

Provide a unified user-facing schema for state format versions in state data source - reader

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0.0
    • Fix Version/s: None
    • Component/s: Structured Streaming
    • Labels: None

    Description

      Currently, except for stream-stream joins with the joinSide option specified, the state data source exposes state "as is", exactly as it is stored in the state store. As a result, it produces a different schema for each state format version of an operator that has more than one.

      Users generally do not care about the state format version, so they may be confused when the state data source produces different schemas for the same operator.

      We could therefore consider defining a single user-facing schema for each operator and providing it regardless of the state format version.
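
To make the idea concrete, here is a minimal sketch in plain Python (no Spark dependency). The two row layouts, the field names, and the `to_unified` helper are all hypothetical and exist only for illustration; they are not Spark's actual state schemas for any operator.

```python
from typing import Any, Dict

# Hypothetical row layouts for ONE operator under two state format versions.
# Field names are illustrative only, not Spark's real state schemas.
V1_ROW = {"key": {"user": "a"}, "value": {"count": 3, "timeout_ms": 1000}}
V2_ROW = {"key": {"user": "a"}, "value": {"state": {"count": 3}, "timeout_ms": 1000}}


def to_unified(row: Dict[str, Any], format_version: int) -> Dict[str, Any]:
    """Map a version-specific row to a single (hypothetical) user-facing layout."""
    value = row["value"]
    if format_version == 1:
        # Assumed v1 layout: state fields sit next to the timeout field.
        state = {k: v for k, v in value.items() if k != "timeout_ms"}
    elif format_version == 2:
        # Assumed v2 layout: state fields are nested under "state".
        state = dict(value["state"])
    else:
        raise ValueError(f"unknown state format version: {format_version}")
    return {
        "key": dict(row["key"]),
        "state": state,
        "timeout_ms": value["timeout_ms"],
    }
```

Both versions map to the same shape, so a reader would see one schema per operator. The adapter has to know every version's internal layout, which is where the coupling to the operator implementation comes from.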

      Note that this needs further discussion before any code is written, because there is a clear trade-off: it tightly couples the state data source to the implementation of stateful operators. Also, regarding the concern about an unpredictable output schema, users can already call printSchema() to inspect the output schema before running a query.
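
The printSchema() workaround might look like the sketch below. It assumes the reader is registered under the format name `statestore` and accepts an `operatorId` option, as in the Spark 4.0 state data source documentation; the helper name `print_state_schema` is hypothetical.

```python
def print_state_schema(spark, checkpoint_path, operator_id=0):
    """Load the state of one operator and print its schema before querying it.

    `spark` is expected to be an active SparkSession. The "statestore" format
    name and the "operatorId" option are assumptions based on the Spark 4.0
    state data source reader.
    """
    df = (
        spark.read.format("statestore")
        .option("operatorId", operator_id)
        .load(checkpoint_path)
    )
    # Prints whatever schema the current state format version yields.
    df.printSchema()
    return df
```

With a real session this would be called as `print_state_schema(spark, "<checkpointLocation>")`, where the checkpoint location is a placeholder for the streaming query's checkpoint directory.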

            People

              Assignee: Unassigned
              Reporter: Jungtaek Lim (kabhwan)
              Votes: 0
              Watchers: 1