[SPARK-28651] Streaming file source doesn't change the schema to nullable automatically - ASF JIRA


Docs Text:

Since Spark 3.0.0, all fields of the Structured Streaming file source schema are forced to be nullable. This protects users from data corruption when the specified or inferred schema is not compatible with the actual data. To restore the original behavior, set the SQL conf "spark.sql.streaming.fileSource.schema.forceNullable" to "false". This flag was added to reduce migration work when upgrading to Spark 3.0.0 and will be removed in a future release. Please update your code to work with the new behavior as soon as possible.
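
As a rough illustration of what "forced to be nullable" means for a schema, the sketch below works on the JSON form of a Spark schema (the dictionary layout produced by a schema's JSON export). The helper name `force_nullable` and the sample schema are hypothetical, and the handling of nested array and map types is an assumption of this sketch; it models the described effect and is not Spark's own code.

```python
import copy


def force_nullable(dtype):
    """Return a copy of a Spark-style JSON schema with everything marked nullable.

    Hypothetical helper: a model of the behavior described above, not Spark's code.
    """
    if not isinstance(dtype, dict):
        # Primitive types are plain strings such as "string" or "long".
        return dtype
    out = copy.deepcopy(dtype)
    kind = out.get("type")
    if kind == "struct":
        for field in out["fields"]:
            field["nullable"] = True
            field["type"] = force_nullable(field["type"])
    elif kind == "array":
        # Assumption: nested element nullability is relaxed as well.
        out["containsNull"] = True
        out["elementType"] = force_nullable(out["elementType"])
    elif kind == "map":
        # Assumption: map values are relaxed as well.
        out["valueContainsNull"] = True
        out["keyType"] = force_nullable(out["keyType"])
        out["valueType"] = force_nullable(out["valueType"])
    return out


# Sample schema declared by a user with non-nullable fields (made up for illustration).
declared = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": False, "metadata": {}},
        {
            "name": "tags",
            "type": {"type": "array", "elementType": "string", "containsNull": False},
            "nullable": False,
            "metadata": {},
        },
    ],
}

# Schema as the streaming file source would effectively see it.
effective = force_nullable(declared)
print([f["nullable"] for f in effective["fields"]])
print(effective["fields"][1]["type"]["containsNull"])
```

In a real PySpark session the original behavior would instead be selected through the flag named above, for example with `spark.conf.set("spark.sql.streaming.fileSource.schema.forceNullable", "false")` before the stream is created.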

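The streaming file source should change its schema to nullable automatically, so that a streaming DataFrame is consistent with a batch DataFrame.

Without this, a mismatch between the declared schema and the actual data can produce corrupted Parquet files.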

links to

GitHub Pull Request #25382