SPARK-28651

Streaming file source doesn't change the schema to nullable automatically

Details

    All fields of a Structured Streaming file source schema are forced to be nullable as of Spark 3.0.0. This protects users from data corruption when the specified or inferred schema is not compatible with the actual data. To restore the original behavior, set the SQL conf "spark.sql.streaming.fileSource.schema.forceNullable" to "false". This flag exists to reduce migration work when upgrading to Spark 3.0.0 and will be removed in a future release, so update your code to work with the new behavior as soon as possible.
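As a configuration sketch, the opt-out described above could be applied when the session is built. This assumes PySpark 3.0 or later is installed; the application name is a hypothetical placeholder, and only the conf key and value come from the note above.

```python
from pyspark.sql import SparkSession

# Assumption: PySpark >= 3.0 is available; "force-nullable-opt-out" is a placeholder name.
spark = (
    SparkSession.builder
    .appName("force-nullable-opt-out")
    # Restore the pre-3.0 behavior: keep the nullability declared in the streaming schema.
    .config("spark.sql.streaming.fileSource.schema.forceNullable", "false")
    .getOrCreate()
)
```

Because the flag is described as temporary, a setting like this is best treated as a stopgap during migration rather than a permanent part of a job's configuration.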

    Description

      Currently, a batch DataFrame always changes its schema to nullable automatically (see this line: https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399).

      However, a streaming DataFrame's schema is read in this line, https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259, which does not change the schema to nullable.

      This inconsistency can cause corrupted Parquet files due to the schema mismatch.

      We should make streaming DataFrames consistent with batch.
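To make the difference between the two read paths concrete, here is a small Spark-free model in plain Python. It is only a sketch: the `Field`, `Struct`, `Array`, and `as_nullable` names are hypothetical stand-ins invented for illustration and are not Spark's API. It shows what "changing the schema to nullable" means for nested types, and how a streaming schema left untouched keeps its non-nullable flags.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Field:
    name: str
    dtype: "DType"
    nullable: bool


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Field, ...]


@dataclass(frozen=True)
class Array:
    element: "DType"
    contains_null: bool


# A type is either a primitive name such as "string" or a nested container.
DType = Union[str, Struct, Array]


def as_nullable(dtype: DType) -> DType:
    """Return a copy of dtype with every nullability flag set to True."""
    if isinstance(dtype, Struct):
        return Struct(tuple(
            Field(f.name, as_nullable(f.dtype), True) for f in dtype.fields
        ))
    if isinstance(dtype, Array):
        return Array(as_nullable(dtype.element), True)
    return dtype  # primitive types carry no nullability flag of their own


# Hypothetical user-declared schema with non-nullable fields.
user_schema = Struct((
    Field("id", "long", False),
    Field("tags", Array("string", False), False),
))

batch_schema = as_nullable(user_schema)  # models the batch read path
stream_schema = user_schema              # models the streaming path before the change
```

In this model, `batch_schema` reports every field as nullable, while `stream_schema` still claims that `id` and `tags` can never be null, which is the kind of mismatch the description warns about.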

            People

              zsxwing Shixiong Zhu
              tom.magdanski Tomasz Magdanski