Spark / SPARK-40630

Both SparkSQL and DataFrame insert invalid DATE/TIMESTAMP as NULL


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 3.2.1
    • Fix Version/s: None
    • Component/s: Spark Shell, SQL
    • Labels: None

    Description

      Describe the bug

      When we construct a DataFrame containing an invalid DATE/TIMESTAMP value (e.g. 1969-12-31 23:59:59 B) via spark-shell, or insert an invalid DATE/TIMESTAMP into a table via spark-sql, both interfaces silently evaluate the invalid value to NULL instead of throwing an exception.
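
      For reference, the behavior distills to the cast itself (a minimal sketch, run from spark-shell with default, non-ANSI settings assumed):

      // Casting a malformed string to TIMESTAMP yields null rather than an
      // error under the default (non-ANSI) settings; both repros below
      // reduce to this expression.
      spark.sql("select cast(' 1969-12-31 23:59:59 B ' as timestamp) as c1").show()
      // prints a single row containing null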

      To Reproduce

      On Spark 3.2.1 (commit 4f25b3f712), using spark-sql:

      $SPARK_HOME/bin/spark-sql

      Execute the following:

      spark-sql> create table timestamp_vals(c1 TIMESTAMP) stored as ORC;
      spark-sql> insert into timestamp_vals select cast(" 1969-12-31 23:59:59 B " as timestamp);
      spark-sql> select * from timestamp_vals;
      NULL

       
      Using spark-shell:

      $SPARK_HOME/bin/spark-shell

       
      Execute the following:

      scala> import org.apache.spark.sql.{Row, SparkSession}
      import org.apache.spark.sql.{Row, SparkSession}
      scala> import org.apache.spark.sql.types._
      import org.apache.spark.sql.types._
      scala> import org.apache.spark.sql.functions._
      import org.apache.spark.sql.functions._
      scala> val rdd = sc.parallelize(Seq(Row(Seq(" 1969-12-31 23:59:59 B ").toDF("time").select(to_timestamp(col("time")).as("to_timestamp")).first().getAs[java.sql.Timestamp](0))))
      rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[721] at parallelize at <console>:28
      scala> val schema = new StructType().add(StructField("c1", TimestampType, true))
      schema: org.apache.spark.sql.types.StructType = StructType(StructField(c1,TimestampType,true))
      scala> val df = spark.createDataFrame(rdd, schema)
      df: org.apache.spark.sql.DataFrame = [c1: timestamp]
      scala> df.show(false)
      +----+
      |c1  |
      +----+
      |null|
      +----+
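
      The null above is produced by the inner expression in the first step: to_timestamp already evaluates the malformed string to null before createDataFrame is involved. A minimal sketch (same session and imports as above):

      // to_timestamp alone yields null for the malformed input, so the
      // null row comes from this inner select, not from createDataFrame.
      Seq(" 1969-12-31 23:59:59 B ")
        .toDF("time")
        .select(to_timestamp(col("time")).as("parsed"))
        .show(false)  // prints a single row: null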
      

      Expected behavior

      We expect both the spark-sql and spark-shell interfaces to throw an exception for an invalid DATE/TIMESTAMP, as they do for most other data types (e.g. the invalid value "foo" for the INT type).
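
      Note: Spark 3.x already provides this stricter behavior behind a flag. With spark.sql.ansi.enabled set to true, an invalid cast to TIMESTAMP raises a runtime error instead of returning NULL (a minimal sketch from spark-shell; the exact exception type may vary by version):

      // With ANSI mode enabled, the same cast fails at runtime instead of
      // silently producing null.
      spark.conf.set("spark.sql.ansi.enabled", "true")
      spark.sql("select cast(' 1969-12-31 23:59:59 B ' as timestamp)").show()
      // expected: a runtime exception (e.g. DateTimeException) rather than a null row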

          People

            Assignee: Unassigned
            Reporter: xsys
            Votes: 0
            Watchers: 1
