Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Invalid
- Affects Version/s: 3.2.1
- Fix Version/s: None
- Component/s: None
Description
Describe the bug
When we construct a DataFrame containing an invalid DATE/TIMESTAMP value (e.g. 1969-12-31 23:59:59 B) via spark-shell, or insert an invalid DATE/TIMESTAMP into a table via spark-sql, both interfaces silently evaluate the invalid value to NULL instead of throwing an exception.
To Reproduce
On Spark 3.2.1 (commit 4f25b3f712), using spark-sql:
$SPARK_HOME/bin/spark-sql
Execute the following:
spark-sql> create table timestamp_vals(c1 TIMESTAMP) stored as ORC;
spark-sql> insert into timestamp_vals select cast(" 1969-12-31 23:59:59 B " as timestamp);
spark-sql> select * from timestamp_vals;
NULL
Using spark-shell:
$SPARK_HOME/bin/spark-shell
Execute the following:
scala> import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.{Row, SparkSession}

scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._

scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._

scala> val rdd = sc.parallelize(Seq(Row(Seq(" 1969-12-31 23:59:59 B ").toDF("time").select(to_timestamp(col("time")).as("to_timestamp")).first().getAs[java.sql.Timestamp](0))))
rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[721] at parallelize at <console>:28

scala> val schema = new StructType().add(StructField("c1", TimestampType, true))
schema: org.apache.spark.sql.types.StructType = StructType(StructField(c1,TimestampType,true))

scala> val df = spark.createDataFrame(rdd, schema)
df: org.apache.spark.sql.DataFrame = [c1: timestamp]

scala> df.show(false)
+----+
|c1  |
+----+
|null|
+----+
Expected behavior
We expect both the spark-sql and spark-shell interfaces to throw an exception for an invalid DATE/TIMESTAMP, as they do for most other data types (e.g. the invalid value "foo" for the INT data type).
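Given the Invalid resolution, the NULL result is most likely Spark's default (non-ANSI) cast semantics rather than a defect: outside ANSI mode, a string that cannot be parsed as a TIMESTAMP casts to NULL. As a sketch, assuming a Spark 3.2 build where the `spark.sql.ansi.enabled` flag is available, the exception-on-invalid-cast behavior can be opted into:

```sql
-- Hypothetical spark-sql session; assumes Spark 3.2's
-- spark.sql.ansi.enabled configuration flag.
spark-sql> set spark.sql.ansi.enabled=true;

-- With ANSI mode on, an unparseable TIMESTAMP string is expected to
-- raise a runtime error instead of silently evaluating to NULL.
spark-sql> select cast(' 1969-12-31 23:59:59 B ' as timestamp);
```

Since the flag defaults to false in this release line, the NULL seen in the reproductions above would then be the documented default behavior, not a bug.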