Description
Save a date that is valid in the Julian calendar, for instance 1000-02-29 (the year 1000 is a leap year in the Julian calendar), using Spark 2.4.5:
$ export TZ="America/Los_Angeles"
scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")

scala> val df = Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date"))
df: org.apache.spark.sql.DataFrame = [date: date]

scala> df.show
+----------+
|      date|
+----------+
|1000-02-29|
+----------+

scala> df.write.mode("overwrite").format("avro").save("/Users/maxim/tmp/before_1582/2_4_5_date_avro_leap")

scala> df.write.mode("overwrite").parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap")
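The underlying calendar mismatch can be reproduced without Spark: in the Julian calendar every year divisible by 4 is a leap year, so 1000-02-29 exists, while in the proleptic Gregorian calendar used by java.time (and by Spark 3.x) the year 1000 is divisible by 100 but not by 400, so it is not a leap year. A minimal standalone sketch (plain Scala, not part of the reproduction above) showing the two calendars disagreeing on this date:

```scala
import java.time.{DateTimeException, LocalDate}
import java.util.{Calendar, GregorianCalendar, TimeZone}

// Proleptic Gregorian (java.time): 1000 is divisible by 100 but not by 400,
// so it is NOT a leap year and 1000-02-29 does not exist.
val gregorianValid =
  try { LocalDate.of(1000, 2, 29); true }
  catch { case _: DateTimeException => false }

// Hybrid Julian/Gregorian (java.util, the calendar Spark 2.4 relied on):
// before the 1582 cutover it follows Julian rules, where every 4th year
// is a leap year, so 1000-02-29 IS a valid date.
val cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
cal.setLenient(false)
cal.clear()
cal.set(1000, Calendar.FEBRUARY, 29)
val julianValid =
  try { cal.getTime; true }
  catch { case _: IllegalArgumentException => false }

println(s"proleptic Gregorian valid: $gregorianValid") // false
println(s"hybrid Julian valid: $julianValid")          // true
```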
Load the parquet files back using Spark 3.1.0-SNAPSHOT:
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.0-SNAPSHOT
      /_/

Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_231)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show
+----------+
|      date|
+----------+
|1000-03-06|
+----------+

scala> spark.conf.set("spark.sql.legacy.parquet.rebaseDateTime.enabled", true)

scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show
20/03/21 03:03:59 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
java.time.DateTimeException: Invalid date 'February 29' as '1000' is not a leap year
	at java.time.LocalDate.create(LocalDate.java:429)
	at java.time.LocalDate.of(LocalDate.java:269)
	at org.apache.spark.sql.catalyst.util.DateTimeUtils$.rebaseJulianToGregorianDays(DateTimeUtils.scala:1008)
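The wrong 1000-03-06 result above is what you get when the stored days-since-epoch value (computed by Spark 2.4 via the hybrid Julian/Gregorian java.util.GregorianCalendar) is reinterpreted in the proleptic Gregorian calendar without rebasing: in the year 1000 the two calendars differ by 6 days. A sketch of that reinterpretation in plain Scala (an illustration of the mechanism, not Spark's actual code path):

```scala
import java.time.LocalDate
import java.util.{Calendar, GregorianCalendar, TimeZone}

// Days since 1970-01-01 of 1000-02-29 under the hybrid Julian calendar,
// i.e. the raw value Spark 2.4.5 writes into the parquet file.
val cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
cal.clear()
cal.set(1000, Calendar.FEBRUARY, 29)
val storedDays = Math.floorDiv(cal.getTimeInMillis, 86400000L)

// Reading that day count back as a proleptic Gregorian date, the way
// Spark 3.x does when no rebase is applied, shifts it by 6 days.
val asGregorian = LocalDate.ofEpochDay(storedDays)
println(asGregorian) // 1000-03-06
```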