SPARK-31211

Failure on loading 1000-02-29 from parquet saved by Spark 2.4.5

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      With Spark 2.4.5, save a date that is valid in the Julian calendar but not in the Proleptic Gregorian calendar, for instance 1000-02-29 (1000 is a leap year only under Julian rules):

      $ export TZ="America/Los_Angeles"
      
      scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
      
      scala> val df = Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date"))
      df: org.apache.spark.sql.DataFrame = [date: date]
      
      scala> df.show
      +----------+
      |      date|
      +----------+
      |1000-02-29|
      +----------+
      
      scala> df.write.mode("overwrite").parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap")
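
Why 1000-02-29 is a problematic value can be checked without Spark. The sketch below compares the leap-year rules of the two calendars and shows that java.time, which follows the Proleptic Gregorian (ISO) calendar, rejects the date. The helper names isJulianLeapYear and isGregorianLeapYear are made up for illustration and only cover positive years.

```scala
import java.time.LocalDate
import scala.util.Try

// Julian rule: every fourth year is a leap year (positive years only).
def isJulianLeapYear(year: Int): Boolean = year % 4 == 0

// Proleptic Gregorian rule: century years must also be divisible by 400.
def isGregorianLeapYear(year: Int): Boolean =
  (year % 4 == 0 && year % 100 != 0) || year % 400 == 0

val julianLeap = isJulianLeapYear(1000)
val gregorianLeap = isGregorianLeapYear(1000)

// java.time.LocalDate uses the ISO (Proleptic Gregorian) calendar,
// so constructing February 29 of the year 1000 throws a DateTimeException.
val attempt = Try(LocalDate.of(1000, 2, 29))

println(s"Julian leap: $julianLeap, Gregorian leap: $gregorianLeap, LocalDate ok: ${attempt.isSuccess}")
```

Spark 2.4.5 accepts the date because java.sql.Date follows the hybrid Julian and Gregorian calendar, which applies Julian rules before the 1582 cutover.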
      

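      Load the Parquet files back with Spark 3.1.0-SNAPSHOT. By default the date comes back shifted to 1000-03-06; after enabling spark.sql.legacy.parquet.rebaseDateTime.enabled, the read fails with a DateTimeException: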

      Welcome to
            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /___/ .__/\_,_/_/ /_/\_\   version 3.1.0-SNAPSHOT
            /_/
      
      Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_231)
      Type in expressions to have them evaluated.
      Type :help for more information.
      
      scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show
      +----------+
      |      date|
      +----------+
      |1000-03-06|
      +----------+
      
      
      scala> spark.conf.set("spark.sql.legacy.parquet.rebaseDateTime.enabled", true)
      
      scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show
      20/03/21 03:03:59 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
      java.time.DateTimeException: Invalid date 'February 29' as '1000' is not a leap year
      	at java.time.LocalDate.create(LocalDate.java:429)
      	at java.time.LocalDate.of(LocalDate.java:269)
      	at org.apache.spark.sql.catalyst.util.DateTimeUtils$.rebaseJulianToGregorianDays(DateTimeUtils.scala:1008)
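
Both symptoms above can be reproduced outside Spark with the JDK calendar classes. This is a minimal sketch, not Spark's rebaseJulianToGregorianDays: it assumes the rebase reads year, month and day from a hybrid java.util.GregorianCalendar and passes them to LocalDate.of, which is what the stack trace suggests. The last step shows one possible lenient mapping; whether Spark's fix uses it is not stated in this issue.

```scala
import java.time.LocalDate
import java.util.{Calendar, GregorianCalendar, TimeZone}
import scala.util.Try

// Hybrid Julian + Gregorian calendar in UTC; dates before the 1582 cutover use Julian rules.
val utcCal = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
utcCal.clear()
utcCal.set(1000, Calendar.FEBRUARY, 29)

val year = utcCal.get(Calendar.YEAR)
val month = utcCal.get(Calendar.MONTH) + 1 // Calendar months are zero-based
val day = utcCal.get(Calendar.DAY_OF_MONTH)

// Days since 1970-01-01 as counted by the hybrid calendar.
val julianDays = Math.floorDiv(utcCal.getTimeInMillis, 86400000L)

// Reading the same day count in the Proleptic Gregorian calendar changes the label.
val shifted = LocalDate.ofEpochDay(julianDays)

// Copying the Julian fields straight into LocalDate fails for February 29.
val strict = Try(LocalDate.of(year, month, day))

// A lenient alternative (an assumption): start at the first of the month and add days.
val lenient = LocalDate.of(year, month, 1).plusDays(day - 1)

println(s"shifted=$shifted strictOk=${strict.isSuccess} lenient=$lenient")
```

With the lenient mapping, the Julian-only date lands on the next valid Proleptic Gregorian day instead of raising an exception.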
      

            People

              Assignee: Max Gekk (maxgekk)
              Reporter: Max Gekk (maxgekk)