Spark / SPARK-25467

Python date/datetime objects in dataframes increment by 1 day when converted to JSON


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.1
    • Fix Version/s: None
    • Component/s: PySpark, SQL

    Description

      When a DataFrame contains datetime.date or datetime.datetime instances and toJSON() is called on it, the day in the JSON date representation is incremented.
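      For comparison, the JSON one would presumably expect for the first example below can be produced with plain Python, without Spark. This is only a sketch: it assumes the expected representation is the unchanged ISO 8601 date, and the `row` dictionary is a hypothetical stand-in for the Row used in the report.

```python
import datetime
import json

# Hypothetical stand-in for Row(cx=1, cy=2, dates=[...]) from the report.
row = {
    "cx": 1,
    "cy": 2,
    "dates": [datetime.date.fromordinal(1), datetime.date.fromordinal(2)],
}

# Serialize dates with isoformat() so the calendar day is left untouched.
# Compact separators mimic the spacing of the toJSON() output.
expected = json.dumps(row, default=lambda value: value.isoformat(),
                      separators=(",", ":"))
print(expected)  # {"cx":1,"cy":2,"dates":["0001-01-01","0001-01-02"]}
```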

      # Note: sqc is not defined in the report; it is assumed to be an existing SQLContext.
      import datetime
      from pyspark.sql import Row

      # Create a DataFrame containing datetime.date instances, convert to JSON and display
      rows = [Row(cx=1, cy=2, dates=[datetime.date.fromordinal(1), datetime.date.fromordinal(2)])]

      df = sqc.createDataFrame(rows)

      df.collect()
      # Output:
      # [Row(cx=1, cy=2, dates=[datetime.date(1, 1, 1), datetime.date(1, 1, 2)])]

      df.toJSON().collect()
      # Output:
      # ['{"cx":1,"cy":2,"dates":["0001-01-03","0001-01-04"]}']


      # Issue also occurs with datetime.datetime instances

      rows = [Row(cx=1, cy=2, dates=[datetime.datetime.fromordinal(1), datetime.datetime.fromordinal(2)])]

      df = sqc.createDataFrame(rows)

      df.collect()
      # Output:
      # [Row(cx=1, cy=2, dates=[datetime.datetime(1, 1, 1, 0, 0, fold=1), datetime.datetime(1, 1, 2, 0, 0)])]

      df.toJSON().collect()
      # Output:
      # ['{"cx":1,"cy":2,"dates":["0001-01-02T23:50:36.000-06:00","0001-01-03T23:50:36.000-06:00"]}']
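      The size of the shift in the datetime.date example can be checked with plain Python, without Spark. The sketch below hard-codes the values copied from the output above; `shift_in_days` is a hypothetical helper introduced only for illustration. For these two values the JSON dates come out two days ahead of the collected ones.

```python
import datetime

# Values returned by df.collect() in the datetime.date example above.
collected = [datetime.date(1, 1, 1), datetime.date(1, 1, 2)]

# Date strings returned by df.toJSON().collect() in the same example.
json_dates = ["0001-01-03", "0001-01-04"]


def shift_in_days(expected, json_text):
    """Return how many days the JSON date is ahead of the expected date."""
    year, month, day = (int(part) for part in json_text.split("-"))
    return (datetime.date(year, month, day) - expected).days


shifts = [shift_in_days(d, s) for d, s in zip(collected, json_dates)]
print(shifts)  # [2, 2]
```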
      
      

       

       


          People

            Assignee: Unassigned
            Reporter: David V. Hill (davidvhill)
            Votes: 0
            Watchers: 6
