Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.3.1
Fix Version/s: None
Environment:
Spark 2.3.1
Python 3.6.5 | packaged by conda-forge | (default, Apr 6 2018, 13:39:56) [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
Centos 7 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018 x86_64 x86_64 GNU/Linux
Description
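When a DataFrame contains datetime.date or datetime.datetime instances and toJSON() is called on it, the dates in the JSON representation are shifted forward. In the reproduction that follows, datetime.date(1, 1, 1) is serialized as "0001-01-03", two days later than the value returned by collect().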
import datetime
from pyspark.sql import Row

# sqc is assumed to be an existing SQLContext; its creation is not shown in the report

# Create a DataFrame containing datetime.date instances, convert to JSON and display
rows = [Row(cx=1, cy=2, dates=[datetime.date.fromordinal(1), datetime.date.fromordinal(2)])]
df = sqc.createDataFrame(rows)
df.collect()
[Row(cx=1, cy=2, dates=[datetime.date(1, 1, 1), datetime.date(1, 1, 2)])]
df.toJSON().collect()
['{"cx":1,"cy":2,"dates":["0001-01-03","0001-01-04"]}']

# Issue also occurs with datetime.datetime instances
rows = [Row(cx=1, cy=2, dates=[datetime.datetime.fromordinal(1), datetime.datetime.fromordinal(2)])]
df = sqc.createDataFrame(rows)
df.collect()
[Row(cx=1, cy=2, dates=[datetime.datetime(1, 1, 1, 0, 0, fold=1), datetime.datetime(1, 1, 2, 0, 0)])]
df.toJSON().collect()
['{"cx":1,"cy":2,"dates":["0001-01-02T23:50:36.000-06:00","0001-01-03T23:50:36.000-06:00"]}']
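
For comparison, the JSON one would expect for the datetime.date case can be built without Spark. This is only a sketch in plain Python. The names record, to_iso, parse_date, expected, and shift_days are illustrative and not part of the report, and the observed string is copied from the toJSON() output above.

```python
import datetime
import json

# Same values as in the report, held in a plain dict instead of a Spark Row.
record = {
    "cx": 1,
    "cy": 2,
    "dates": [datetime.date.fromordinal(1), datetime.date.fromordinal(2)],
}


def to_iso(value):
    """Fallback for json.dumps: render date/datetime values with isoformat()."""
    if isinstance(value, (datetime.date, datetime.datetime)):
        return value.isoformat()
    raise TypeError("not JSON serializable: " + type(value).__name__)


def parse_date(text):
    """Parse a YYYY-MM-DD string into a datetime.date."""
    year, month, day = (int(part) for part in text.split("-"))
    return datetime.date(year, month, day)


# JSON expected if the dates were written out unchanged.
expected = json.dumps(record, default=to_iso, separators=(",", ":"))

# Output reported from df.toJSON().collect() in this issue.
observed = '{"cx":1,"cy":2,"dates":["0001-01-03","0001-01-04"]}'

# Day offset between each reported date and the corresponding input date.
shift_days = [
    (parse_date(seen) - original).days
    for seen, original in zip(json.loads(observed)["dates"], record["dates"])
]

print(expected)
print(shift_days)
```

Comparing expected with observed shows that both dates in the reported output are two days later than the input values.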