  1. Spark
  2. SPARK-21187 Complete support for remaining Spark data types in Arrow Converters
  3. SPARK-21375

Add date and timestamp support to ArrowConverters for toPandas() collection


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      DataFrame.toPandas() does not yet support date and timestamp columns when collecting through ArrowConverters. Both types are common in data analysis with Spark and Pandas and should be supported.

      PySpark and Arrow differ in how they internally store timestamps that have no time zone specified: PySpark takes a UTC timestamp and adjusts it to local time, whereas Arrow keeps the value in UTC. Hopefully there is a clean way to resolve this.
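      The discrepancy can be illustrated with a small standard-library sketch. It is not the ArrowConverters code; the function names and the fixed UTC-8 "local" zone are assumptions chosen only to make the example deterministic. It shows the same stored instant rendered once as a naive local-time value (the PySpark-style view described above) and once as a naive UTC value (the Arrow-style view).

```python
from datetime import datetime, timedelta, timezone

# Reference point for epoch-based timestamps, in UTC.
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical local zone (fixed UTC-8), used only for illustration.
LOCAL_TZ = timezone(timedelta(hours=-8))


def utc_micros_to_naive_local(micros, local_tz=LOCAL_TZ):
    """Interpret micros since the epoch as UTC, shift to local time, drop tzinfo."""
    utc_dt = EPOCH_UTC + timedelta(microseconds=micros)
    return utc_dt.astimezone(local_tz).replace(tzinfo=None)


def utc_micros_to_naive_utc(micros):
    """Interpret micros since the epoch as UTC and drop tzinfo without shifting."""
    utc_dt = EPOCH_UTC + timedelta(microseconds=micros)
    return utc_dt.replace(tzinfo=None)


if __name__ == "__main__":
    stored = 0  # the epoch itself
    print("local-adjusted:", utc_micros_to_naive_local(stored))
    print("UTC:           ", utc_micros_to_naive_utc(stored))
```

      Under these assumptions the two naive values differ by exactly the local UTC offset, which is the gap a conversion layer would need to reconcile.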

      Spark internal storage spec:

      • DateType: stored as days since the Unix epoch (1970-01-01)
      • TimestampType: stored as microseconds since the Unix epoch
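      As a rough sketch of what the storage spec above means in practice, the following standard-library helpers decode the two raw representations. The helper names are hypothetical and this is not Spark's own conversion code; it assumes both values are counted from the Unix epoch and that timestamps are interpreted as UTC.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def days_to_date(days):
    """Decode a DateType-style value: whole days since the epoch."""
    return EPOCH_DATE + timedelta(days=days)


def micros_to_utc_datetime(micros):
    """Decode a TimestampType-style value: microseconds since the epoch, as UTC."""
    return EPOCH_UTC + timedelta(microseconds=micros)


if __name__ == "__main__":
    print(days_to_date(365))                  # one non-leap year after the epoch
    print(micros_to_utc_datetime(1_000_000))  # one second after the epoch
```

      Keeping the timestamp timezone-aware until the last step makes any later shift to local time an explicit, visible operation.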

            People

              Assignee: Bryan Cutler (bryanc)
              Reporter: Bryan Cutler (bryanc)
              Votes: 0
              Watchers: 7
