  1. Spark
  2. SPARK-30640

Prevent unnecessary copies of data in Arrow to Pandas conversion with Timestamps

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.4
    • Fix Version/s: 3.0.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      During conversion of Arrow data to Pandas, timestamp columns are modified to localize them to the current timezone. Even if there are no timestamp columns, this step can sometimes result in unnecessary copies of the data. See https://www.mail-archive.com/dev@arrow.apache.org/msg17008.html for discussion.

            People

              Assignee: Bryan Cutler (bryanc)
              Reporter: Bryan Cutler (bryanc)
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:
                Resolved: