  1. Spark
  2. SPARK-22216 Improving PySpark/Pandas interoperability
  3. SPARK-23314

Pandas grouped udf on dataset with timestamp column error

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: PySpark
    • Labels: None

    Description

      Filed under SPARK-22216.

      While testing a grouped pandas_udf (group by), I saw this error on the timestamp column:

      File "pandas/_libs/tslib.pyx", line 3593, in pandas._libs.tslib.tz_localize_to_utc

      AmbiguousTimeError: Cannot infer dst time from Timestamp('2015-11-01 01:29:30'), try using the 'ambiguous' argument

      For details, see the comments. I'm able to reproduce this on the latest branch-2.3 (last change from Feb 1 UTC).
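
      The pandas error in the traceback can be reproduced outside Spark. The sketch below is a minimal illustration, not the Spark code path or the eventual fix. It assumes a US time zone (America/Los_Angeles is chosen here for illustration only; the report does not state which zone was in effect), where clocks fell back on 2015-11-01 and the wall-clock time 01:29:30 therefore occurred twice.

```python
import pandas as pd

# Assumption: a US time zone; the report does not say which zone was used.
ASSUMED_TZ = "America/Los_Angeles"

# The naive timestamp from the traceback.
ts = pd.Timestamp("2015-11-01 01:29:30")

# Localizing with the default ambiguous="raise" fails, because this
# wall-clock time exists both before and after the DST transition.
try:
    ts.tz_localize(ASSUMED_TZ)
    error_name = None
except Exception as exc:  # AmbiguousTimeError in the reported traceback
    error_name = type(exc).__name__

# Stating which side of the transition is meant avoids the error.
as_dst = ts.tz_localize(ASSUMED_TZ, ambiguous=True)   # daylight time
as_std = ts.tz_localize(ASSUMED_TZ, ambiguous=False)  # standard time

print(error_name)
print(as_dst, as_std)
```

      The two localized values are distinct instants one hour apart, which is why pandas refuses to guess unless the 'ambiguous' argument is supplied.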

          People

            Assignee: Li Jin (icexelloss)
            Reporter: Felix Cheung (felixcheung)