[SPARK-33054] Support interval type in PySpark

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: PySpark, SQL
    • Labels: None

Description

      At the moment PySpark doesn't support interval types at all. For example, calling the following

      spark.sql("SELECT current_date() - current_date()")
      

      or

      from pyspark.sql.functions import current_timestamp
      spark.range(1).select(current_timestamp() - current_timestamp())
      

      results in

      Traceback (most recent call last):
      ...
      ValueError: Could not parse datatype: interval
      
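      Until that happens, one possible workaround (a sketch, not an official recommendation) is to cast the interval to a type PySpark can already represent, such as a string, so the schema returned to the driver never contains interval:

      # Workaround sketch: cast the interval to STRING on the SQL side so the
      # schema handed back to Python contains only supported types.
      spark.sql(
          "SELECT CAST(current_date() - current_date() AS STRING) AS diff"
      ).show()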

      At a minimum, we should support CalendarIntervalType in the schema, so queries using it don't fail on schema conversion.
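
      For illustration, here is a minimal sketch of what schema-only support could look like, assuming it mirrors how other atomic types are declared in pyspark.sql.types; the class below is hypothetical, not existing API:

      from pyspark.sql.types import DataType

      # Hypothetical sketch: a schema-only type with no Python-side value
      # conversion, modelled on the existing atomic types in pyspark.sql.types.
      class CalendarIntervalType(DataType):
          """Represents Spark's CalendarInterval in a schema."""

          def simpleString(self):
              return "interval"

      Besides the class itself, the internal name-to-type mapping used by the datatype parser would need an "interval" entry, so the ValueError above goes away.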

      Optionally, we could provide conversions between internal and external types. That, however, might be tricky, as CalendarInterval seems to have different semantics than datetime.timedelta.
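
      A quick standard-library illustration of the mismatch: CalendarInterval keeps a separate months field because a month has no fixed length, while timedelta normalizes everything to days, seconds, and microseconds (the Spark result in the comment below is calendar-aware month arithmetic):

      from datetime import date, timedelta

      # In Spark SQL, date'2020-01-31' + INTERVAL 1 MONTH yields 2020-02-29,
      # while the closest fixed timedelta approximation drifts by a day:
      print(date(2020, 1, 31) + timedelta(days=30))  # 2020-03-01, not 2020-02-29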

      Also see https://issues.apache.org/jira/browse/SPARK-21187?focusedCommentId=16474664&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16474664

People

    • Assignee: Unassigned
    • Reporter: zero323 (Maciej Szymkiewicz)
    • Votes: 2
    • Watchers: 4
