Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-22216 Improving PySpark/Pandas interoperability
  3. SPARK-25798

Internally document type conversion between Pandas data and SQL types in Pandas UDFs

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 2.4.0
    • 3.0.0
    • PySpark
    • None

    Description

      Currently, UDF's type coercion is not cleanly defined. See also https://github.com/apache/spark/pull/20163 and https://github.com/apache/spark/pull/22610

      This JIRA targets to describe the type conversion logic internally. For instance:

          # +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+  # noqa
          # |SQL Type \ Pandas Type|True(bool)|1(int8)|1(int16)|            1(int32)|            1(int64)|1(uint8)|1(uint16)|1(uint32)|1(uint64)|a(object)|1970-01-01 00:00:00(datetime64[ns])|1970-01-01 00:00:00-05:00(datetime64[ns, US/Eastern])|1.0(float64)|[1 2 3](object(array))|A(category)|1 days 00:00:00(timedelta64[ns])|  # noqa
          # +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+  # noqa
          # |               boolean|      True|   True|    True|                True|                True|    True|     True|     True|     True|        X|                              False|                                                False|       False|                     X|          X|                           False|  # noqa
          # |               tinyint|         1|      1|       1|                   1|                   1|       X|        X|        X|        X|        X|                                  X|                                                    X|           1|                     X|          0|                               X|  # noqa
          # |              smallint|         1|      1|       1|                   1|                   1|       1|        X|        X|        X|        X|                                  X|                                                    X|           1|                     X|          X|                               X|  # noqa
          # |                   int|         1|      1|       1|                   1|                   1|       1|        1|        X|        X|        X|                                  X|                                                    X|           1|                     X|          X|                               X|  # noqa
          # |                bigint|         1|      1|       1|                   1|                   1|       1|        1|        1|        X|        X|                                  0|                                       18000000000000|           1|                     X|          X|                               X|  # noqa
          # |                string|       u''|u'\x01'| u'\x01'|             u'\x01'|             u'\x01'| u'\x01'|  u'\x01'|  u'\x01'|  u'\x01'|     u'a'|                                  X|                                                    X|         u''|                     X|          X|                               X|  # noqa
          # |                  date|         X|      X|       X|datetime.date(197...|                   X|       X|        X|        X|        X|        X|               datetime.date(197...|                                                    X|           X|                     X|          X|                               X|  # noqa
          # |             timestamp|         X|      X|       X|                   X|datetime.datetime...|       X|        X|        X|        X|        X|               datetime.datetime...|                                 datetime.datetime...|           X|                     X|          X|                               X|  # noqa
          # |                 float|       1.0|    1.0|     1.0|                 1.0|                 1.0|     1.0|      1.0|      1.0|      1.0|        X|                                  X|                                                    X|         1.0|                     X|          X|                               X|  # noqa
          # |                double|       1.0|    1.0|     1.0|                 1.0|                 1.0|     1.0|      1.0|      1.0|      1.0|        X|                                  X|                                                    X|         1.0|                     X|          X|                               X|  # noqa
          # |            array<int>|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|             [1, 2, 3]|          X|                               X|  # noqa
          # |                binary|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X|  # noqa
          # |         decimal(10,0)|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X|  # noqa
          # |       map<string,int>|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X|  # noqa
          # |        struct<_1:int>|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X|  # noqa
          # +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+  # noqa
      

      Attachments

        Issue Links

          Activity

            People

              gurwls223 Hyukjin Kwon
              gurwls223 Hyukjin Kwon
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: