XMLWordPrintableJSON

Details

    • Sub-task
    • Status: In Progress
    • Major
    • Resolution: Unresolved
    • 3.0.2, 3.1.1
    • None
    • PySpark, SQL
    • None

    Description

      Focus on a simpler case first: `@pandas_udf(UDT)`.

      Because Pandas UDF uses pyarrow to passing data, it does not currently support UserDefinedTypes, as what normal python udf does.

      For example:

      class BoxType(UserDefinedType):
          @classmethod
          def sqlType(cls) -> StructType:
              return StructType(
                  fields=[
                      StructField("xmin", DoubleType(), False),
                      StructField("ymin", DoubleType(), False),
                      StructField("xmax", DoubleType(), False),
                      StructField("ymax", DoubleType(), False),
                  ]
              )
      
      @pandas_udf(
           returnType=StructType([StructField("boxes", ArrayType(Box()))]
      )
      def pandas_pf(s: pd.DataFrame) -> pd.DataFrame:
             yield s
      

      The logs show

      try:
                      to_arrow_type(self._returnType_placeholder)
                  except TypeError:
      >               raise NotImplementedError(
                          "Invalid return type with scalar Pandas UDFs: %s is "
      E                   NotImplementedError: Invalid return type with scalar Pandas UDFs: StructType(List(StructField(boxes,ArrayType(Box,true),true))) is not supported
      

      Attachments

        Activity

          People

            Unassigned Unassigned
            sadhen Darcy Shen
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: