Details
Description
Focus on a simpler case first: `@pandas_udf(UDT)`.
Because Pandas UDF uses pyarrow to passing data, it does not currently support UserDefinedTypes, as what normal python udf does.
For example:
class BoxType(UserDefinedType): @classmethod def sqlType(cls) -> StructType: return StructType( fields=[ StructField("xmin", DoubleType(), False), StructField("ymin", DoubleType(), False), StructField("xmax", DoubleType(), False), StructField("ymax", DoubleType(), False), ] ) @pandas_udf( returnType=StructType([StructField("boxes", ArrayType(Box()))] ) def pandas_pf(s: pd.DataFrame) -> pd.DataFrame: yield s
The logs show
try: to_arrow_type(self._returnType_placeholder) except TypeError: > raise NotImplementedError( "Invalid return type with scalar Pandas UDFs: %s is " E NotImplementedError: Invalid return type with scalar Pandas UDFs: StructType(List(StructField(boxes,ArrayType(Box,true),true))) is not supported