Details
Type: Question
Status: Resolved
Priority: Major
Resolution: Invalid
Affects Version: 3.0.0
Description
Parameters can only be passed in the following way:
********************************************************************
from pyspark.sql.functions import pandas_udf, PandasUDFType

def map_iter_pandas_udf_example(spark):
    strr = "abcd"
    df = spark.createDataFrame([(1, 21), (2, 30)], ("id", "age"))

    @pandas_udf(df.schema, PandasUDFType.MAP_ITER)
    def filter_func(batch_iter, x=strr):
        for pdf in batch_iter:
            yield pdf[pdf.id == 1]

    df.mapInPandas(filter_func).show()
*******************************************************************
However, if the code is edited as follows, an error occurs:
*******************************************************************
from pyspark.sql.functions import pandas_udf, PandasUDFType

def map_iter_pandas_udf_example(spark):
    strr = "abcd"
    df = spark.createDataFrame([(1, 21), (2, 30)], ("id", "age"))

    @pandas_udf(df.schema, PandasUDFType.MAP_ITER)
    def filter_func(batch_iter, x=strr):
        for pdf in batch_iter:
            yield pdf[pdf.id == 1]

    data = "dbca"
    df.mapInPandas(filter_func(data)).show()
*******************************************************************
ValueError: Invalid udf: the udf argument must be a pandas_udf of type MAP_ITER.
Does anyone know whether a pandas UDF of type MAP_ITER can accept extra parameters, and if so, how the code should be written? Thanks.
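The ValueError above is raised because mapInPandas expects the UDF object itself, while filter_func(data) passes the result of calling it. One common workaround, shown here only as a sketch using plain pandas (no Spark session, and not taken from the original issue), is to bind the extra parameter with a closure so the inner function keeps the single-argument signature that mapInPandas expects:

```python
import pandas as pd

def make_filter_func(keep_id):
    # Factory: `keep_id` is captured by the closure, so the returned
    # function takes only the batch iterator, which is the signature
    # mapInPandas requires.
    def filter_func(batch_iter):
        for pdf in batch_iter:
            yield pdf[pdf.id == keep_id]
    return filter_func

# With Spark this would be roughly:
#   df.mapInPandas(make_filter_func(1), schema=df.schema)
# Here the generator is driven directly on a pandas batch to illustrate
# the binding, independent of any Spark version:
batches = [pd.DataFrame({"id": [1, 2], "age": [21, 30]})]
result = pd.concat(make_filter_func(1)(batches))
```

Whether the @pandas_udf(..., PandasUDFType.MAP_ITER) decorator can also be applied to the inner function inside the factory is an assumption here; the closure pattern itself is plain Python and does not depend on the Spark API version.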