Details
Type: Question
Status: Resolved
Priority: Major
Resolution: Invalid
Affects Version: 3.0.0
Description
Parameters can only be passed in the following way:
********************************************************************
from pyspark.sql.functions import pandas_udf, PandasUDFType

def map_iter_pandas_udf_example(spark):
    strr = "abcd"
    df = spark.createDataFrame([(1, 21), (2, 30)], ("id", "age"))

    @pandas_udf(df.schema, PandasUDFType.MAP_ITER)
    def filter_func(batch_iter, x=strr):
        for pdf in batch_iter:
            yield pdf[pdf.id == 1]

    df.mapInPandas(filter_func).show()
*******************************************************************
However, if the code is edited as follows, an error occurs:
*******************************************************************
from pyspark.sql.functions import pandas_udf, PandasUDFType

def map_iter_pandas_udf_example(spark):
    strr = "abcd"
    df = spark.createDataFrame([(1, 21), (2, 30)], ("id", "age"))

    @pandas_udf(df.schema, PandasUDFType.MAP_ITER)
    def filter_func(batch_iter, x=strr):
        for pdf in batch_iter:
            yield pdf[pdf.id == 1]

    data = "dbca"
    df.mapInPandas(filter_func(data)).show()
*******************************************************************
ValueError: Invalid udf: the udf argument must be a pandas_udf of type MAP_ITER.
Does anyone know whether a pandas UDF of type MAP_ITER can accept extra parameters, and if so, how the code should be written? Thanks.
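The ValueError above is raised because mapInPandas expects the UDF object itself, while filter_func(data) passes the result of calling it. One common workaround, shown here only as a sketch using plain pandas (no Spark session, and not taken from the original issue), is to bind the extra parameter with a closure so the inner function keeps the single-argument signature that mapInPandas expects:

```python
import pandas as pd

def make_filter_func(keep_id):
    # Factory: `keep_id` is captured by the closure, so the returned
    # function takes only the batch iterator, which is the signature
    # mapInPandas requires.
    def filter_func(batch_iter):
        for pdf in batch_iter:
            yield pdf[pdf.id == keep_id]
    return filter_func

# With Spark this would be roughly:
#   df.mapInPandas(make_filter_func(1), schema=df.schema)
# Here the generator is driven directly on a pandas batch to illustrate
# the binding, independent of any Spark version:
batches = [pd.DataFrame({"id": [1, 2], "age": [21, 30]})]
result = pd.concat(make_filter_func(1)(batches))
```

Whether the @pandas_udf(..., PandasUDFType.MAP_ITER) decorator can also be applied to the inner function inside the factory is an assumption here; the closure pattern itself is plain Python and does not depend on the Spark API version.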