Description
Setting type hints in the pandas function APIs should have no effect at the moment. However, the APIs currently try to run type-hint inference when hints are set.
import pandas as pd

def pandas_plus_one(v: pd.DataFrame) -> pd.DataFrame:
    return v + 1

spark.range(10).groupby('id').applyInPandas(pandas_plus_one, schema="id long").show()
Traceback (most recent call last):
  File "/.../spark/python/pyspark/sql/utils.py", line 98, in deco
    return f(*a, **kw)
  File "/.../spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o34.flatMapGroupsInPandas.
: java.lang.IllegalArgumentException: requirement failed: Must pass a grouped map udf
	at scala.Predef$.require(Predef.scala:281)
	at org.apache.spark.sql.RelationalGroupedDataset.flatMapGroupsInPandas(RelationalGroupedDataset.scala:541)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/pandas/group_ops.py", line 182, in applyInPandas
    jdf = self._jgd.flatMapGroupsInPandas(udf_column._jc.expr())
  File "/.../spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/.../spark/python/pyspark/sql/utils.py", line 102, in deco
    raise converted
pyspark.sql.utils.IllegalArgumentException: requirement failed: Must pass a grouped map udf
It looks like groupby().cogroup().applyInPandas and mapInPandas have the same issue.
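To illustrate the expected behavior, here is a minimal sketch of the precedence rule. It assumes the failure happens because type-hint inference replaces the UDF type that the function API already knows. Nothing here is PySpark code: Frame, Column, infer_eval_type, make_udf and apply_in_groups are hypothetical stand-ins, and the inference rule is invented for the example.

```python
import inspect

# Hypothetical stand-ins for pandas types; not PySpark or pandas names.
class Frame:
    pass

class Column:
    pass

# Hypothetical evaluation-type tags.
GROUPED_MAP = "grouped_map"
SCALAR = "scalar"


def infer_eval_type(func):
    """Guess an evaluation type from annotations (illustrative rule only)."""
    sig = inspect.signature(func)
    hints = [p.annotation for p in sig.parameters.values()]
    hints.append(sig.return_annotation)
    # In this toy rule, a fully Frame/Column-annotated function reads as scalar.
    if all(h in (Frame, Column) for h in hints):
        return SCALAR
    raise ValueError("unsupported or missing type hints")


def make_udf(func, eval_type=None):
    """Wrap func; consult type hints only when no eval_type was given."""
    if eval_type is None:
        eval_type = infer_eval_type(func)
    return func, eval_type


def apply_in_groups(func):
    """Function-API style entry point: the eval type is already known,
    so any type hints on func are ignored."""
    return make_udf(func, eval_type=GROUPED_MAP)
```

With this precedence, a function annotated Frame -> Frame stays a grouped map function when passed through apply_in_groups, even though inference alone would classify it as scalar.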