[SPARK-31287] groupby().applyInPandas, groupby().cogroup().applyInPandas and mapInPandas should ignore type hints - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.1.0
Fix Version/s: 3.0.0
Component/s: PySpark, SQL
Labels: None

Description

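Type hints on functions passed to the pandas function APIs should not matter at this point; these APIs are supposed to ignore them. Currently, however, PySpark tries to infer the UDF type from the type hints when they are set, and a call such as the following fails: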

import pandas as pd

def pandas_plus_one(v: pd.DataFrame) -> pd.DataFrame:
    return v + 1

spark.range(10).groupby('id').applyInPandas(pandas_plus_one, schema="id long").show()

Traceback (most recent call last):
  File "/.../spark/python/pyspark/sql/utils.py", line 98, in deco
    return f(*a, **kw)
  File "/.../spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o34.flatMapGroupsInPandas.
: java.lang.IllegalArgumentException: requirement failed: Must pass a grouped map udf
	at scala.Predef$.require(Predef.scala:281)
	at org.apache.spark.sql.RelationalGroupedDataset.flatMapGroupsInPandas(RelationalGroupedDataset.scala:541)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/pandas/group_ops.py", line 182, in applyInPandas
    jdf = self._jgd.flatMapGroupsInPandas(udf_column._jc.expr())
  File "/.../spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/.../spark/python/pyspark/sql/utils.py", line 102, in deco
    raise converted
pyspark.sql.utils.IllegalArgumentException: requirement failed: Must pass a grouped map udf

It looks like groupby().cogroup().applyInPandas and mapInPandas have the same issue.
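
To illustrate why an annotated function can be treated differently from an unannotated one, here is a minimal sketch of how a library can read a function's type hints through introspection. It uses only the Python standard library and is not PySpark's actual inference code; the `DataFrame` class is a stand-in for `pandas.DataFrame`, and `plus_one_hinted`, `plus_one_plain` and `describe_hints` are hypothetical names introduced for this example.

```python
import inspect
import typing


class DataFrame:
    """Stand-in for pandas.DataFrame so the sketch needs no third-party packages."""


def plus_one_hinted(v: DataFrame) -> DataFrame:
    # Same shape as pandas_plus_one in the report: one argument, fully annotated.
    return v


def plus_one_plain(v):
    # Identical function without annotations.
    return v


def describe_hints(func):
    """Return the type hints a library could read from func (empty dict if none)."""
    return typing.get_type_hints(func)


# Both functions take exactly one parameter; only the annotations differ.
print(len(inspect.signature(plus_one_hinted).parameters))
print(len(inspect.signature(plus_one_plain).parameters))

# The hinted version exposes its annotations; the plain one exposes nothing.
print(describe_hints(plus_one_hinted))
print(describe_hints(plus_one_plain))
```

The two functions are interchangeable at call time, so any difference in behavior between them has to come from code that inspects the annotations. Per the title of this issue, groupby().applyInPandas, groupby().cogroup().applyInPandas and mapInPandas should ignore that information.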

Issue Links

links to

GitHub Pull Request #28052

People

Assignee: Hyukjin Kwon

Reporter: Hyukjin Kwon

Votes: 0

Watchers: 0

Dates

Created: 27/Mar/20 11:54

Updated: 12/Dec/22 18:11

Resolved: 29/Mar/20 05:00