[SPARK-23380] Adds a conf for Arrow fallback in toPandas/createDataFrame with Pandas DataFrame - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.0
Fix Version/s: 2.4.0
Component/s: PySpark
Labels: None

Description

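It seems we can check the schema ahead of time and fall back to the non-Arrow path in toPandas.

Consider the case below, where toPandas works with Arrow disabled but fails once Arrow is enabled: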

df = spark.createDataFrame([[{'a': 1}]])

spark.conf.set("spark.sql.execution.arrow.enabled", "false")
df.toPandas()
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
df.toPandas()

...
py4j.protocol.Py4JJavaError: An error occurred while calling o42.collectAsArrowToPython.
...
java.lang.UnsupportedOperationException: Unsupported data type: map<string,bigint>
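
The idea of checking the schema ahead of time can be sketched in plain Python, without a Spark session. This is only an illustration, not the code from the linked pull requests: the helper names `unsupported_fields` and `can_use_arrow` are hypothetical, the schema is passed as simple (name, type string) pairs, and the list of unsupported types is an assumption that contains only the map type seen in the error above.

```python
# Hypothetical pre-check. Fields are plain (name, type_string) tuples so the
# sketch runs without Spark; this is not PySpark's actual implementation.
UNSUPPORTED_TYPE_PREFIXES = ("map<",)  # assumption: only map<...> comes from the error above


def unsupported_fields(fields):
    """Return the (name, type) pairs whose type string has an unsupported prefix."""
    return [
        (name, dtype)
        for name, dtype in fields
        if dtype.startswith(UNSUPPORTED_TYPE_PREFIXES)
    ]


def can_use_arrow(fields):
    """True when no field in the schema is known to be unsupported."""
    return not unsupported_fields(fields)


schema = [("_1", "map<string,bigint>")]
print(can_use_arrow(schema))
```

With a check like this, the decision to fall back can be made before any job is launched, rather than after the JVM side raises UnsupportedOperationException.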

In the case of createDataFrame, we already fall back to the non-Arrow path, so the call at least works even though the Arrow optimisation is not applied.

df = spark.createDataFrame([[{'a': 1}]])
spark.conf.set("spark.sql.execution.arrow.enabled", "false")
pdf = df.toPandas()
spark.createDataFrame(pdf).show()
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
spark.createDataFrame(pdf).show()

...
... UserWarning: Arrow will not be used in createDataFrame: Error inferring Arrow type ...
+--------+
|      _1|
+--------+
|[a -> 1]|
+--------+

We need to make toPandas and createDataFrame behave consistently, and add a configuration to control whether the fallback happens.
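
A minimal sketch of what a configurable fallback could look like, written in plain Python so it runs without Spark. Everything here is an assumption for illustration: `convert`, `arrow_enabled`, `fallback_enabled`, `failing_arrow_path` and `plain_path` are hypothetical names, and the real change in the linked pull requests may be structured differently.

```python
import warnings


def convert(data, arrow_enabled, fallback_enabled, arrow_path, plain_path):
    """Try the optimised path first; fall back or re-raise depending on a flag."""
    if not arrow_enabled:
        return plain_path(data)
    try:
        return arrow_path(data)
    except Exception as exc:
        if not fallback_enabled:
            # Fallback disabled: surface the original error to the caller.
            raise
        # Fallback enabled: warn, then use the non-optimised path.
        warnings.warn("Arrow will not be used: %s" % exc, UserWarning)
        return plain_path(data)


def failing_arrow_path(data):
    # Stand-in for an Arrow conversion that hits an unsupported type.
    raise TypeError("Unsupported data type: map<string,bigint>")


def plain_path(data):
    # Stand-in for the non-Arrow conversion.
    return list(data)


print(convert([{"a": 1}], True, True, failing_arrow_path, plain_path))
```

Routing both toPandas and createDataFrame through one decision of this shape would give them the same behaviour: a warning and a working result when fallback is allowed, and the original error when it is not.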

Issue Links

is related to: SPARK-23446 Explicitly check supported types in toPandas (Resolved)

links to: [Github] Pull Request #20567 (HyukjinKwon)
links to: [Github] Pull Request #20678 (HyukjinKwon)

People

Assignee: Hyukjin Kwon
Reporter: Hyukjin Kwon
Votes: 0
Watchers: 1

Dates

Created: 10/Feb/18 09:28
Updated: 12/Dec/22 18:10
Resolved: 08/Mar/18 11:22