[SPARK-25147] GroupedData.apply pandas_udf crashing - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: 2.3.1
Fix Version/s: None
Component/s: PySpark
Labels: None
Environment:

OS: Mac OS 10.13.6

Python: 2.7.15, 3.6.6

PyArrow: 0.10.0

Pandas: 0.23.4

Numpy: 1.15.0

Description

Running the following example, taken straight from the docs, fails with org.apache.spark.SparkException: Python worker exited unexpectedly (crashed). The cause isn't clear from any logs I can see:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F


spark = (
    SparkSession
    .builder
    .appName("pandas_udf")
    .getOrCreate()
)

df = spark.createDataFrame(
    [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
    ("id", "v")
)

@F.pandas_udf("id long, v double", F.PandasUDFType.GROUPED_MAP)
def normalize(pdf):
    v = pdf.v
    return pdf.assign(v=(v - v.mean()) / v.std())

(
    df
    .groupby("id")
    .apply(normalize)
    .show()
)

See output.log for stacktrace.
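
One way to narrow this down is to run the same normalize logic in plain pandas, with no Spark or Arrow involved. The sketch below is not part of the original report and does not reproduce the crash; it only assumes pandas is installed and rebuilds the example data by hand. If it runs cleanly, the UDF body itself is unlikely to be the cause, which would point toward the Python worker or its Arrow serialization path instead.

```python
import pandas as pd

# Same rows as the Spark DataFrame in the report, rebuilt by hand.
pdf = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "v": [1.0, 2.0, 3.0, 5.0, 10.0],
})


def normalize(pdf):
    # Identical body to the pandas_udf in the report.
    v = pdf.v
    return pdf.assign(v=(v - v.mean()) / v.std())


# Apply the function once per "id" group, roughly what GROUPED_MAP does,
# and stitch the per-group results back together.
groups = [normalize(group) for _, group in pdf.groupby("id")]
result = pd.concat(groups)
print(result)
```

Each group should come out centered on zero, since the function subtracts the group mean before dividing by the group's sample standard deviation.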

Attachments

Issue Links

is duplicated by: SPARK-26611 GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly" (Resolved)

People

Assignee: Unassigned

Reporter: Mike Sukmanowsky

Votes: 0

Watchers: 2

Dates

Created: 17/Aug/18 17:04

Updated: 12/Dec/22 18:11

Resolved: 26/Feb/19 21:57