Step 1 of 4: Choose Issues


Type Patch Info Key Summary Assignee Reporter Priority Status Resolution Created Updated Due Development
Sub-task SPARK-28132

SPARK-22216 Update document type conversion for Pandas UDFs (pyarrow 0.13.0, pandas 0.24.2, Python 3.7)

Hyukjin Kwon Hyukjin Kwon Minor Resolved Fixed  
Sub-task SPARK-25798

SPARK-22216 Internally document type conversion between Pandas data and SQL types in Pandas UDFs

Hyukjin Kwon Hyukjin Kwon Minor Resolved Fixed  
Sub-task SPARK-25640

SPARK-22216 Clarify/Improve EvalType for grouped aggregate and window aggregate

Unassigned Li Jin Major Resolved Incomplete  
Sub-task SPARK-25601

SPARK-22216 Register Grouped aggregate UDF Vectorized UDFs for SQL Statement

Hyukjin Kwon Hyukjin Kwon Major Resolved Fixed  
Sub-task SPARK-25328

SPARK-22216 Add an example for having two columns as the grouping key in group aggregate pandas UDF

Hyukjin Kwon Xiao Li Major Resolved Fixed  
Sub-task SPARK-25274

SPARK-22216 Improve toPandas with Arrow by sending out-of-order record batches

Bryan Cutler Bryan Cutler Major Resolved Fixed  
Sub-task SPARK-25272

SPARK-22216 Show some kind of test output to indicate pyarrow tests were run

Bryan Cutler Bryan Cutler Major Resolved Won't Fix  
Sub-task SPARK-24976

SPARK-22216 Allow None for Decimal type conversion (specific to PyArrow 0.9.0)

Hyukjin Kwon Hyukjin Kwon Major Resolved Fixed  
Sub-task SPARK-24796

SPARK-22216 Support GROUPED_AGG_PANDAS_UDF in Pivot

Unassigned Xiao Li Major Resolved Incomplete  
Sub-task SPARK-24624

SPARK-22216 Can not mix vectorized and non-vectorized UDFs

Li Jin Xiao Li Major Resolved Fixed  
Sub-task SPARK-24561

SPARK-22216 User-defined window functions with pandas udf (bounded window)

Li Jin Li Jin Major Resolved Fixed  
Sub-task SPARK-24334

SPARK-22216 Race condition in ArrowPythonRunner causes unclean shutdown of Arrow memory allocator

Li Jin Li Jin Major Resolved Fixed  
Sub-task SPARK-24324

SPARK-22216 Pandas Grouped Map UserDefinedFunction mixes column labels

Bryan Cutler Cristian Consonni Major Resolved Fixed  
Sub-task SPARK-23800

SPARK-22216 Support partial function and callable object with pandas UDF

Unassigned Li Jin Minor Resolved Incomplete  
Sub-task SPARK-23633

SPARK-22216 Update Pandas UDFs section in sql-programming-guide

Li Jin Li Jin Major Resolved Fixed  
Sub-task SPARK-23446

SPARK-22216 Explicitly check supported types in toPandas

Hyukjin Kwon Hyukjin Kwon Major Resolved Fixed  
Sub-task SPARK-23401

SPARK-22216 Improve test cases for all supported types and unsupported types

Aleksandr Koriagin Hyukjin Kwon Minor Resolved Fixed  
Sub-task SPARK-23380

SPARK-22216 Adds a conf for Arrow fallback in toPandas/createDataFrame with Pandas DataFrame

Hyukjin Kwon Hyukjin Kwon Major Resolved Fixed  
Sub-task SPARK-23352

SPARK-22216 Explicitly specify supported types in Pandas UDFs

Hyukjin Kwon Hyukjin Kwon Major Resolved Fixed  
Sub-task SPARK-23334

SPARK-22216 Fix pandas_udf with return type StringType() to handle str type properly in Python 2.

Takuya Ueshin Takuya Ueshin Blocker Resolved Fixed  
Sub-task SPARK-23314

SPARK-22216 Pandas grouped udf on dataset with timestamp column error

Li Jin Felix Cheung Major Resolved Fixed  
Sub-task SPARK-23302

SPARK-22216 Refactor group aggregate pandas UDF to its own catalyst rules

Unassigned Li Jin Major Resolved Incomplete  
Sub-task SPARK-23261

SPARK-22216 Rename Pandas UDFs

Xiao Li Xiao Li Major Resolved Fixed  
Sub-task SPARK-23047

SPARK-22216 Change MapVector to NullableMapVector in ArrowColumnVector

Li Jin Li Jin Major Resolved Fixed  
Sub-task SPARK-23030

SPARK-22216 Decrease memory consumption with toPandas() collection using Arrow

Bryan Cutler Bryan Cutler Major Resolved Fixed  
Sub-task SPARK-23011

SPARK-22216 Support alternative function form with group aggregate pandas UDF

Li Jin Li Jin Major Resolved Fixed  
Sub-task SPARK-22980

SPARK-22216 Using pandas_udf when inputs are not Pandas's Series or DataFrame

Hyukjin Kwon Xiao Li Major Resolved Fixed  
Sub-task SPARK-22978

SPARK-22216 Register Scalar Vectorized UDFs for SQL Statement

Xiao Li Xiao Li Major Resolved Fixed  
Sub-task SPARK-22930

SPARK-22216 Improve the description of Vectorized UDFs for non-deterministic cases

Li Jin Xiao Li Major Resolved Fixed  
Sub-task SPARK-22409

SPARK-22216 Add function type argument to pandas_udf

Li Jin Li Jin Major Resolved Fixed  
Sub-task SPARK-22324

SPARK-22216 Upgrade Arrow to version 0.8.0 and upgrade Netty to 4.1.17

Bryan Cutler Bryan Cutler Major Resolved Fixed  
Sub-task SPARK-22323

SPARK-22216 Design doc for different types of pandas_udf

Unassigned Li Jin Major Resolved Fixed  
Sub-task SPARK-22274

SPARK-22216 User-defined aggregation functions with pandas udf

Li Jin Li Jin Major Resolved Fixed  
Sub-task SPARK-22239

SPARK-22216 User-defined window functions with pandas udf (unbounded window)

Li Jin Li Jin Major Resolved Fixed  
Sub-task SPARK-21404

SPARK-22216 Simple Vectorized Python UDFs using Arrow

Unassigned Bryan Cutler Major Closed Fixed  
Sub-task SPARK-21190

SPARK-22216 SPIP: Vectorized UDFs in Python

Bryan Cutler Reynold Xin Major Resolved Fixed  
Sub-task SPARK-20791

SPARK-22216 Use Apache Arrow to Improve Spark createDataFrame from Pandas.DataFrame

Bryan Cutler Bryan Cutler Major Resolved Fixed  
Sub-task SPARK-20396

SPARK-22216 groupBy().apply() with pandas udf in pyspark

Li Jin Li Jin Major Resolved Fixed  
