
Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      There needs to be user-facing documentation that shows how to enable and use Arrow with Spark, what the user should expect, and any differences from similar existing functionality.
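
      For illustration, a minimal PySpark sketch of what enabling Arrow looks like in practice (assuming Spark 2.3 with PyArrow installed; the config key below follows the comment quoted later in this issue, and released builds may spell it spark.sql.execution.arrow.enabled):

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("arrow-doc-example").getOrCreate()

        # Arrow-based conversion is off by default; flip this flag to opt in.
        # (Key spelled as in the quoted comment; 2.3.0 releases may use
        # spark.sql.execution.arrow.enabled instead.)
        spark.conf.set("spark.sql.execution.arrow.enable", "true")

        # toPandas() now transfers data through Arrow; setting the flag back
        # to "false" restores the original row-by-row conversion path.
        df = spark.range(1 << 20).selectExpr("id", "cast(id AS double) AS value")
        pdf = df.toPandas()
        print(pdf.dtypes)

      The documentation would then spell out what the user should expect from this switch, e.g. which data types are supported and how timestamps are handled.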

      A comment from Xiao Li on https://github.com/apache/spark/pull/18664

      Assume users/applications have Timestamp values in their Dataset, and their processing code also relies on the corresponding time-zone-related assumptions.

      • For new users/applications: suppose they first enable Arrow and later hit an Arrow bug. Can they simply turn off spark.sql.execution.arrow.enable? If not, what should they do?
      • For existing users/applications: they want to use Arrow for better performance. Can they just turn on spark.sql.execution.arrow.enable? What else should they do?

      Note: hopefully the guides/solutions are user-friendly; that is, they must be very simple for most users to understand.
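
      One possible answer to the first question above, sketched purely at the user level (the helper below is hypothetical, not a built-in Spark mechanism): treat the Arrow flag as an optimization and turn it back off if the Arrow path fails.

        from pyspark.sql import SparkSession

        def to_pandas_with_fallback(df):
            """Hypothetical helper: prefer the Arrow-backed toPandas() for speed,
            but fall back to the non-Arrow conversion if the Arrow path raises."""
            spark = SparkSession.builder.getOrCreate()
            spark.conf.set("spark.sql.execution.arrow.enable", "true")
            try:
                return df.toPandas()
            except Exception:
                # e.g. an unsupported data type or an Arrow bug: disable the
                # flag again and retry with the original conversion path.
                spark.conf.set("spark.sql.execution.arrow.enable", "false")
                return df.toPandas()

      Whether such a manual fallback is enough, or whether Spark should handle it for the user, is exactly the kind of behavior the documentation needs to spell out.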

          People

            Assignee: bryanc Bryan Cutler
            Reporter: bryanc Bryan Cutler