[SPARK-18658] Writing to a text DataSource buffers one or more lines in memory - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.2.0
Component/s: SQL
Labels: None
Target Version/s: 2.2.0

Description

The JSON and CSV writing paths buffer entire lines (or multiple lines) in memory prior to writing to disk. For large rows this is inefficient. It may make sense to skip the TextOutputFormat record writer and go directly to the underlying FSDataOutputStream, allowing the writers to append arbitrary byte arrays (fractions of a row) instead of a full row.
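
To illustrate the difference described above, here is a minimal sketch in Python. It is not Spark's code: `io.BytesIO` stands in for the underlying FSDataOutputStream, the function names `write_row_buffered` and `write_row_streaming` are hypothetical, and CSV quoting and escaping are deliberately ignored. The first function builds the full line in memory before writing; the second appends each fragment of the row as soon as it is available, so the whole row never has to be held at once.

```python
import io
from typing import BinaryIO, Iterable


def write_row_buffered(out: BinaryIO, fields: Iterable[str]) -> None:
    """Build the entire line in memory, then hand it to the stream in one call."""
    line = ",".join(fields) + "\n"
    out.write(line.encode("utf-8"))


def write_row_streaming(out: BinaryIO, fields: Iterable[str]) -> None:
    """Append each field's bytes to the stream as a separate fragment."""
    first = True
    for field in fields:
        if not first:
            out.write(b",")  # separator between fields
        out.write(field.encode("utf-8"))
        first = False
    out.write(b"\n")  # row terminator


# Stand-ins for the output stream (assumption: any writable byte stream works).
buffered = io.BytesIO()
streamed = io.BytesIO()
row = ["1", "alice", "x" * 10]

write_row_buffered(buffered, row)
write_row_streaming(streamed, row)
```

Both functions produce the same bytes; only the peak amount of row data held in memory differs. In the streaming variant, `fields` can be a generator, so a very large row need not be materialized in full.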

Issue Links

is duplicated by: SPARK-18984 Concat with ds.write.text() throw exception if column contains null data (Resolved)

links to: [Github] Pull Request #16089 (NathanHowell)

People

Assignee: Nathan Howell
Reporter: Nathan Howell
Votes: 0
Watchers: 2

Dates

Created: 30/Nov/16 21:23
Updated: 23/Dec/16 11:32
Resolved: 02/Dec/16 05:40