[SPARK-40325] Support of columnar result (ColumnarBatch) in org.apache.spark.sql.Dataset flatMap, transform, etc. - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.3.0
Fix Version/s: None
Component/s: Java API, Spark Core
Labels: None

Description

Sometimes the result of a data transformation in a JVM program is available from native code in the Apache Arrow columnar data format. The current Dataset API requires an unnecessary conversion from the columnar-format wrapper into rows, with additional allocation on the JVM heap.

This feature request asks for columnar data to be propagated through the Dataset API without the unnecessary InternalRow->Row->InternalRow conversion.

The current solution uses a ColumnarBatch wrapper on top of ArrowColumnVector and rowExpressionEncoder.createDeserializer() to transform the data into Row objects.
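
To illustrate the cost being described, here is a minimal toy sketch in plain Java. It does not use Spark or Arrow: ToyBatch, ToyRow, toRows, sumViaRows and sumColumnar are hypothetical names invented for this example and are only stand-ins for a columnar batch and a row object. It contrasts a path that materialises one heap object per row with a path that reads the column arrays directly.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: none of these types are Spark or Arrow classes. */
final class ColumnarVsRowSketch {

    /** Hypothetical stand-in for a columnar batch: one primitive array per column. */
    static final class ToyBatch {
        final int[] ids;
        final double[] values;

        ToyBatch(int[] ids, double[] values) {
            if (ids.length != values.length) {
                throw new IllegalArgumentException("column lengths differ");
            }
            this.ids = ids;
            this.values = values;
        }

        int numRows() {
            return ids.length;
        }
    }

    /** Hypothetical stand-in for a row object with boxed fields. */
    static final class ToyRow {
        final Integer id;
        final Double value;

        ToyRow(Integer id, Double value) {
            this.id = id;
            this.value = value;
        }
    }

    /** Row path: allocates one ToyRow (plus boxed fields) per row of the batch. */
    static List<ToyRow> toRows(ToyBatch batch) {
        List<ToyRow> rows = new ArrayList<>(batch.numRows());
        for (int i = 0; i < batch.numRows(); i++) {
            rows.add(new ToyRow(batch.ids[i], batch.values[i]));
        }
        return rows;
    }

    /** Sums the value column after converting the whole batch to rows. */
    static double sumViaRows(ToyBatch batch) {
        double total = 0.0;
        for (ToyRow row : toRows(batch)) {
            total += row.value;
        }
        return total;
    }

    /** Columnar path: reads the column array in place, with no per-row objects. */
    static double sumColumnar(ToyBatch batch) {
        double total = 0.0;
        for (double v : batch.values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        ToyBatch batch = new ToyBatch(new int[] {1, 2, 3}, new double[] {1.5, 2.5, 3.0});
        System.out.println("via rows: " + sumViaRows(batch));
        System.out.println("columnar: " + sumColumnar(batch));
    }
}
```

Both paths compute the same result; the difference is that the row path creates short-lived objects on the JVM heap for every row, which is the overhead this issue asks to avoid when the data is already columnar.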

Issue Links

is related to: SPARK-27396 SPIP: Public APIs for extended Columnar Processing Support (Resolved)

People

Assignee: Unassigned

Reporter: Igor Suhorukov

Votes: 0

Watchers: 1

Dates

Created: 04/Sep/22 11:48

Updated: 04/Sep/22 11:50