Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
3.3.0
None
None
Description
Sometimes the result of a data transformation in a JVM program is available from native code in the Apache Arrow columnar data format. The current Dataset API requires an unnecessary conversion from the columnar-format wrapper into rows, with additional allocation on the JVM heap.
This feature request asks for columnar data to be propagated through the Dataset API without the unnecessary InternalRow -> Row -> InternalRow conversion.
The current workaround uses a ColumnarBatch wrapper on top of ArrowColumnVector, together with rowExpressionEncoder.createDeserializer(), to transform the data into Rows.
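The workaround described above can be sketched roughly as follows. This is a minimal illustration, not code taken from the issue: it assumes Spark 3.3.x (spark-sql, which depends on arrow-vector) plus an Arrow memory implementation such as arrow-memory-netty on the classpath, it obtains the row expression encoder via RowEncoder(schema), and it fills two small Arrow vectors by hand in place of data produced by native code. The object and method names (ArrowToDataFrameWorkaround, arrowBatch, toRows, run) are hypothetical. The comments mark where each half of the InternalRow -> Row -> InternalRow round trip happens.

```scala
// Sketch only. Assumes Spark 3.3.x (spark-sql, which depends on arrow-vector) and an
// Arrow memory implementation such as arrow-memory-netty on the classpath.
import java.nio.charset.StandardCharsets

import scala.collection.JavaConverters._

import org.apache.arrow.memory.{BufferAllocator, RootAllocator}
import org.apache.arrow.vector.{IntVector, VarCharVector}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.vectorized.{ArrowColumnVector, ColumnVector, ColumnarBatch}

object ArrowToDataFrameWorkaround {
  val schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("name", StringType, nullable = true)))

  // Hypothetical stand-in for Arrow vectors that native code would normally fill.
  def arrowBatch(allocator: BufferAllocator, n: Int): ColumnarBatch = {
    val ids = new IntVector("id", allocator)
    val names = new VarCharVector("name", allocator)
    ids.allocateNew(n)
    names.allocateNew(n)
    for (i <- 0 until n) {
      ids.setSafe(i, i)
      names.setSafe(i, s"row$i".getBytes(StandardCharsets.UTF_8))
    }
    ids.setValueCount(n)
    names.setValueCount(n)
    // ArrowColumnVector wraps the Arrow buffers; no data is copied at this step.
    val columns: Array[ColumnVector] =
      Array(new ArrowColumnVector(ids), new ArrowColumnVector(names))
    new ColumnarBatch(columns, n)
  }

  // InternalRow -> Row: the deserializer allocates a new Row (and String values)
  // on the JVM heap for every record in the batch.
  def toRows(batch: ColumnarBatch): Seq[Row] = {
    val deserializer = RowEncoder(schema).resolveAndBind().createDeserializer()
    batch.rowIterator().asScala.map(deserializer).toList
  }

  def run(spark: SparkSession, n: Int): Array[Row] = {
    val allocator = new RootAllocator(Long.MaxValue)
    try {
      val batch = arrowBatch(allocator, n)
      try {
        // Row -> InternalRow: createDataFrame serializes the Rows back into
        // Spark's internal row format.
        spark.createDataFrame(toRows(batch).asJava, schema).collect()
      } finally {
        batch.close() // also closes the wrapped Arrow vectors
      }
    } finally {
      allocator.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("arrow-workaround").getOrCreate()
    try run(spark, 3).foreach(println)
    finally spark.stop()
  }
}
```

In this sketch only the ColumnarBatch/ArrowColumnVector wrapping is zero-copy; both conversion steps touch every record and allocate on the JVM heap, which is the overhead this feature request aims to remove. RowEncoder(schema) reflects the Spark 3.3 API; later Spark releases may expose the row encoder differently.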
Issue Links
- is related to: SPARK-27396 SPIP: Public APIs for extended Columnar Processing Support (Resolved)