Spark / SPARK-40325

Support columnar results (ColumnarBatch) in org.apache.spark.sql.Dataset flatMap, transform, etc.


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: Java API, Spark Core
    • Labels: None

    Description

      Sometimes the result of a data transformation in a JVM program is available from native code in the Apache Arrow columnar format. The current Dataset API requires an unnecessary conversion from the columnar-format wrapper into rows, with additional allocation on the JVM heap.

      This feature request asks for columnar data to be propagated through the Dataset API without the unnecessary InternalRow -> Row -> InternalRow conversion.

       

      The current workaround wraps the data in a ColumnarBatch on top of ArrowColumnVector and uses rowExpressionEncoder.createDeserializer() to convert it into Row objects.
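      A minimal sketch of the workaround described above, assuming the Spark 3.3 API (where RowEncoder(schema) returns an ExpressionEncoder[Row]) with Spark and its bundled Arrow jars on the classpath. An encoder built with RowEncoder(schema) stands in for rowExpressionEncoder; that is an assumption about what the name refers to. The object ColumnarToRowSketch, the helpers batchToRows and demo, and the one-column "value" data are hypothetical, chosen only to show where each Arrow value gets copied into a new Row on the JVM heap.

```scala
import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.IntVector
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
import org.apache.spark.sql.vectorized.{ArrowColumnVector, ColumnVector, ColumnarBatch}

object ColumnarToRowSketch {
  // Hypothetical helper: converts every row of a ColumnarBatch into an external Row.
  // This is the InternalRow -> Row step the issue would like to avoid; each Row
  // produced here is a fresh object allocated on the JVM heap.
  def batchToRows(batch: ColumnarBatch, schema: StructType): Vector[Row] = {
    // Spark 3.3 API: the encoder must be resolved and bound before its
    // deserializer can be evaluated against InternalRow input.
    val toRow = RowEncoder(schema).resolveAndBind().createDeserializer()
    // rowIterator() returns a reused InternalRow view over the column vectors,
    // so each row is converted before the iterator advances.
    val it = batch.rowIterator()
    val out = Vector.newBuilder[Row]
    while (it.hasNext) out += toRow(it.next())
    out.result()
  }

  // Builds a one-column Arrow vector standing in for data produced by native code.
  def demo(): Vector[Row] = {
    val allocator = new RootAllocator(Long.MaxValue)
    val values = new IntVector("value", allocator)
    values.allocateNew(3)
    (0 until 3).foreach(i => values.setSafe(i, i * 10))
    values.setValueCount(3)

    val schema = StructType(Seq(StructField("value", IntegerType, nullable = true)))
    // ColumnarBatch wraps the Arrow data through ArrowColumnVector without copying it.
    val batch = new ColumnarBatch(Array[ColumnVector](new ArrowColumnVector(values)), 3)
    try batchToRows(batch, schema)
    finally {
      batch.close() // also closes the wrapped Arrow vector
      allocator.close()
    }
  }

  def main(args: Array[String]): Unit =
    demo().foreach(println)
}
```

      The wrapping step is zero-copy; the allocation happens only in batchToRows, which is the per-row cost that a columnar-aware flatMap or transform in the Dataset API could skip.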

       


            People

              Assignee: Unassigned
              Reporter: Igor Suhorukov (igor.suhorukov)
              Votes: 0
              Watchers: 1
