Apache Arrow / ARROW-12265

flight_data_from_arrow_batch sends too much data

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 4.0.0
    • Fix Version/s: None
    • Component/s: FlightRPC, Rust
    • Labels: None

    Description

      Arrow arrays can share the same backing store, even if the array is just a "view" of a slice of another array.

      Yet, when `flight_data_from_arrow_batch` encodes the arrays into a FlightData, it copies the entire backing buffer into the message to be sent over the wire, regardless of the array's offset and length.

      Thus, for example, when DataFusion uses the `arrow::compute::limit` operator to return a few elements of an array, the full (potentially large) array is still sent over the wire.

      Since encoding the array in a FlightData involves copying the data anyway, it would be beneficial to take the array length (and offset) into consideration and copy only the parts of the buffer that contain actual data.
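
      To illustrate the difference, here is a minimal sketch in plain Rust that does not use the arrow crate. `Int32View`, `encode_whole_buffer` and `encode_visible_range` are hypothetical names invented for this example; they only model the idea of a slice sharing a backing buffer and are not the actual Arrow or Flight API. Real Arrow arrays also carry validity bitmaps and, for variable-length types, offset buffers, which this sketch ignores.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow primitive array: a window
/// (`offset`, `len`) over a reference-counted buffer that other views may share.
#[derive(Clone, Debug)]
struct Int32View {
    buffer: Arc<Vec<i32>>,
    offset: usize,
    len: usize,
}

impl Int32View {
    /// Wrap freshly allocated values; the view covers the whole buffer.
    fn new(values: Vec<i32>) -> Self {
        let len = values.len();
        Int32View { buffer: Arc::new(values), offset: 0, len }
    }

    /// Zero-copy slice: only offset and length change, the buffer is shared.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Int32View {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }
}

/// Serialise values as little-endian bytes (4 bytes per element).
fn to_bytes(values: &[i32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 4);
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Behaviour described in this report: the whole backing buffer is copied,
/// ignoring the view's offset and length.
fn encode_whole_buffer(view: &Int32View) -> Vec<u8> {
    to_bytes(&view.buffer)
}

/// Proposed behaviour: copy only the elements the view actually exposes.
fn encode_visible_range(view: &Int32View) -> Vec<u8> {
    to_bytes(&view.buffer[view.offset..view.offset + view.len])
}
```

      In this model, taking `slice(0, 3)` of a 1000-element view and passing it to `encode_whole_buffer` yields 4000 bytes, while `encode_visible_range` yields 12 bytes. Both functions copy the data once, so restricting the copy to the visible range adds no extra pass over the buffer.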

          People

            Assignee: Unassigned
            Reporter: Marko Mikulicic (ithkuil)