  1. Apache Arrow
  2. ARROW-12411

[Rust] Add Builder interface for adding Arrays to record batches

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Rust

    Description

      Use case:

      While writing tests (both in IOx and in DataFusion) where I need a single `RecordBatch`, I often find myself doing something like this:

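      ```
      use std::sync::Arc;

      use arrow::array::{ArrayRef, Float64Array, Int64Array};
      use arrow::datatypes::{DataType as ArrowDataType, Field as ArrowField, Schema};
      use arrow::record_batch::RecordBatch;

      // The schema names each column and states its type...
      let schema = Arc::new(Schema::new(vec![
          ArrowField::new("float_field", ArrowDataType::Float64, true),
          ArrowField::new("time", ArrowDataType::Int64, true),
      ]));

      // ...and the arrays state the same types a second time.
      let float_array: ArrayRef = Arc::new(Float64Array::from(vec![10.1, 20.1, 30.1, 40.1]));
      let timestamp_array: ArrayRef = Arc::new(Int64Array::from(vec![1000, 2000, 3000, 4000]));

      let batch = RecordBatch::try_new(schema, vec![float_array, timestamp_array])
          .expect("created new record batch");
      ```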

      This is annoying because the information that `float_field` is a float is encoded twice: once in the `Schema` and once in the `Float64Array`.

      I would much rather be able to construct `RecordBatch`es in a builder style, which avoids the redundancy and reduces the amount of typing:

      ```
      let float_array: ArrayRef = Arc::new(Float64Array::from(vec![10.1, 20.1, 30.1, 40.1]));
      let timestamp_array: ArrayRef = Arc::new(Int64Array::from(vec![1000, 2000, 3000, 4000]));

      // `RecordBatch::empty` and `append` are the API proposed in this issue
      let batch = RecordBatch::empty()
          .append("float_field", float_array).unwrap()
          .append("time", timestamp_array).unwrap();
      ```

      The proposal is to add a method to `RecordBatch` like:

      ```
      impl RecordBatch {
          ...
          fn append(self, field_name: &str, field_values: ArrayRef) -> Result<Self>
      }
      ```

      This would append a field named `field_name` to the current schema, returning an error if `field_name` was already present.

      The nullability of the new field would be set based on the actual null count of `field_values`.
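
      The semantics described above can be sketched with a toy, standard-library-only model. Everything in it is hypothetical: `Column`, `Field`, and `Batch` are simplified stand-ins for arrow's `ArrayRef`, `Field`, and `RecordBatch`, not the real types. Two details are assumptions rather than part of the proposal: a field is treated as nullable exactly when its null count is greater than zero, and a column whose length differs from the existing columns is rejected.

```rust
// Toy stand-ins for arrow types (hypothetical, stdlib only).

/// A column of optional values; `None` models a null slot.
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Float64(Vec<Option<f64>>),
    Int64(Vec<Option<i64>>),
}

impl Column {
    /// The data type is read off the column, so callers never restate it.
    pub fn data_type(&self) -> &'static str {
        match self {
            Column::Float64(_) => "Float64",
            Column::Int64(_) => "Int64",
        }
    }

    /// Number of slots (rows) in the column.
    pub fn len(&self) -> usize {
        match self {
            Column::Float64(v) => v.len(),
            Column::Int64(v) => v.len(),
        }
    }

    /// Number of null (`None`) slots in the column.
    pub fn null_count(&self) -> usize {
        match self {
            Column::Float64(v) => v.iter().filter(|x| x.is_none()).count(),
            Column::Int64(v) => v.iter().filter(|x| x.is_none()).count(),
        }
    }
}

/// Schema entry derived from a column when it is appended.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: &'static str,
    pub nullable: bool,
}

/// Stand-in for a record batch: one field per column, in order.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct Batch {
    pub fields: Vec<Field>,
    pub columns: Vec<Column>,
}

impl Batch {
    /// A batch with no fields and no columns.
    pub fn empty() -> Self {
        Self::default()
    }

    /// Appends a named column, deriving its type and nullability from the data.
    pub fn append(mut self, field_name: &str, field_values: Column) -> Result<Self, String> {
        // Error if the name is already in the schema, as the proposal states.
        if self.fields.iter().any(|f| f.name == field_name) {
            return Err(format!("field '{}' already present", field_name));
        }
        // Assumption: every column must have the same number of rows.
        if let Some(first) = self.columns.first() {
            if first.len() != field_values.len() {
                return Err(format!("field '{}' has a different length", field_name));
            }
        }
        self.fields.push(Field {
            name: field_name.to_string(),
            data_type: field_values.data_type(),
            // Assumption: nullable exactly when the column contains nulls.
            nullable: field_values.null_count() > 0,
        });
        self.columns.push(field_values);
        Ok(self)
    }
}
```

      Because `append` takes `self` by value and returns `Result<Self, String>`, calls chain in the builder style shown earlier, for example `Batch::empty().append("float_field", floats)?.append("time", times)?`, and the type of each field is stated only once, by the column itself.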

          People

            Assignee: Andrew Lamb (alamb)
            Reporter: Andrew Lamb (alamb)

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 3h 40m