Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Invalid
Description
Use case:
While writing tests (both in IOx and in DataFusion) where I need a single `RecordBatch`, I often find myself doing something like this:
```
let schema = Arc::new(Schema::new(vec![
    ArrowField::new("float_field", ArrowDataType::Float64, true),
    ArrowField::new("time", ArrowDataType::Int64, true),
]));
let float_array: ArrayRef = Arc::new(Float64Array::from(vec![10.1, 20.1, 30.1, 40.1]));
let timestamp_array: ArrayRef = Arc::new(Int64Array::from(vec![1000, 2000, 3000, 4000]));
let batch = RecordBatch::try_new(schema, vec![float_array, timestamp_array])
    .expect("created new record batch");
```
This is annoying because the information that `float_field` is a float is encoded twice: once in the `Schema` and once in the `Float64Array`.
I would much rather construct a `RecordBatch` in a builder style, which avoids the redundancy and reduces the amount of typing:
```
let float_array: ArrayRef = Arc::new(Float64Array::from(vec![10.1, 20.1, 30.1, 40.1]));
let timestamp_array: ArrayRef = Arc::new(Int64Array::from(vec![1000, 2000, 3000, 4000]));
// Proposed API: each field name is paired with the array that holds its values.
let batch = RecordBatch::empty()
    .append("float_field", float_array).unwrap()
    .append("time", timestamp_array).unwrap();
```
The proposal is to add a method to `RecordBatch` like
```
impl RecordBatch
```
That would append a field named `field_name` to the current schema, returning an error if `field_name` was already present.
The nullability of the field would be set based on the actual null count of `field_values`.
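The semantics described above can be sketched without depending on the arrow crate. The following is only a rough model under stated assumptions: `Batch`, `Column`, `Field`, and `AppendError` are hypothetical stand-ins for `RecordBatch`, `ArrayRef`, the schema field type, and the Arrow error type, and the row-count check is an extra assumption that the proposal itself does not mention.

```rust
/// Hypothetical stand-in for an Arrow array: `None` models a null value.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Float64(Vec<Option<f64>>),
    Int64(Vec<Option<i64>>),
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::Float64(v) => v.len(),
            Column::Int64(v) => v.len(),
        }
    }

    fn null_count(&self) -> usize {
        match self {
            Column::Float64(v) => v.iter().filter(|x| x.is_none()).count(),
            Column::Int64(v) => v.iter().filter(|x| x.is_none()).count(),
        }
    }

    fn type_name(&self) -> &'static str {
        match self {
            Column::Float64(_) => "Float64",
            Column::Int64(_) => "Int64",
        }
    }
}

/// Hypothetical stand-in for a schema field.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: &'static str,
    nullable: bool,
}

#[derive(Debug, PartialEq)]
enum AppendError {
    DuplicateField(String),
    LengthMismatch { expected: usize, actual: usize },
}

/// Hypothetical stand-in for `RecordBatch`: fields and columns grow together.
#[derive(Debug, Default)]
struct Batch {
    fields: Vec<Field>,
    columns: Vec<Column>,
}

impl Batch {
    fn empty() -> Self {
        Self::default()
    }

    /// Appends `field_values` as a new column named `field_name`.
    fn append(mut self, field_name: &str, field_values: Column) -> Result<Self, AppendError> {
        // Reject a name that is already present in the schema.
        if self.fields.iter().any(|f| f.name == field_name) {
            return Err(AppendError::DuplicateField(field_name.to_string()));
        }
        // Assumption (not in the proposal): all columns must have the same row count.
        if let Some(first) = self.columns.first() {
            if first.len() != field_values.len() {
                return Err(AppendError::LengthMismatch {
                    expected: first.len(),
                    actual: field_values.len(),
                });
            }
        }
        // The type comes from the values themselves, so it is stated only once;
        // nullability is derived from the actual null count.
        self.fields.push(Field {
            name: field_name.to_string(),
            data_type: field_values.type_name(),
            nullable: field_values.null_count() > 0,
        });
        self.columns.push(field_values);
        Ok(self)
    }
}
```

Because `append` takes `self` by value and returns `Result<Self, _>`, calls can be chained in the builder style shown earlier, with `unwrap()` (or `?`) between steps.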