Apache Arrow / ARROW-18433

[C++][Python] Optimize aggregate functions to work with batches.


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.0.1
    • Fix Version/s: None
    • Component/s: C++, Python
    • Labels: None

    Description

      Most compute functions already work with the dataset API without loading entire columns into memory. Aggregate functions that are associative could work the same way, one batch at a time: `min`, `max`, `any`, `all`, `sum`, `product`, and even `unique` and `value_counts`.
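To illustrate why associativity is enough, here is a minimal pure-Python sketch, not Arrow code. Plain lists stand in for record batches (in pyarrow they might come from something like `Dataset.to_batches`), and the names `AGGREGATES`, `aggregate_batches`, and `batches` are hypothetical. Each aggregate is computed per batch and the partial results are combined, so only one batch is held at a time.

```python
from collections import Counter
from functools import reduce
import math
import operator

# Each entry: (function applied to one batch, associative combiner of two partials).
AGGREGATES = {
    "min": (min, min),
    "max": (max, max),
    "any": (any, operator.or_),
    "all": (all, operator.and_),
    "sum": (sum, operator.add),
    "product": (math.prod, operator.mul),
    "unique": (set, operator.or_),            # set union
    "value_counts": (Counter, operator.add),  # adds counts per value
}


def aggregate_batches(name, batches):
    """Reduce an iterable of batches, consuming one batch at a time.

    Empty batches are skipped; an input with no non-empty batch raises
    TypeError, since this sketch does not define identity values.
    """
    per_batch, combine = AGGREGATES[name]
    partials = (per_batch(batch) for batch in batches if len(batch) > 0)
    return reduce(combine, partials)


def batches():
    """Stand-in for a lazily scanned column (assumption: lists as batches)."""
    yield [3, 1, 4]
    yield [1, 5]
    yield [9, 2, 6]


total = aggregate_batches("sum", batches())
counts = aggregate_batches("value_counts", batches())
```

Because the combiner is associative, the grouping of batches does not affect the result, which is what would let a scanner pick batch boundaries freely.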

      A couple of implementation ideas:

      • expand the dataset api to support expressions which return scalars
      • add a `BatchedArray` type which is like a `ChunkedArray` but with lazy loading
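The second idea could look roughly like the following pure-Python sketch. `BatchedArray` does not exist in Arrow; the class, its constructor argument `load_chunks`, and its methods are all assumptions made for illustration, loosely mirroring `ChunkedArray.iterchunks`. Chunks are only produced when an aggregate is requested.

```python
from functools import reduce
import operator


class BatchedArray:
    """Hypothetical lazy counterpart of ChunkedArray (not a real Arrow type)."""

    def __init__(self, load_chunks):
        # load_chunks: zero-argument callable returning a fresh iterator of
        # chunks, so nothing is read until an aggregate is requested.
        self._load_chunks = load_chunks

    def iterchunks(self):
        return iter(self._load_chunks())

    def _reduce(self, per_chunk, combine):
        # Aggregate each non-empty chunk, then fold the partial results.
        partials = (per_chunk(c) for c in self.iterchunks() if len(c) > 0)
        return reduce(combine, partials)

    def sum(self):
        return self._reduce(sum, operator.add)

    def min(self):
        return self._reduce(min, min)

    def max(self):
        return self._reduce(max, max)


loaded = []  # records which chunks have actually been read


def load_chunks():
    """Stand-in for reading a column chunk by chunk (assumption: lists as chunks)."""
    for index, chunk in enumerate([[3, 1, 4], [1, 5], [9, 2, 6]]):
        loaded.append(index)
        yield chunk


arr = BatchedArray(load_chunks)  # no chunk is loaded at this point
```

Calling `arr.sum()` then walks the chunks once; each further aggregate re-reads them, which is one trade-off a real design would have to weigh against caching.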


          People

            Assignee: Unassigned
            Reporter: A. Coady (coady)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated: