Apache Arrow / ARROW-12293

[Rust][DataFusion] Word Count

Details

    • Type: Wish
    • Status: Closed
    • Priority: Trivial
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Rust - DataFusion

    Description

      I am learning DataFusion and tried to write word count, the canonical big-data "hello world", with it.  I have been unsuccessful, and I am wondering whether word count is even possible in DataFusion today.

       

      Typically, word count involves a flat_map that splits each string on its whitespace.
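
      For reference, the flat_map formulation described above can be sketched in plain Rust, outside DataFusion. This is only an illustration using the standard library; the function name `word_count` is made up for this example and is not a DataFusion or Arrow API.

```rust
use std::collections::HashMap;

/// Count whitespace-separated words across all input lines.
/// (Hypothetical helper for illustration; not part of DataFusion.)
fn word_count(lines: &[&str]) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    // flat_map turns each line into zero or more words and
    // concatenates them into a single stream of words.
    for word in lines.iter().flat_map(|line| line.split_whitespace()) {
        *counts.entry(word.to_string()).or_insert(0) += 1;
    }
    counts
}
```

      Calling `word_count(&["the quick fox", "the lazy dog"])` yields a map in which "the" has count 2 and every other word has count 1.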

       

      I am running into two issues:

      1) Creating a udf that goes from &str -> Vec<&str>.  I cannot find an `arrow::array` type that maps to a collection of strings, which prevents me from creating a udf that performs the split.

      2) Assuming I could get (1) to work, I am not aware of a method similar to flat_map that can be applied to a column.  In SQL, I believe this is called `explode`.  I can't find it in the codebase, which makes me think flat_map-style operations aren't possible.
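
      To make the two missing pieces concrete, here is a minimal standard-library sketch of what they would have to compute. It is a model only, not DataFusion or Arrow code: `StringListColumn`, `split_column`, and `explode` are hypothetical names. The offsets-plus-flat-values layout is chosen because that is how Arrow's list type represents variable-length lists; whether a given DataFusion version exposes a udf return type or an operator for this is not something this sketch asserts.

```rust
/// A column of string lists, stored as one flat `values` vector plus `offsets`.
/// Row i owns values[offsets[i]..offsets[i + 1]].
/// (Hypothetical type for illustration; not an Arrow array.)
struct StringListColumn {
    offsets: Vec<usize>,
    values: Vec<String>,
}

/// Issue 1: split every input row on whitespace into a list of words.
fn split_column(rows: &[&str]) -> StringListColumn {
    let mut offsets = vec![0];
    let mut values = Vec::new();
    for row in rows {
        values.extend(row.split_whitespace().map(str::to_string));
        // The end of this row's words is the start of the next row's.
        offsets.push(values.len());
    }
    StringListColumn { offsets, values }
}

/// Issue 2: "explode" the list column into one (source row, word) pair per word.
/// Rows with an empty list produce no output rows.
fn explode(col: &StringListColumn) -> Vec<(usize, &str)> {
    let mut out = Vec::with_capacity(col.values.len());
    for (row, bounds) in col.offsets.windows(2).enumerate() {
        for word in &col.values[bounds[0]..bounds[1]] {
            out.push((row, word.as_str()));
        }
    }
    out
}
```

      Grouping the exploded words and counting each group would then complete word count.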

       

      My questions are:

      Is word count currently possible in DataFusion?  If so, how can I perform the split, and how can I perform a flat_map?  If word count cannot be done, what would need to be implemented to make it possible?

          People

            Assignee: Unassigned
            Reporter: Jacob Baumbach (jacobBaumbach)
            Votes: 0
            Watchers: 4