Apache Arrow / ARROW-18097

[C++] Add a "list_contains" kernel

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++

    Description

      Assume you have a list array:

      arr = pa.array([["a", "b"], ["a", "c"], ["b", "c", "d"]])
      

      And you want to know for each list if it contains a certain value (of the same type as the list's values). A "list_contains" function (or other name) would be useful for that:

      pc.list_contains(arr, "a")
      # -> True, True, False
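
To pin down the intended semantics, here is a minimal pure-Python sketch of what such a kernel could return. The name `list_contains_reference` is hypothetical, and mapping a missing (null) list to a null result is an assumption, not something decided in this issue:

```python
from typing import Any, List, Optional, Sequence


def list_contains_reference(
    lists: Sequence[Optional[Sequence[Any]]], value: Any
) -> List[Optional[bool]]:
    """Reference semantics for the proposed kernel (hypothetical helper).

    Assumption: a missing (None) list yields None, an empty list yields False.
    """
    # `value in lst` compares element by element with ==
    return [None if lst is None else (value in lst) for lst in lists]


example = [["a", "b"], ["a", "c"], ["b", "c", "d"]]
print(list_contains_reference(example, "a"))
```

For the example array above this gives one boolean per list, matching the expected result of the proposed function.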
      

      The current workaround I found is to flatten the lists, check equality, and then reduce again with a group-by aggregation, which is quite tedious:

      >>> import pyarrow as pa
      >>> import pyarrow.compute as pc
      >>> temp = pa.table({
      ...     'index': pc.list_parent_indices(arr),
      ...     'contains_value': pc.equal(pc.list_flatten(arr), "a"),
      ... })
      >>> temp.group_by('index').aggregate([('contains_value', 'any')])['contains_value_any'].chunk(0)
      <pyarrow.lib.BooleanArray object at 0x7ffaf3f8de20>
      [
        true,
        true,
        false
      ]
      

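      But this also only works if there are no empty or missing (null) lists: such lists contribute no rows to the flattened values, so they are absent from the grouped result.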

            People

              Assignee: Unassigned
              Reporter: Joris Van den Bossche (jorisvandenbossche)
              Votes: 0
              Watchers: 6
