Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
Assume you have a list array:
import pyarrow as pa
import pyarrow.compute as pc
arr = pa.array([["a", "b"], ["a", "c"], ["b", "c", "d"]])
Suppose you want to know, for each list, whether it contains a given value (of the same type as the list's values). A "list_contains" function (or one with another name) would be useful for that:
pc.list_contains(arr, "a")  # proposed function, does not exist yet
# -> [True, True, False]
The current workaround I found is to flatten the lists, check equality, and then reduce again with a group-by, but this is quite tedious:
>>> temp = pa.table({'index': pc.list_parent_indices(arr), 'contains_value': pc.equal(pc.list_flatten(arr), "a")})
>>> temp.group_by('index').aggregate([('contains_value', 'any')])['contains_value_any'].chunk(0)
<pyarrow.lib.BooleanArray object at 0x7ffaf3f8de20>
[
  true,
  true,
  false
]
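However, this only works if there are no empty or missing (null) lists: such a list contributes no elements to the flattened array, so its index never appears in the grouped result and the output ends up shorter than the input.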
Issue Links
duplicates: ARROW-16702 [C++] Add compute functions for list array containment (Open)