Apache Arrow / ARROW-10641

[C++] A "replace" or "map" kernel to replace values in array based on mapping

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++

    Description

      Add a "replace" or "map" kernel that replaces values in an array based on a mapping. This would be similar to the pandas Series.replace (or Series.map) method. A small illustration of what is meant:

      In [41]: s = pd.Series(["Yes", "Y", "No", "N"])
      
      In [42]: s
      Out[42]: 
      0    Yes
      1      Y
      2     No
      3      N
      dtype: object
      
      In [43]: s.replace({"Y": "Yes", "N": "No"})
      Out[43]: 
      0    Yes
      1    Yes
      2     No
      3     No
      dtype: object
      
      

      Note: in pandas, the difference between "replace" and "map" is that replace only replaces a value if it is present in the mapping and leaves all other values unchanged, while map replaces every value in the input array with the corresponding value in the mapping and returns null if the value is not present in the mapping. This difference in behaviour could perhaps be controlled with a keyword.
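
To make the two behaviours concrete, here is a minimal pure-Python sketch of the semantics only. It is not an Arrow API: the function name `replace_values` and the keyword `null_if_missing` are hypothetical, and passing input nulls through unchanged is an assumption that the issue does not specify.

```python
from typing import Hashable, List, Mapping, Optional, Sequence


def replace_values(
    values: Sequence[Optional[Hashable]],
    mapping: Mapping[Hashable, Hashable],
    null_if_missing: bool = False,  # hypothetical keyword, not an Arrow option
) -> List[Optional[Hashable]]:
    """Replace whole elements of `values` according to `mapping`.

    null_if_missing=False: "replace" semantics, unmapped values are kept.
    null_if_missing=True:  "map" semantics, unmapped values become None.
    """
    result: List[Optional[Hashable]] = []
    for value in values:
        if value is None:
            # Assumption: nulls in the input stay null.
            result.append(None)
        elif value in mapping:
            result.append(mapping[value])
        else:
            result.append(None if null_if_missing else value)
    return result


values = ["Yes", "Y", "No", "N"]
mapping = {"Y": "Yes", "N": "No"}

replaced = replace_values(values, mapping)
mapped = replace_values(values, mapping, null_if_missing=True)

print(replaced)  # ['Yes', 'Yes', 'No', 'No']
print(mapped)    # [None, 'Yes', None, 'No']
```

With a single keyword, one kernel could cover both cases. Whether that is preferable to two separate kernels is left open here.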

      Note: this is different from ARROW-10306, which is about string replacement within array elements (replacing a substring in each string element of the array), whereas this issue is about replacing full elements of the array.
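
The distinction can be illustrated with plain Python lists standing in for arrays. This is only a sketch of the two semantics, not of either kernel's implementation, and the variable names are made up for the example.

```python
values = ["Yes", "Y", "No", "N"]

# Substring replacement (the ARROW-10306 kind): every occurrence of "Y"
# inside each string is rewritten, so "Yes" is affected as well.
substring_replaced = [v.replace("Y", "Yes") for v in values]

# Whole-element replacement (this issue): an element is replaced only if
# the entire value equals a key of the mapping.
mapping = {"Y": "Yes"}
element_replaced = [mapping.get(v, v) for v in values]

print(substring_replaced)  # ['Yeses', 'Yes', 'No', 'N']
print(element_replaced)    # ['Yes', 'Yes', 'No', 'N']
```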

      cc maartenbreddels

            People

              Assignee: Unassigned
              Reporter: Joris Van den Bossche (jorisvandenbossche)
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated: