  Apache Arrow / ARROW-18319

`binary_replace_slice` should not work with `string` types


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++
    • Labels: None

    Description

      `binary_replace_slice` can produce invalid output when used with string types, because it slices by byte offset and can therefore split a multi-byte UTF-8 sequence. Given that `utf8_replace_slice` exists, I think `binary_replace_slice` should not accept string types.

      If users actually want to operate on the bytes of a string array, they should explicitly cast it to a binary type and then use `binary_replace_slice`.

      >>> import pyarrow.compute as pc
      >>> pc.binary_replace_slice(["hé"], 1, 2, "x")
      <pyarrow.lib.StringArray object at 0x7fdbc09937c0>
      [
        "hx�"
      ]
      >>> pc.binary_replace_slice(["hé"], 1, 2, "x").validate(full=True)
      Traceback (most recent call last):
        ...
      ArrowInvalid: Invalid UTF8 sequence at string index 0 
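To see why the output above is invalid, here is a pure-Python model of the two slicing semantics. It is not Arrow's implementation: the helper names (`replace_slice_bytes`, `replace_slice_codepoints`, `is_valid_utf8`) are hypothetical, and the sketch only assumes that `binary_replace_slice` counts bytes while `utf8_replace_slice` counts code points, for simple non-negative, in-range indices like the ones in the report. Because "é" is encoded as the two bytes `0xC3 0xA9`, replacing byte range [1, 2) keeps the trailing `0xA9`, a continuation byte with no lead byte.

```python
# Hypothetical pure-Python model of byte-offset vs. code-point slicing.
# Not pyarrow's implementation; it only mirrors the indexing difference.


def replace_slice_bytes(value: bytes, start: int, stop: int, replacement: bytes) -> bytes:
    """Replace value[start:stop], where indices count bytes (binary semantics)."""
    return value[:start] + replacement + value[stop:]


def replace_slice_codepoints(value: str, start: int, stop: int, replacement: str) -> str:
    """Replace value[start:stop], where indices count code points (UTF-8-aware semantics)."""
    return value[:start] + replacement + value[stop:]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    text = "hé"
    raw = text.encode("utf-8")  # b'h\xc3\xa9': "é" occupies two bytes

    byte_result = replace_slice_bytes(raw, 1, 2, b"x")
    print(byte_result)                 # b'hx\xa9'
    print(is_valid_utf8(byte_result))  # False: lone continuation byte 0xA9 remains

    cp_result = replace_slice_codepoints(text, 1, 2, "x")
    print(cp_result)                   # hx
```

The code-point version never splits a character, which is why a string-typed result is only safe from a UTF-8-aware function.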

      Ref: https://github.com/apache/arrow/pull/14550#discussion_r1021545816
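The proposal can be sketched as a toy dispatch table in which each function is registered only for the input type it handles correctly, so passing text to the byte-level function fails loudly instead of returning corrupt data. Everything here is a hypothetical model, not Arrow's kernel registry: `KERNELS` and `dispatch_replace_slice` are invented names, Python `bytes` and `str` stand in for Arrow's binary and string types, and `.encode("utf-8")` plays the role of the explicit cast to binary.

```python
# Toy model of the proposed contract: each function is registered only for
# the input type it handles correctly. All names are hypothetical; Python
# bytes/str stand in for Arrow binary/string arrays.
from typing import Any, Callable, Dict


def _replace_slice(value: Any, start: int, stop: int, replacement: Any) -> Any:
    # bytes slicing counts bytes; str slicing counts code points.
    return value[:start] + replacement + value[stop:]


KERNELS: Dict[str, Dict[type, Callable[..., Any]]] = {
    # Proposed: the byte-level function no longer accepts text input.
    "binary_replace_slice": {bytes: _replace_slice},
    # The code-point-aware function handles text.
    "utf8_replace_slice": {str: _replace_slice},
}


def dispatch_replace_slice(name: str, value: Any, start: int, stop: int,
                           replacement: Any) -> Any:
    """Look up a kernel by function name and exact input type, or raise."""
    kernel = KERNELS[name].get(type(value))
    if kernel is None:
        raise TypeError(
            f"{name} has no kernel for input type {type(value).__name__}"
        )
    return kernel(value, start, stop, replacement)


if __name__ == "__main__":
    try:
        dispatch_replace_slice("binary_replace_slice", "hé", 1, 2, "x")
    except TypeError as exc:
        print(exc)  # binary_replace_slice has no kernel for input type str

    # Explicit conversion, the analogue of casting to binary first.
    raw = "hé".encode("utf-8")
    print(dispatch_replace_slice("binary_replace_slice", raw, 1, 2, b"x"))  # b'hx\xa9'
    print(dispatch_replace_slice("utf8_replace_slice", "hé", 1, 2, "x"))  # hx
```

Under this contract the call from the report raises an error, and a caller who converts explicitly gets raw bytes back, making it their responsibility that the result may not be valid UTF-8.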

       

      cc: apitrou 


      People

            Assignee: Unassigned
            Reporter: Kshiteej K (kshitij12345)
            Votes: 0
            Watchers: 3
