  SPARK-47863 (sub-task of SPARK-46837, String function support)

endsWith and startsWith don't work correctly for some collations

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: SQL

    Description

      CollationSupport.EndsWith and CollationSupport.StartsWith use CollationAwareUTF8String.matchAt, which operates on byte offsets to compare prefixes and suffixes. This is incorrect, because under case-insensitive and lowercase collations, string parts (prefix or suffix) of different lengths can compare as equal.

      Example test cases that highlight the problem:

      - assertContains("The İo", "i̇o", "UNICODE_CI", true); for CollationSupportSuite.testContains. 
      - assertEndsWith("The İo", "i̇o", "UNICODE_CI", true); for CollationSupportSuite.testEndsWith.

      The first passes because it uses StringSearch directly; the second does not.
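
The length mismatch behind the failing test can be reproduced with the JDK alone. The sketch below is an illustration under stated assumptions, not Spark's code: `SuffixMatchSketch`, `byteOffsetEndsWith`, and `foldedEndsWith` are hypothetical names, and `toLowerCase(Locale.ROOT)` stands in for the case-insensitive equality that UNICODE_CI actually gets from ICU collation. It shows that "İ" (U+0130) and "i̇" (U+0069 U+0307) have different UTF-8 lengths, so a check that slices the target by the suffix's byte length looks at the wrong span.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Illustrative sketch only; this is NOT CollationAwareUTF8String.matchAt.
class SuffixMatchSketch {

    // Stand-in for case-insensitive comparison (assumption: simple lowercasing).
    static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Hypothetical byte-offset check: take the last N bytes of the target,
    // where N is the suffix's UTF-8 byte length, and compare that slice.
    static boolean byteOffsetEndsWith(String target, String suffix) {
        byte[] t = target.getBytes(StandardCharsets.UTF_8);
        byte[] s = suffix.getBytes(StandardCharsets.UTF_8);
        if (s.length > t.length) {
            return false;
        }
        byte[] tail = Arrays.copyOfRange(t, t.length - s.length, t.length);
        String tailText = new String(tail, StandardCharsets.UTF_8);
        return lower(tailText).equals(lower(suffix));
    }

    // Length-agnostic check: normalize both strings first, then match.
    static boolean foldedEndsWith(String target, String suffix) {
        return lower(target).endsWith(lower(suffix));
    }

    public static void main(String[] args) {
        String target = "The \u0130o"; // capital I with dot above, then "o"
        String suffix = "i\u0307o";    // "i", combining dot above, then "o"

        System.out.println("UTF-8 bytes of U+0130:        " + utf8Length("\u0130"));
        System.out.println("UTF-8 bytes of U+0069 U+0307: " + utf8Length("i\u0307"));
        System.out.println("byte-offset endsWith: " + byteOffsetEndsWith(target, suffix));
        System.out.println("folded endsWith:      " + foldedEndsWith(target, suffix));
    }
}
```

In this sketch the suffix is one byte longer than the matching tail of the target, so the byte-offset slice also captures the preceding space and the comparison fails, while the length-agnostic variant succeeds.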

            People

              Assignee: Vladimir Golubev (vladimir.golubev)
              Reporter: Vladimir Golubev (vladimir.golubev)