IMPALA-12374

Explore optimizing re2 usage for leading / trailing ".*" when generating LIKE regex

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 4.3.0
    • Fix Version/s: None
    • Component/s: Backend

    Description

      Abseil has some recommendations about efficiently using re2 here: https://abseil.io/fast/21

      One recommendation it has is to avoid leading / trailing .* for FullMatch():

      Using RE2::FullMatch() with leading or trailing .* is an antipattern. Instead, change it to RE2::PartialMatch() and remove the .*. RE2::PartialMatch() performs an unanchored search, so it is also necessary to anchor the regular expression (i.e. with ^ or $) to indicate that it must match at the start or end of the string.
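
The quoted rule covers three cases: leading and trailing .* (drop both, no anchor), trailing .* only (anchor with ^), and leading .* only (anchor with $). The sketch below checks that each rewrite gives the same answer as the original. It is illustrative only: to stay free of external dependencies it uses std::regex as a stand-in, on the assumption that std::regex_match and std::regex_search behave like RE2::FullMatch() and RE2::PartialMatch() for these simple patterns. The helper names FullMatch, PartialMatch, and RewritesAgree are hypothetical and are not RE2 or Impala functions. It says nothing about performance, which is the actual motivation for the change.

```cpp
#include <regex>
#include <string>

// Stand-in for RE2::FullMatch(): the whole text must match the pattern.
bool FullMatch(const std::string& text, const std::string& pattern) {
  return std::regex_match(text, std::regex(pattern));
}

// Stand-in for RE2::PartialMatch(): the pattern may match anywhere in the text.
bool PartialMatch(const std::string& text, const std::string& pattern) {
  return std::regex_search(text, std::regex(pattern));
}

// Returns true if, for this input, each rewritten form agrees with the
// FullMatch form it replaces. The literal "foo" is just an example.
bool RewritesAgree(const std::string& text) {
  // Leading and trailing ".*": drop both, no anchors needed.
  bool both = FullMatch(text, ".*foo.*") == PartialMatch(text, "foo");
  // Trailing ".*" only: anchor at the start of the string.
  bool trailing_only = FullMatch(text, "foo.*") == PartialMatch(text, "^foo");
  // Leading ".*" only: anchor at the end of the string.
  bool leading_only = FullMatch(text, ".*foo") == PartialMatch(text, "foo$");
  return both && trailing_only && leading_only;
}
```

For inputs without newlines, such as "foo", "xfoo", "foox", and "bar", RewritesAgree() returns true.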

      For our slow-path LIKE evaluation, we convert the LIKE pattern to a regular expression and evaluate it with FullMatch(). For patterns like '%a%b%', the generated regular expression has a leading and trailing .* and is still evaluated with FullMatch(). We could try detecting these cases, dropping the leading/trailing .*, and switching to PartialMatch() with anchors where needed. See the link for more details about how this works.
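
One way the detection could look is sketched below. This is not Impala's actual conversion code: LikeRegex, AppendFragment, and ConvertLike are hypothetical names, LIKE escape characters are not handled, and the sketch assumes only '%' and '_' are wildcards. It only builds the regex string and records which RE2 call would be used; it does not call RE2 itself.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: the regex text plus which RE2 call to use.
struct LikeRegex {
  std::string pattern;
  bool use_partial_match;  // true: RE2::PartialMatch(), false: RE2::FullMatch()
};

// Appends the regex for like[begin, end): '%' becomes ".*", '_' becomes ".",
// and regex metacharacters are backslash-escaped.
// Assumption: LIKE escape sequences (e.g. '\%') are out of scope here.
void AppendFragment(const std::string& like, std::size_t begin,
                    std::size_t end, std::string* out) {
  static const std::string kMeta = R"re(\.^$|()[]{}*+?)re";
  for (std::size_t i = begin; i < end; ++i) {
    const char c = like[i];
    if (c == '%') {
      out->append(".*");
    } else if (c == '_') {
      out->push_back('.');
    } else {
      if (kMeta.find(c) != std::string::npos) out->push_back('\\');
      out->push_back(c);
    }
  }
}

LikeRegex ConvertLike(const std::string& like) {
  const bool leading = !like.empty() && like.front() == '%';
  const bool trailing = !like.empty() && like.back() == '%';
  LikeRegex result;
  if (!leading && !trailing) {
    // No leading/trailing wildcard: keep the FullMatch() path unchanged.
    result.use_partial_match = false;
    AppendFragment(like, 0, like.size(), &result.pattern);
    return result;
  }
  // Strip runs of '%' at both ends instead of emitting ".*" for them.
  std::size_t begin = 0;
  std::size_t end = like.size();
  while (begin < end && like[begin] == '%') ++begin;
  while (end > begin && like[end - 1] == '%') --end;
  result.use_partial_match = true;
  // An end without a wildcard must be anchored, since PartialMatch()
  // performs an unanchored search.
  if (!leading) result.pattern.push_back('^');
  AppendFragment(like, begin, end, &result.pattern);
  if (!trailing) result.pattern.push_back('$');
  return result;
}
```

Under these assumptions, ConvertLike("%a%b%") yields the pattern a.*b for PartialMatch(), ConvertLike("a%") yields ^a, and ConvertLike("%a") yields a$. A pattern such as 'a_c' has no leading or trailing wildcard and stays on the FullMatch() path.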

          People

            Assignee: Unassigned
            Reporter: Joe McDonnell (joemcdonnell)