  1. Lucene - Core
  2. LUCENE-9308

tokenizer supports preserving delimiters

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      Currently there is no way to preserve delimiters in a tokenizer, because the basic tokenizers such as CharTokenizer discard them.

      This change is meant to make the basic tokenizers more customizable.

      e.g. "mac_book_pro" -> [mac_, book_, pro]
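
      To make the requested behavior concrete, here is a minimal standalone sketch in plain Java. It is not the attached patch and does not use any Lucene API. The class name `DelimiterPreservingSplit` and the method `tokenize` are hypothetical, and the predicate only plays the role that `isTokenChar(int)` plays in CharTokenizer. It assumes each delimiter is appended to the token that precedes it, so a run of delimiters yields delimiter-only tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration only; not part of Lucene and not the proposed patch.
final class DelimiterPreservingSplit {

    // Splits text into tokens. A code point for which isTokenChar returns false
    // is treated as a delimiter: it is kept at the end of the current token,
    // and then the token is closed.
    static List<String> tokenize(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            current.appendCodePoint(cp);
            if (!isTokenChar.test(cp)) {
                // Delimiter seen: emit the token including the delimiter.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        // Emit the trailing token, which has no delimiter after it.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [mac_, book_, pro]
        System.out.println(tokenize("mac_book_pro", c -> c != '_'));
    }
}
```

      An actual implementation inside a Lucene tokenizer would also need to set term offsets and respect the maximum token length, which this sketch leaves out.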

          People

            Assignee: Unassigned
            Reporter: yin Lin (yinlin)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated: