  1. Stanbol (Retired)
  2. STANBOL-538

Improve extraction of Keywords (alpha numeric IDs, URNs ...) with the KeywordLinkingEngine

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0-incubating
    • Component/s: None
    • Labels: None

    Description

      Currently the KeywordEngine cannot be used to match alphanumeric IDs of the kind often used for products. This is because the tokenizers used by OpenNLP tend to split such IDs into several small tokens, which prevents them from being matched correctly.

      The simplest solution is to implement a simple tokenizer optimized for keyword extraction. Such a tokenizer should split only on white space.
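
      A minimal sketch of such a whitespace-only tokenizer, written in plain Java for illustration. The class name WhitespaceKeywordTokenizer and its tokenize method are hypothetical; this is not the code committed for this issue and it does not use the OpenNLP or Stanbol APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not the tokenizer committed for this issue.
final class WhitespaceKeywordTokenizer {

    private WhitespaceKeywordTokenizer() {}

    /**
     * Splits the text on runs of white space only. Characters such as
     * '-', '/', ':' or digits never end a token, so alphanumeric IDs
     * and URNs stay in one piece.
     */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start index of the current token, -1 while between tokens
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                if (start >= 0) {
                    tokens.add(text.substring(start, i));
                    start = -1;
                }
            } else if (start < 0) {
                start = i;
            }
        }
        if (start >= 0) {
            tokens.add(text.substring(start)); // last token runs to the end of the text
        }
        return tokens;
    }
}
```

      With this sketch, the text "Replace part AB-1234/X or urn:isbn:0451450523 today" yields AB-1234/X and urn:isbn:0451450523 as single tokens. One trade-off of splitting on white space alone is that punctuation directly following a word (for example a sentence-final period) stays attached to that token.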

          People

            Assignee: Rupert Westenthaler (rwesten)
            Reporter: Rupert Westenthaler (rwesten)