Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Description
Currently the KeywordEngine cannot be used to match against alphanumeric IDs, as are often used for products. This is because the Tokenizers used by OpenNLP tend to split such IDs into several small tokens, which prevents a correct mapping against this kind of ID.
The simplest solution is to implement a simple Tokenizer that is optimized for extracting keywords. Such a Tokenizer should split only on whitespace.
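A minimal sketch of such a whitespace-only tokenizer follows. The class name WhitespaceKeywordTokenizer and the static tokenize method are hypothetical and chosen only for illustration. This is not the class added for this issue, and it does not implement the OpenNLP Tokenizer interface, which a real integration would presumably need to do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the implementation committed for this issue.
final class WhitespaceKeywordTokenizer {

    /**
     * Splits the text on whitespace only, so that alphanumeric IDs such as
     * "AB-1234/X9" are kept as a single token.
     */
    static String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start index of the current token, -1 while between tokens
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                if (start >= 0) {
                    // whitespace ends the current token
                    tokens.add(text.substring(start, i));
                    start = -1;
                }
            } else if (start < 0) {
                // first non-whitespace character starts a new token
                start = i;
            }
        }
        if (start >= 0) {
            // the text ended while inside a token
            tokens.add(text.substring(start));
        }
        return tokens.toArray(new String[0]);
    }
}
```

With this sketch, the input "Order AB-1234/X9 today" yields the three tokens "Order", "AB-1234/X9" and "today". A tokenizer that also splits on punctuation or on letter/digit boundaries would break the ID into several pieces.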