CASSANDRA-11130

[SASI Pre-QA] = semantics not respected when using StandardAnalyzer

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Fixed
    • 3.4
    • Legacy/CQL
    • None
    • Tested from build CASSANDRA-11067

    • Normal

    Description

      CREATE KEYSPACE music WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}  AND durable_writes = true;
      
      CREATE TABLE music.albums (
          id int PRIMARY KEY,
          artist text,
          title1 text,
          title2 text
      );
      
      CREATE CUSTOM INDEX ON music.albums (title1) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'tokenization_skip_stop_words': 'true', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'case_sensitive': 'false', 'mode': 'PREFIX', 'tokenization_enable_stemming': 'true'};
      
      CREATE CUSTOM INDEX ON music.albums (title2) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'tokenization_skip_stop_words': 'true', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'case_sensitive': 'false', 'mode': 'CONTAINS', 'tokenization_enable_stemming': 'true'};
      
      INSERT INTO music.albums(id, artist, title1, title2) 
      VALUES(1, 'Superpitcher', 'Yesterday', 'Yesterday');
      
      INSERT INTO music.albums(id, artist, title1, title2) 
      VALUES(2, 'Hilary Duff', 'So Yesterday', 'So Yesterday');
      
      INSERT INTO music.albums(id, artist, title1, title2) 
      VALUES(3, 'The Mr. T Experience', 'Yesterday Rules', 'Yesterday Rules');
      
      SELECT artist,title1 FROM music.albums WHERE title1='Yesterday';
      
       artist                 | title1
      ------------------------+----------------
                 Superpitcher |       Yesterday
                  Hilary Duff |    So Yesterday
         The Mr. T Experience | Yesterday Rules
       
      (3 rows)
      
      SELECT artist,title1 FROM music.albums WHERE title2='Yesterday';
      
       artist                 | title1
      ------------------------+----------------
                 Superpitcher |       Yesterday
                  Hilary Duff |    So Yesterday
         The Mr. T Experience | Yesterday Rules
        
      (3 rows)
      

      The semantics of = are not respected. SASI should return only 1 row, the exact match. Using LIKE would return all 3 rows. This affects both PREFIX and CONTAINS modes. Using NonTokenizingAnalyzer returns 1 row, the exact match.

      So the semantics of = depend on the chosen analyzer, which is inconsistent. We should force = to be an exact match no matter which analyzer is chosen.
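
      For comparison, a minimal sketch of the two alternatives mentioned above. The table music.albums_nt and its column names are hypothetical, introduced only for illustration, and the index options shown are assumed to be sufficient for NonTokenizingAnalyzer. The LIKE queries assume LIKE is available in the build under test, as the description implies. No result sets are shown because these statements were not part of the original test run.

      -- Hypothetical copy of the table, indexed without tokenization
      CREATE TABLE music.albums_nt (
          id int PRIMARY KEY,
          artist text,
          title text
      );

      CREATE CUSTOM INDEX ON music.albums_nt (title) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false', 'mode': 'PREFIX'};

      INSERT INTO music.albums_nt(id, artist, title)
      VALUES(1, 'Superpitcher', 'Yesterday');

      INSERT INTO music.albums_nt(id, artist, title)
      VALUES(2, 'Hilary Duff', 'So Yesterday');

      INSERT INTO music.albums_nt(id, artist, title)
      VALUES(3, 'The Mr. T Experience', 'Yesterday Rules');

      -- Per the description, only the exact match (id 1) should come back here
      SELECT artist,title FROM music.albums_nt WHERE title='Yesterday';

      -- Term matching requested explicitly with LIKE on the original StandardAnalyzer indexes;
      -- per the description, these are the queries expected to return all 3 rows
      SELECT artist,title1 FROM music.albums WHERE title1 LIKE 'Yesterday%';
      SELECT artist,title2 FROM music.albums WHERE title2 LIKE '%Yesterday%';

      If = were forced to exact match as proposed, the first query and the original = queries would behave the same way regardless of analyzer, and LIKE would remain the way to ask for term matching.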

          People

            xedin Pavel Yaskevich
            doanduyhai DuyHai Doan
            Pavel Yaskevich
            Sam Tunnicliffe