  1. Cassandra
  2. CASSANDRA-10436

Index selection should be weighted in favour of custom expressions

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: 3.0.0 rc2
    • Component/s: Legacy/CQL
    • Labels: None

    Description

      If a SELECT contains a custom index expression (CASSANDRA-10217), that expression should always be chosen as the primary expression during query execution. If the statement also contains other expressions that can be satisfied by a built-in index, we currently have no way to apply the custom expression as a filter. What's more, the method of selecting which index to use is fairly primitive (and cannot be overridden until CASSANDRA-10214), so we should ensure that a custom expression, if present, is always chosen.

      Suppose we have a custom index implementation which provides prefix matching on text fields.

      CREATE TABLE ks.t (k int, v1 int, v2 text, PRIMARY KEY(k));
      CREATE INDEX v1_idx ON ks.t(v1);
      CREATE CUSTOM INDEX v2_idx ON ks.t(v2) USING 'com.example.CustomIndex';
      
      INSERT INTO ks.t(k, v1, v2) VALUES(0, 0, 'abc');
      INSERT INTO ks.t(k, v1, v2) VALUES(1, 1, 'def');
      
      SELECT * FROM ks.t WHERE v1=0 AND expr(v2_idx, 'd*') ALLOW FILTERING;
      

      In the above example, the expected result contains no rows, which is what we get if v2_idx is selected as the primary (i.e. most selective) index during query execution. However, if v1_idx is chosen instead, the results of its lookup have no further filter applied, and so an incorrect result (the row with k=0) is returned.
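
      The failure mode above can be modelled with a small, self-contained Python sketch. It is not Cassandra code: the in-memory ROWS table, the lookup functions and choose_primary are hypothetical stand-ins, and the prefix semantics of 'd*' are assumed from the example custom index.

```python
# Toy model of ks.t from the example: k -> column values.
ROWS = {0: {"v1": 0, "v2": "abc"}, 1: {"v1": 1, "v2": "def"}}


def v1_idx_lookup(value):
    """Stand-in for the built-in index on v1: equality lookup."""
    return {k for k, row in ROWS.items() if row["v1"] == value}


def v2_idx_lookup(pattern):
    """Stand-in for the custom index on v2: prefix match for patterns like 'd*'."""
    prefix = pattern.rstrip("*")
    return {k for k, row in ROWS.items() if row["v2"].startswith(prefix)}


def query(primary):
    """Model WHERE v1=0 AND expr(v2_idx, 'd*') for a given primary index."""
    if primary == "v1_idx":
        # The custom expression cannot be applied as a post-filter, so the
        # candidates from v1_idx are returned unchanged.
        return sorted(v1_idx_lookup(0))
    # v2_idx is primary; v1=0 is a plain equality that can be filtered natively.
    return sorted(k for k in v2_idx_lookup("d*") if ROWS[k]["v1"] == 0)


def choose_primary(expressions):
    """Proposed rule: prefer the index of a custom expression if one is present."""
    for expr in expressions:
        if expr["custom"]:
            return expr["index"]
    # Assumed fallback for this sketch only: take the first expression's index.
    return expressions[0]["index"]


expressions = [
    {"index": "v1_idx", "custom": False},
    {"index": "v2_idx", "custom": True},
]
print(query("v1_idx"))                     # [0]  (incorrect: row 0 leaks through)
print(query(choose_primary(expressions)))  # []   (expected result)
```

      With the custom expression preferred, the query is driven by v2_idx and the v1=0 restriction is applied as an ordinary filter, which gives the expected empty result.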

      Note: this has always been something of an issue for custom indexes, as the expressions they support may not be natively filterable by C*. For example, with the full-text search syntax used by Stratio and DSE Search, if the custom index isn't selected, filtering will erroneously remove all rows because the value of the dummy column does not match the Lucene/Solr search expression literal. It's probably a fairly minor concern, as in most cases a query using a custom index will not include other expressions (usually because custom indexes are per-row indexes, and so can support multi-field expression syntax). Also, an index implementation can return a very low estimated result count to try to ensure it is selected; custom expressions just provide an opportunity to improve the situation.
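
      As a rough illustration of the estimate workaround versus the proposed rule, the Python sketch below assumes that selection picks the index with the lowest estimated result count. The function name select_index and the estimate values are hypothetical and do not correspond to Cassandra's actual API.

```python
def select_index(estimates, custom_expression_index=None):
    """Pick the primary index for a query.

    estimates maps index name -> estimated result count (made-up values).
    custom_expression_index names the index targeted by a custom
    expression, if the query contains one.
    """
    if custom_expression_index is not None:
        # Proposed behaviour: a custom expression always wins.
        return custom_expression_index
    # Assumed existing behaviour: lowest estimate is treated as most selective.
    return min(estimates, key=estimates.get)


honest = {"v1_idx": 10, "v2_idx": 500}
understated = {"v1_idx": 10, "v2_idx": 1}

print(select_index(honest))       # v1_idx
print(select_index(understated))  # v2_idx
print(select_index(honest, custom_expression_index="v2_idx"))  # v2_idx
```

      Understating the estimate only works while no other index reports something lower, whereas keying the choice on the presence of a custom expression does not depend on the estimates at all.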

          People

            samt Sam Tunnicliffe