Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-17187

Guardrail for SELECT IN terms and their cartesian product

    XMLWordPrintableJSON

Details

    Description

      Add a guardrail to limit the number restrictions generated by the cartesian product of the IN restrictions of a SELECT query, for example:

      # Guardrail to warn or abort when IN query creates a cartesian product with a 
      # size exceeding threshold, eg. "a in (1,2,...10) and b in (1,2...10)" results in 
      # cartesian product of 100.
      # The two thresholds default to -1 to disable. 
      in_select_cartesian_product:
          warn_threshold: -1
          abort_threshold: -1
      

      As an example of why this guardrails is proposed, these queries bring a C* instance to its knees even before the query starts executing:

      @Test
      public void testPartitionKeyTerms() throws Throwable
      {
          createTable("CREATE TABLE %s (pk1 int, pk2 int, pk3 int, pk4 int, pk5 int, pk6 int, pk7 int, pk8 int, pk9 int, " +
                     "PRIMARY KEY((pk1, pk2, pk3, pk4, pk5, pk6, pk7, pk8, pk9)))");
          execute("SELECT * FROM %s WHERE pk1 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) " +
                  "AND pk2 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) " +
                  "AND pk3 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) " +
                  "AND pk4 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) " +
                  "AND pk5 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) " +
                  "AND pk6 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) " +
                  "AND pk7 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) " +
                  "AND pk8 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) " +
                  "AND pk9 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);");
      }
      
      @Test
      public void testClusteringKeyTerms() throws Throwable
      {
          createTable("CREATE TABLE %s (pk int ,ck1 int, ck2 int, ck3 int, ck4 int, ck5 int, ck6 int, ck7 int, ck8 int, ck9 int, " +
                  "PRIMARY KEY(pk, ck1, ck2, ck3, ck4, ck5, ck6, ck7, ck8, ck9))");
          execute("SELECT * FROM %s WHERE pk = 1 " +
                  "AND ck1 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) " +
                  "AND ck2 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) " +
                  "AND ck3 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) " +
                  "AND ck4 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) " +
                  "AND ck5 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) " +
                  "AND ck6 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) " +
                  "AND ck7 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) " +
                  "AND ck8 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) " +
                  "AND ck9 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);");
      }
      

      Additional information for newcomers:

      1. Add the configuration for the new guardrail on cartesian product in the guardrails section of cassandra.yaml.
      2. Add a getInCartesianProduct method in GuardrailsConfig returning a Threshold.Config object
      3. Implement that method in GuardrailsOptions, which is the default yaml-based implementation of GuardrailsConfig
      4. Add a Threshold guardrail named inCartesianProduct in Guardrails, using the previously created config
      5. Define JMX-friendly getters and setters for the previously created config in GuardrailsMBean
      6. Implement the JMX-friendly getters and setters in Guardrails
      7. Now that we have the guardrail ready, it’s time to use it. We should search for a place to invoke the Guardrails#inCartesianProduct guard method. The MultiCBuilder look like good candidates for this.
      8. Finally, add some tests for the new guardrail. Given that the new guardrail is a Threshold, our new test should probably extend ThresholdTester.

      Attachments

        Issue Links

          Activity

            People

              adelapena Andres de la Peña
              adelapena Andres de la Peña
              Andres de la Peña
              Ekaterina Dimitrova
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 1h 10m
                  1h 10m