  1. Spark
  2. SPARK-46671

InferFiltersFromConstraints rule creates a redundant filter


Details

    • Type: Bug
    • Status: Reopened
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.5.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

      While bringing my old PR, which uses a different approach to the ConstraintPropagation algorithm (SPARK-33152), in sync with current master, I noticed a test failure in my branch for SPARK-33152.
      The failing test is in InferFiltersFromConstraintsSuite:

        test("SPARK-43095: Avoid Once strategy's idempotence is broken for batch: Infer Filters") {
          val x = testRelation.as("x")
          val y = testRelation.as("y")
          val z = testRelation.as("z")
      
          // Removes EqualNullSafe when constructing candidate constraints
          comparePlans(
            InferFiltersFromConstraints(x.select($"x.a", $"x.a".as("xa"))
              .where($"xa" <=> $"x.a" && $"xa" === $"x.a").analyze),
            x.select($"x.a", $"x.a".as("xa"))
              .where($"xa".isNotNull && $"x.a".isNotNull && $"xa" <=> $"x.a" && $"xa" === $"x.a").analyze)
      
          // Once strategy's idempotence is not broken
          val originalQuery =
            x.join(y, condition = Some($"x.a" === $"y.a"))
              .select($"x.a", $"x.a".as("xa")).as("xy")
              .join(z, condition = Some($"xy.a" === $"z.a")).analyze
      
          val correctAnswer =
            x.where($"a".isNotNull).join(y.where($"a".isNotNull), condition = Some($"x.a" === $"y.a"))
              .select($"x.a", $"x.a".as("xa")).as("xy")
              .join(z.where($"a".isNotNull), condition = Some($"xy.a" === $"z.a")).analyze
      
          val optimizedQuery = InferFiltersFromConstraints(originalQuery)
          comparePlans(optimizedQuery, correctAnswer)
          comparePlans(InferFiltersFromConstraints(optimizedQuery), correctAnswer)
        }
      

      In the above test, I believe the assertion shown below is incorrect: the expected plan contains a redundant filter.
      Of these two isNotNull constraints, only one should be created:

      $"xa".isNotNull && $"x.a".isNotNull
      Because "xa" is an alias of x.a, only one isNotNull constraint is needed.

      // Removes EqualNullSafe when constructing candidate constraints
      comparePlans(
        InferFiltersFromConstraints(x.select($"x.a", $"x.a".as("xa"))
          .where($"xa" <=> $"x.a" && $"xa" === $"x.a").analyze),
        x.select($"x.a", $"x.a".as("xa"))
          .where($"xa".isNotNull && $"x.a".isNotNull && $"xa" <=> $"x.a" && $"xa" === $"x.a").analyze)
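To illustrate why one of the two isNotNull predicates is redundant, here is a minimal plain-Scala sketch that does not depend on Spark. Everything in it (AliasAwareNotNull, resolve, naiveNotNull, aliasAwareNotNull, and the string form of the predicates) is hypothetical and is not Catalyst's API or its actual constraint-propagation code. It assumes alias chains are acyclic, and it arbitrarily keeps the underlying attribute rather than the alias; the report does not say which of the two should survive.

```scala
// Hypothetical, simplified model of alias-aware IsNotNull inference.
// None of these names exist in Spark Catalyst.
object AliasAwareNotNull {
  // Output column name -> name of the attribute it is an alias of.
  // For x.select($"x.a", $"x.a".as("xa")) the only entry is "xa" -> "x.a".
  type AliasMap = Map[String, String]

  // Follow alias links until reaching a name that is not itself an alias.
  // Assumes the alias map contains no cycles.
  @annotation.tailrec
  def resolve(name: String, aliases: AliasMap): String =
    aliases.get(name) match {
      case Some(target) if target != name => resolve(target, aliases)
      case _ => name
    }

  // Naive inference: one predicate per referenced column, aliases ignored.
  def naiveNotNull(columns: Seq[String]): Seq[String] =
    columns.distinct.map(c => s"isnotnull($c)")

  // Alias-aware inference: canonicalize each column first, then deduplicate.
  def aliasAwareNotNull(columns: Seq[String], aliases: AliasMap): Seq[String] =
    columns.map(resolve(_, aliases)).distinct.map(c => s"isnotnull($c)")

  def main(args: Array[String]): Unit = {
    val aliases = Map("xa" -> "x.a")
    val referenced = Seq("xa", "x.a") // columns used by xa === x.a
    println(naiveNotNull(referenced))
    println(aliasAwareNotNull(referenced, aliases))
  }
}
```

With the alias map from the test's projection, the naive version yields two predicates (one for xa and one for x.a), while the alias-aware version yields a single predicate on x.a, which is the behaviour the report argues for.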

      This is not a big issue, but it highlights the need to revisit ConstraintPropagation and the related code.

      I am filing this JIRA so that the constraint code can be tightened and made more robust.

          People

            Assignee: Unassigned
            Reporter: Asif (ashahid7)