Spark / SPARK-47955

Improve DeduplicateRelations performance

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: SQL

    Description

      The `DeduplicateRelations` rule can be very slow on large plans due to the `missingInput` check on the whole plan, which was introduced by https://issues.apache.org/jira/browse/SPARK-37932 / https://github.com/apache/spark/pull/35684. Let's try to improve it.
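
      To illustrate why a whole-plan check can dominate on large plans, here is a minimal sketch on a toy plan tree. It is not Spark's code: `Node`, `missing_input`, `check_at_every_node`, and `check_once_per_node` are hypothetical names, and the assumption that the scan is repeated at every node is only one way such a check could become expensive. The counters show the number of node visits for a 100-node chain.

      ```python
      from dataclasses import dataclass, field


      @dataclass
      class Node:
          """Toy plan node (hypothetical; not Spark's LogicalPlan)."""
          output: frozenset
          references: frozenset = frozenset()
          children: list = field(default_factory=list)

          def missing_input(self):
              # Attributes this node references that no direct child produces.
              available = frozenset().union(*(c.output for c in self.children))
              return self.references - available


      def subtree_has_missing_input(node, counter):
          """Scan the whole subtree; counter[0] counts node visits."""
          counter[0] += 1
          if node.missing_input():
              return True
          return any(subtree_has_missing_input(c, counter) for c in node.children)


      def check_at_every_node(node, counter):
          """Assumed slow pattern: rerun the whole-subtree scan at each node."""
          for child in node.children:
              check_at_every_node(child, counter)
          return subtree_has_missing_input(node, counter)


      def check_once_per_node(node, counter):
          """One bottom-up pass that reuses the children's results."""
          counter[0] += 1
          child_missing = [check_once_per_node(c, counter) for c in node.children]
          return bool(node.missing_input()) or any(child_missing)


      def build_chain(n):
          """Build a linear plan of n nodes, each consuming attribute 'a'."""
          node = Node(output=frozenset({"a"}))
          for _ in range(n - 1):
              node = Node(frozenset({"a"}), frozenset({"a"}), [node])
          return node


      plan = build_chain(100)
      naive, single = [0], [0]
      naive_result = check_at_every_node(plan, naive)
      single_result = check_once_per_node(plan, single)
      print(naive_result, single_result, naive[0], single[0])  # False False 5050 100
      ```

      Both functions agree on the answer, but the repeated scan visits n(n+1)/2 nodes on a chain of n nodes, while the single pass visits each node once.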

            People

              Assignee: Peter Toth (petertoth)
              Reporter: Peter Toth (petertoth)
              Votes: 0
              Watchers: 2
