  1. Spark
  2. SPARK-26769

Partition pruning in inner join


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

      When a partitioned Parquet table is joined with another table on its partition column, Spark should prune partitions of the partitioned table based on the values present in the other table.

      example:

      tableA: a Parquet table partitioned by part_filter

      tableB: a table with a column holding partition values

       

      tableA has four partitions, with part_filter values part_A, part_B, part_C and part_D.

      tableB has a single column, part_filter, with 2 rows whose values are part_A and part_B.

       

      Running

      select * from tableA inner join tableB on tableA.part_filter=tableB.part_filter

      should prune the partitions of tableA based on the values in tableB (in this case scanning only 2 partitions). Instead, Spark reads all 4 partitions of tableA and only filters the rows afterwards.
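      The difference between the two behaviours can be sketched with a toy model in plain Python. This is only an illustration under stated assumptions, not Spark's implementation: the dictionary layout, the row contents and the function names join_without_pruning and join_with_pruning are hypothetical, and tableB is assumed to hold distinct values.

```python
# Toy model of the requested behaviour; not Spark's implementation.
# Row contents and function names are hypothetical.

# tableA: part_filter value -> rows stored in that partition
table_a = {
    "part_A": [("part_A", 1), ("part_A", 2)],
    "part_B": [("part_B", 3)],
    "part_C": [("part_C", 4)],
    "part_D": [("part_D", 5)],
}

# tableB: a single column holding two distinct partition values
table_b = ["part_A", "part_B"]


def join_without_pruning(table_a, table_b):
    """Read every partition of tableA, then filter rows during the join."""
    keys = set(table_b)
    scanned = list(table_a)  # all partitions are read
    rows = [row for part in scanned for row in table_a[part] if row[0] in keys]
    return scanned, rows


def join_with_pruning(table_a, table_b):
    """Use tableB's values to choose partitions before reading tableA."""
    keys = set(table_b)
    scanned = [part for part in table_a if part in keys]  # matching partitions only
    rows = [row for part in scanned for row in table_a[part]]
    return scanned, rows


scanned_all, rows_all = join_without_pruning(table_a, table_b)
scanned_pruned, rows_pruned = join_with_pruning(table_a, table_b)
```

      Both functions return the same joined rows; they differ only in how many partitions of tableA are read, which is the cost this issue asks Spark to avoid.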

       

      Note: this approach already works in Hive, which prunes the partitions of tableA.


          People

            Assignee: Unassigned
            Reporter: nhufas
