  1. Spark
  2. SPARK-46460

A partition filter that includes a cast function may disable partition pruning


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: Optimizer, SQL
    • Patch

    Description

      SQL: select * from test_db.test_table where day between date_sub('2023-12-01', 1) and '2023-12-03'

      The physical plan for the SQL above applies a cast to the partition column day, for example cast(day as date) > 2023-11-30. In this case Spark passes only the filter condition day < "2023-12-03" to the Hive Metastore (HMS) and leaves out cast(day as date) > 2023-11-30. HMS then has to return every partition that matches the remaining condition, which may degrade HMS performance if the Hive table has a very large number of partitions.

       

      A new optimizer rule could solve this problem. The rule would rewrite the binary comparison cast(day as date) > 2023-11-30 as day > cast(2023-11-30 as string). The right-hand side is foldable, so the result is day > "2023-11-30", and the filter condition passed to HMS becomes day > "2023-11-30" and day < "2023-12-03".
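
      The rewrite described above can be sketched on a toy expression tree. This is a minimal illustration in Python, not the attached patch and not Spark's Catalyst API: the classes Column, Literal, Cast and Comparison and the functions fold, unwrap_cast and render are hypothetical names introduced only for this example. It also assumes that day always holds zero-padded yyyy-MM-dd strings, because the rewritten filter compares strings lexicographically rather than as dates.

```python
from dataclasses import dataclass
from datetime import date
from typing import Union


@dataclass(frozen=True)
class Column:
    name: str
    dtype: str  # "string" or "date" in this toy model


@dataclass(frozen=True)
class Literal:
    value: Union[str, date]


@dataclass(frozen=True)
class Cast:
    child: "Expr"
    to: str


@dataclass(frozen=True)
class Comparison:
    op: str  # one of <, <=, >, >=, =
    left: "Expr"
    right: "Expr"


Expr = Union[Column, Literal, Cast, Comparison]


def fold(expr):
    """Constant-fold cast(<date literal> as string) into a string literal."""
    if (isinstance(expr, Cast) and expr.to == "string"
            and isinstance(expr.child, Literal)
            and isinstance(expr.child.value, date)):
        return Literal(expr.child.value.isoformat())
    return expr


def unwrap_cast(cmp):
    """Rewrite cast(string_col as date) OP date_literal to string_col OP 'yyyy-MM-dd'.

    Any comparison that does not have exactly this shape is returned unchanged.
    """
    left, right = cmp.left, cmp.right
    if (isinstance(left, Cast) and left.to == "date"
            and isinstance(left.child, Column) and left.child.dtype == "string"
            and isinstance(right, Literal) and isinstance(right.value, date)):
        # Move the cast to the literal side, then fold it away.
        return Comparison(cmp.op, left.child, fold(Cast(right, "string")))
    return cmp


def render(expr):
    """Pretty-print an expression in a SQL-like form."""
    if isinstance(expr, Column):
        return expr.name
    if isinstance(expr, Literal):
        if isinstance(expr.value, str):
            return f'"{expr.value}"'
        return f"date '{expr.value.isoformat()}'"
    if isinstance(expr, Cast):
        return f"cast({render(expr.child)} as {expr.to})"
    return f"{render(expr.left)} {expr.op} {render(expr.right)}"


day = Column("day", "string")
before = Comparison(">", Cast(day, "date"), Literal(date(2023, 11, 30)))
after = unwrap_cast(before)

print(render(before))  # cast(day as date) > date '2023-11-30'
print(render(after))   # day > "2023-11-30"
```

      After the rewrite the predicate refers only to the bare partition column and a string literal, which is the same shape as the day < "2023-12-03" condition that is already passed to HMS.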

       

       

      Attachments

        1. SPARK-46460.patch
          2 kB
          Zhou Tong

            People

              Assignee: Unassigned
              Reporter: Zhou Tong (littlelittlewhite09)
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated: 168h
                  Remaining: 168h
                  Logged: Not Specified