Spark / SPARK-46092

Overflow in Parquet row group filter creation causes incorrect results



    Description

      While the Parquet readers don't support reading Parquet values into larger Spark types, it's possible to trigger an overflow when creating a Parquet row group filter. The overflowed filter then incorrectly skips row groups, bypassing the exception the reader would otherwise throw.

      Repro:

      Seq(0).toDF("a").write.parquet(path)
      spark.read.schema("a LONG").parquet(path).where(s"a < ${Long.MaxValue}").collect()

      This succeeds and returns no results. It should instead either fail, if the Parquet reader doesn't support the upcast from int to long, or produce the result `[0]` if it does.
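      The mechanism can be illustrated outside Spark: the file's physical type is 32-bit int, so naively narrowing the 64-bit predicate literal keeps only the low 32 bits, flipping `Long.MaxValue` to `-1`. The pushed-down filter effectively becomes `a < -1`, which the row group's statistics (min = max = 0) can never satisfy, so the row group is skipped. A minimal Java sketch of that narrowing (the class and variable names are illustrative, not Spark internals; Scala's `Long.MaxValue.toInt` behaves identically):

      ```java
      public class NarrowingOverflowSketch {
          public static void main(String[] args) {
              long literal = Long.MAX_VALUE;  // predicate literal: a < Long.MaxValue
              int narrowed = (int) literal;   // JVM narrowing keeps low 32 bits: -1
              System.out.println(narrowed);   // prints -1

              // Row-group statistics for the file written in the repro:
              // it contains the single value 0, so min = 0.
              int min = 0;

              // A filter built from the narrowed literal asks whether any row
              // could satisfy "a < -1" given min = 0 -- it cannot, so the
              // whole row group is (incorrectly) skipped.
              boolean rowGroupCanMatch = min < narrowed;
              System.out.println(rowGroupCanMatch);  // prints false
          }
      }
      ```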


          People

            Assignee: johanl-db Johan Lasperas
            Reporter: johanl-db Johan Lasperas
            Votes: 0
            Watchers: 3
