Spark / SPARK-46092

Overflow in Parquet row group filter creation causes incorrect results



    Description

      While the Parquet readers don't support reading Parquet values into larger Spark types, it's possible to trigger an overflow when creating a Parquet row group filter. The overflowed filter then incorrectly skips row groups, bypassing the exception the reader would otherwise throw.

      Repro:

      Seq(0).toDF("a").write.parquet(path)
      spark.read.schema("a LONG").parquet(path).where(s"a < ${Long.MaxValue}").collect()

      This succeeds and returns no results. It should instead either fail, if the Parquet reader doesn't support the upcast from int to long, or produce the result `[0]` if it does.
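      The mechanism can be illustrated outside Spark: the file's physical type is 32-bit int, so naively narrowing the 64-bit predicate literal keeps only the low 32 bits, flipping `Long.MaxValue` to `-1`. The pushed-down filter effectively becomes `a < -1`, which the row group's statistics (min = max = 0) can never satisfy, so the row group is skipped. A minimal Java sketch of that narrowing (the class and variable names are illustrative, not Spark internals; Scala's `Long.MaxValue.toInt` behaves identically):

      ```java
      public class NarrowingOverflowSketch {
          public static void main(String[] args) {
              long literal = Long.MAX_VALUE;  // predicate literal: a < Long.MaxValue
              int narrowed = (int) literal;   // JVM narrowing keeps low 32 bits: -1
              System.out.println(narrowed);   // prints -1

              // Row-group statistics for the file written in the repro:
              // it contains the single value 0, so min = 0.
              int min = 0;

              // A filter built from the narrowed literal asks whether any row
              // could satisfy "a < -1" given min = 0 -- it cannot, so the
              // whole row group is (incorrectly) skipped.
              boolean rowGroupCanMatch = min < narrowed;
              System.out.println(rowGroupCanMatch);  // prints false
          }
      }
      ```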


          People

            Assignee: johanl-db Johan Lasperas
            Reporter: johanl-db Johan Lasperas
            Votes: 0
            Watchers: 3
