Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- Affects Version/s: 2.4.0
- Environment: Local installation of Spark on Linux (Java 1.8, Ubuntu 18.04).
Description
Example code (spark-shell from Spark 2.4.0):
scala> Seq("A", "A", null).toDS.repartition(1).write.parquet("t") scala> spark.read.parquet("t").where(not(col("value").eqNullSafe("A"))).show +-----+ |value| +-----+ +-----+
Running the same with Spark 2.2.0 or 2.3.2 gives the correct result:
scala> spark.read.parquet("t").where(not(col("value").eqNullSafe("A"))).show
+-----+
|value|
+-----+
| null|
+-----+
Also, with a different input sequence, Spark 2.4.0 gives the correct result:
scala> Seq("A", null).toDS.repartition(1).write.parquet("t") scala> spark.read.parquet("t").where(not(col("value").eqNullSafe("A"))).show +-----+ |value| +-----+ | null| +-----+
Issue Links
- is caused by PARQUET-1510 Dictionary filter skips null values when evaluating not-equals (Resolved)
- relates to PARQUET-1309 Parquet Java uses incorrect stats and dictionary filter properties (Resolved)