Parquet / PARQUET-1510

Dictionary filter skips null values when evaluating not-equals.


Details

    Description

      This was discovered in Spark; see SPARK-26677. Reproduction from the Spark PR:

      import spark.implicits._ // provides .toDF (already in scope in spark-shell)

      // Repeat the values to get dictionary encoding.
      Seq(Some("A"), Some("A"), None).toDF.repartition(1).write.mode("overwrite").parquet("/tmp/foo")
      spark.read.parquet("/tmp/foo").where("NOT (value <=> 'A')").show()
      +-----+
      |value|
      +-----+
      +-----+
      // Wrong: the null row is missing.

      // Use plain encoding.
      Seq(Some("A"), None).toDF.repartition(1).write.mode("overwrite").parquet("/tmp/bar")
      spark.read.parquet("/tmp/bar").where("NOT (value <=> 'A')").show()
      +-----+
      |value|
      +-----+
      | null|
      +-----+
      // Correct: null <=> 'A' is false, so NOT (value <=> 'A') keeps the null row.
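For reference, Spark's `<=>` is null-safe equality: it is true when both sides are NULL, false when exactly one side is NULL, and ordinary equality otherwise, so it never evaluates to NULL. That is why `NOT (value <=> 'A')` should keep the null row in both files. A minimal Scala sketch of those semantics, with `None` standing in for NULL (the object and method names are hypothetical, not Spark APIs):

```scala
object NullSafeEqSketch {
  // Models SQL `a <=> b`, with None standing in for NULL.
  // Unlike `=`, the result is always a definite Boolean, never NULL.
  def nullSafeEq(a: Option[String], b: Option[String]): Boolean = (a, b) match {
    case (None, None)       => true
    case (Some(x), Some(y)) => x == y
    case _                  => false // exactly one side is NULL
  }

  // Rows kept by `WHERE NOT (value <=> 'A')`.
  def notNullSafeEqA(rows: Seq[Option[String]]): Seq[Option[String]] =
    rows.filter(v => !nullSafeEq(v, Some("A")))

  def main(args: Array[String]): Unit = {
    // Same rows as the two repro files; each should yield exactly one null row.
    println(notNullSafeEqA(Seq(Some("A"), Some("A"), None))) // List(None)
    println(notNullSafeEqA(Seq(Some("A"), None)))            // List(None)
  }
}
```

The result depends only on the rows, not on how they were encoded on disk, so any difference between the two queries comes from the reader, not from the predicate.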
      

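      This is a correctness issue: each file contains exactly one null, so both queries should return one null row, but the dictionary-encoded file returns no rows.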
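The failure named in the title can be modeled as a row-group pruning decision. A Parquet dictionary holds only the distinct non-null values of a column chunk (nulls are recorded through definition levels), so a not-equals check that looks only at the dictionary concludes that nothing in the `/tmp/foo` chunk differs from `"A"`. The sketch below is a simplified model under that assumption, not parquet-mr's `DictionaryFilter` code or the actual fix. `ChunkInfo`, `buggyCanDropNotEq`, and `fixedCanDropNotEq` are hypothetical names, and the null count is assumed to come from the chunk's column statistics.

```scala
object DictionaryFilterSketch {
  // Hypothetical per-chunk summary; not parquet-mr's ColumnChunkMetaData.
  final case class ChunkInfo(
      dictionary: Set[String],        // distinct non-null values; dictionaries never hold nulls
      nullCount: Option[Long],        // None when statistics are missing
      fullyDictionaryEncoded: Boolean // false if any data page fell back to plain encoding
  )

  // Buggy shape: drops the chunk when every dictionary value equals `v`,
  // ignoring that null rows also satisfy a null-safe not-equals.
  def buggyCanDropNotEq(c: ChunkInfo, v: String): Boolean =
    c.fullyDictionaryEncoded && c.dictionary.forall(_ == v)

  // Fixed shape: additionally require proof that the chunk has no nulls.
  def fixedCanDropNotEq(c: ChunkInfo, v: String): Boolean = {
    val mayContainNulls = c.nullCount.forall(_ > 0) // unknown count => may contain nulls
    c.fullyDictionaryEncoded && !mayContainNulls && c.dictionary.forall(_ == v)
  }

  def main(args: Array[String]): Unit = {
    // Models the /tmp/foo chunk: values A, A, null.
    val foo = ChunkInfo(Set("A"), Some(1L), fullyDictionaryEncoded = true)
    println(buggyCanDropNotEq(foo, "A")) // true: the null row is lost
    println(fixedCanDropNotEq(foo, "A")) // false: the chunk is read
  }
}
```

Treating a missing null count as "may contain nulls" is the conservative choice: a filter may only drop a row group when it can prove that no row matches.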


    People: Ryan Blue (rdblue)
    Votes: 1
    Watchers: 6
