Spark / SPARK-45604

Converting timestamp_ntz to array<timestamp_ntz> can cause NPE or SEGFAULT on parquet vectorized reader


Details

    Description

      Repro:

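      ```scala
      // spark-shell session
      val path = "/tmp/sample_parquet_file"

      // Write a single scalar TIMESTAMP_NTZ column
      spark.sql("SELECT CAST('2019-01-01' AS TIMESTAMP_NTZ) AS field").write.parquet(path)
      // Read it back with a mismatched schema: the scalar column is requested as ARRAY<TIMESTAMP_NTZ>
      spark.read.schema("field ARRAY<TIMESTAMP_NTZ>").parquet(path).collect()
      ```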

      Depending on the column vector memory mode, the read throws a NullPointerException in on-heap mode and crashes the JVM with a segmentation fault in off-heap mode.
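      A minimal sketch of how both failure modes might be triggered from one spark-shell session, assuming the Spark version in use selects the column vector memory mode through the SQL config `spark.sql.columnVector.offheap.enabled` (default `false`, i.e. on-heap) and that the file from the repro above already exists. No output is shown because the off-heap run is expected to take down the JVM.

      ```scala
      // spark-shell session; paste one statement at a time.
      // Assumption: spark.sql.columnVector.offheap.enabled picks on-heap vs off-heap
      // column vectors for the vectorized Parquet reader in this Spark version.
      val path = "/tmp/sample_parquet_file"

      // On-heap column vectors (the default): the read is reported to throw an NPE.
      spark.conf.set("spark.sql.columnVector.offheap.enabled", "false")
      spark.read.schema("field ARRAY<TIMESTAMP_NTZ>").parquet(path).collect()

      // Off-heap column vectors: the same read is reported to segfault the JVM,
      // so run this last or in a throwaway session.
      spark.conf.set("spark.sql.columnVector.offheap.enabled", "true")
      spark.read.schema("field ARRAY<TIMESTAMP_NTZ>").parquet(path).collect()
      ```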


          People

            Assignee: Zamil Majdy (majdyz)
            Reporter: Zamil Majdy (majdyz)
