[SPARK-45604] Converting timestamp_ntz to array<timestamp_ntz> can cause NPE or SEGFAULT on parquet vectorized reader - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.5.0
Fix Version/s: 3.4.2, 4.0.0, 3.5.1
Component/s: Spark Core
Labels:
- pull-request-available

Description

Repro:

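```
val path = "/tmp/sample_parquet_file"

// Write a single scalar TIMESTAMP_NTZ column.
spark.sql("SELECT CAST('2019-01-01' AS TIMESTAMP_NTZ) AS field").write.parquet(path)
// Read it back with a mismatched schema that declares the column as an array.
spark.read.schema("field ARRAY<TIMESTAMP_NTZ>").parquet(path).collect()
```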

The failure depends on the memory mode of the vectorized reader: in OnHeap mode the read throws an NPE, and in OffHeap mode it crashes with a SEGFAULT.
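Until a fixed version is in use, one way to avoid the crash is to compare the schema stored in the file with the requested schema before reading. The following is a minimal sketch of that idea, not the fix from the linked pull requests. `SchemaGuard`, `scalarReadAsArray` and `guardedRead` are hypothetical names introduced here for illustration, and the check assumes top-level columns matched by exact name only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types._

// Hypothetical helper, not part of Spark.
object SchemaGuard {
  // Names of top-level fields requested as arrays that the file stores as non-arrays.
  def scalarReadAsArray(fileSchema: StructType, requested: StructType): Seq[String] = {
    val fileTypes = fileSchema.fields.map(f => f.name -> f.dataType).toMap
    requested.fields.toSeq.collect {
      case StructField(name, _: ArrayType, _, _)
          if fileTypes.get(name).exists(t => !t.isInstanceOf[ArrayType]) => name
    }
  }

  // Fails fast with an IllegalArgumentException instead of handing the
  // mismatched schema to the Parquet reader.
  def guardedRead(spark: SparkSession, path: String, ddl: String): DataFrame = {
    val requested = StructType.fromDDL(ddl)
    // Schema as Spark infers it from the Parquet files themselves.
    val fileSchema = spark.read.parquet(path).schema
    val bad = scalarReadAsArray(fileSchema, requested)
    require(bad.isEmpty, s"Columns stored as scalars but requested as arrays: ${bad.mkString(", ")}")
    spark.read.schema(requested).parquet(path)
  }
}
```

Under these assumptions, calling `SchemaGuard.guardedRead(spark, path, "field ARRAY<TIMESTAMP_NTZ>")` on the file from the repro would be rejected by the `require` check before any rows are read.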

Issue Links

links to

GitHub Pull Request #43451

GitHub Pull Request #43452

People

Assignee: Zamil Majdy

Reporter: Zamil Majdy

Votes: 0

Watchers: 2

Dates

Created: 19/Oct/23 07:47

Updated: 29/Jan/24 00:18

Resolved: 22/Oct/23 05:54