Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Won't Fix
-
1.2.1, 2.0.0
-
None
-
None
-
None
Description
Parque timestamps using INT96 are not compatible with other tools, like Impala, due to issues in Hive because it adjusts timezones values in a different way than Impala.
To address such issues. a new table property (parquet.mr.int96.write.zone) must be used in Hive that detects what timezone to use when writing and reading timestamps from Parquet.
The following is the exit criteria for the fix:
- Hive will read Parquet MR int96 timestamp data and adjust values using a time zone from a table property, if set, or using the local time zone if it is absent. No adjustment will be applied to data written by Impala.
- Hive will write Parquet int96 timestamps using a time zone adjustment from the same table property, if set, or using the local time zone if it is absent. This keeps the data in the table consistent.
- New tables created by Hive will set the table property to UTC if the global option to set the property for new tables is enabled.
- Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set the property unless the global setting to do so is enabled.
- Tables created using CREATE TABLE LIKE <OTHER TABLE> will copy the property of the table that is copied.
Attachments
Attachments
Issue Links
- blocks
-
SPARK-12297 Add work-around for Parquet/Hive int96 timestamp bug.
- Resolved
- breaks
-
HIVE-16465 NullPointer Exception when enable vectorization for Parquet file format
- Closed
- is related to
-
HIVE-16088 Fix hive conf property name introduced in HIVE-12767
- Reopened
- links to