Hive / HIVE-25104

Backward incompatible timestamp serialization in Parquet for certain timezones



    Description

      HIVE-12192 and HIVE-20007 changed the way timestamp computations are performed and, to some extent, how timestamps are serialized to and deserialized from files (Parquet, Avro).

      In versions that include HIVE-12192 or HIVE-20007, timestamp serialization in Parquet files is not backward compatible. In other words, writing timestamps with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them with a version that includes neither may give different results, depending on the default timezone of the system.

      Consider the following scenario where the default system timezone is set to US/Pacific.

      At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3

      CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
       LOCATION '/tmp/hiveexttbl/employee';
      INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
      INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
      INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
      SELECT * FROM employee;
      
      1 1880-01-01 00:00:00
      2 1884-01-01 00:00:00
      3 1990-01-01 00:00:00

      At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356

      CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
       LOCATION '/tmp/hiveexttbl/employee';
      SELECT * FROM employee;
      
      1 1879-12-31 23:52:58
      2 1884-01-01 00:00:00
      3 1990-01-01 00:00:00

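      The timestamp for eid=1 read in branch-2.3 (1879-12-31 23:52:58) is 7 minutes and 2 seconds earlier than the value written in master (1880-01-01 00:00:00). The rows for eid=2 and eid=3 are identical in both versions.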
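
      The 7 minute 2 second shift equals the gap between the local mean time offset (-07:52:58) that the tz database records for US/Pacific before standard time was adopted in November 1883 and the later -08:00 standard offset. The issue does not state the root cause, so reading this as the explanation is an assumption. The Java sketch below only uses java.time to print those offsets for the three dates in the scenario; the class and method names are hypothetical and not part of Hive.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper class for illustration only; it is not Hive code.
public class PacificOffsetDemo {

    // Returns the UTC offset that the tz database (as shipped with the JDK)
    // assigns to the given zone at the given local date-time.
    static ZoneOffset offsetAt(String zoneId, String localDateTime) {
        ZoneId zone = ZoneId.of(zoneId);
        LocalDateTime ldt = LocalDateTime.parse(localDateTime);
        return zone.getRules().getOffset(ldt);
    }

    public static void main(String[] args) {
        // Standard Pacific offset, used here as the comparison baseline.
        ZoneOffset standard = ZoneOffset.ofHours(-8);
        String[] samples = {
            "1880-01-01T00:00:00", // eid=1, before standard time zones
            "1884-01-01T00:00:00", // eid=2
            "1990-01-01T00:00:00"  // eid=3
        };
        for (String ts : samples) {
            ZoneOffset offset = offsetAt("US/Pacific", ts);
            // Difference between the historical offset and -08:00.
            Duration gap = Duration.ofSeconds(
                offset.getTotalSeconds() - standard.getTotalSeconds());
            System.out.println(ts + " offset=" + offset + " gap=" + gap);
        }
    }
}
```

      Only the 1880 value falls in the period where the offset differs from -08:00, which is consistent with eid=1 being the only row that changes between the two branches.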

            People

              zabetak Stamatis Zampetakis

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 2h