IMPALA-10491: Impala parquet scanner should use writer.time.zone when converting Hive timestamps

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 3.4.0
    • Fix Version/s: None
    • Component/s: Backend

    Description

      IMPALA-8721 reports some issues with Hive 3 and timezone conversion.

      HIVE-21290 fixed some of these issues and also sets writer.time.zone in the Parquet file metadata, which gives a more reliable way to determine which time zone the writer used. For example:

      tarmstrong@tarmstrong-Precision-7540:~/impala/impala$ hadoop jar ~/repos/parquet-mr/parquet-tools/target/parquet-tools-1.12.0-SNAPSHOT.jar meta /test-warehouse/asdfgh/000000_0
      21/02/08 20:26:44 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
      21/02/08 20:26:44 INFO hadoop.ParquetFileReader: reading another 1 footers
      21/02/08 20:26:44 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
      file:        hdfs://localhost:20500/test-warehouse/asdfgh/000000_0
      creator:     parquet-mr version 1.10.99.7.2.7.0-44 (build 27344fd5fdaa371e364c604f471b340f8bcf8936)
      extra:       writer.date.proleptic = false
      extra:       writer.time.zone = America/Los_Angeles
      extra:       writer.model.name = 3.1.3000.7.2.7.0-44
      

      We should use this time zone when converting timestamps. I think this should happen either always or only when convert_legacy_hive_parquet_utc_timestamps=true.
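
      To make the proposal concrete, here is a minimal sketch of the second option (convert only when the flag is set). It is written in Python for brevity and is not Impala's scanner code: the helper names pick_conversion_zone and to_writer_local, the dict standing in for the file's key/value metadata, and the fallback_zone parameter (whatever zone the reader would otherwise use) are all assumptions made for illustration. Only the writer.time.zone key and the flag name come from this ticket.

      ```python
      from datetime import datetime, timezone
      from typing import Mapping, Optional
      from zoneinfo import ZoneInfo

      # Key written into the Parquet footer by Hive after HIVE-21290.
      WRITER_TZ_KEY = "writer.time.zone"


      def pick_conversion_zone(
          file_metadata: Mapping[str, str],
          convert_legacy: bool,
          fallback_zone: str,
      ) -> Optional[str]:
          """Choose the zone for UTC-to-local conversion (hypothetical helper).

          convert_legacy mirrors convert_legacy_hive_parquet_utc_timestamps.
          Returns None when no conversion should be applied.
          """
          if not convert_legacy:
              return None
          # Prefer the zone recorded by the writer; otherwise use the fallback.
          return file_metadata.get(WRITER_TZ_KEY, fallback_zone)


      def to_writer_local(stored_utc: datetime, zone_name: Optional[str]) -> datetime:
          """Turn a naive UTC-normalized value into the zone's wall-clock time."""
          if zone_name is None:
              return stored_utc
          aware = stored_utc.replace(tzinfo=timezone.utc)
          # Drop tzinfo again so the result is a plain wall-clock timestamp.
          return aware.astimezone(ZoneInfo(zone_name)).replace(tzinfo=None)
      ```

      With the metadata shown above (writer.time.zone = America/Los_Angeles) and the flag enabled, a stored value of 2021-02-09 04:26:44 UTC would be read back as 2021-02-08 20:26:44. The "always" option would amount to ignoring convert_legacy whenever the key is present.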

      CC boroknagyz csringhofer

            People

              Assignee: Unassigned
              Reporter: Tim Armstrong (tarmstrong)
              Votes: 0
              Watchers: 3
