Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-20154

Improve unix_timestamp(args) to handle automatic DST-switching timezones

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.1.0
    • None
    • UDF
    • None

    Description

      Currently unix_timestamp(args) UDF will only handle static timezone specifiers. It does not recognize SystemV specifiers such as EST5EDT or PST8PDT.

      Based on this experiment, when z is used to parse a TZ string like UTC4PDT (obviously not a valid SystemV specifier) - it will parse the time as UTC.
      When z<n>z is used to parse a TZ string like UTC4PDT, it will take parse the timestamp as the TZ of the final z position. This is demonstrated by my final query when the format string z4z1z is used to parse UTC4PDT1EDT.

      0: jdbc:hive2://localhost:10000/default>; select from_unixtime(unix_timestamp("2018-02-01 00:00:00 UTC4PDT", "yyyy-MM-dd HH:mm:ss z"), "yyyy-MM-dd HH:mm:ss zzzz");
      +--------------------------------------------+--+
      |                    _c0                     |
      +--------------------------------------------+--+
      | 2018-01-31 16:00:00 Pacific Standard Time  |
      +--------------------------------------------+--+
      1 row selected (0.041 seconds)
      0: jdbc:hive2://localhost:10000/default>; select from_unixtime(unix_timestamp("2018-02-01 00:00:00 UTC", "yyyy-MM-dd HH:mm:ss z"), "yyyy-MM-dd HH:mm:ss zzzz");
      +--------------------------------------------+--+
      |                    _c0                     |
      +--------------------------------------------+--+
      | 2018-01-31 16:00:00 Pacific Standard Time  |
      +--------------------------------------------+--+
      1 row selected (0.041 seconds)
      0: jdbc:hive2://localhost:10000/default>; select from_unixtime(unix_timestamp("2018-02-01 00:00:00 UTC4PDT", "yyyy-MM-dd HH:mm:ss z4z"), "yyyy-MM-dd HH:mm:ss zzzz");
      +--------------------------------------------+--+
      |                    _c0                     |
      +--------------------------------------------+--+
      | 2018-01-31 23:00:00 Pacific Standard Time  |
      +--------------------------------------------+--+
      1 row selected (0.047 seconds)
      0: jdbc:hive2://localhost:10000/default>; select from_unixtime(unix_timestamp("2018-02-01 00:00:00 UTC4PDT1EDT", "yyyy-MM-dd HH:mm:ss z4z1z"), "yyyy-MM-dd HH:mm:ss zzzz");
      +--------------------------------------------+--+
      |                    _c0                     |
      +--------------------------------------------+--+
      | 2018-01-31 20:00:00 Pacific Standard Time  |
      +--------------------------------------------+--+
      1 row selected (0.055 seconds)
      0: jdbc:hive2://localhost:10000/default>;
      

      So all in all, I don't think the SystemV specifier EST5EDT or PST8PDT are valid to unix_timestamp(args) at all. And that those when parsed with the z<n>z format string, will be read as whatever valid timezone at the final position (effectively EDT and PDT respectively in when those valid SystemV TZ specifiers above are used).

      Attachments

        Activity

          People

            Unassigned Unassigned
            thundergun Vincent Tran
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated: