ORC / ORC-528

orc-tools timestamps off by one?


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.5.5, 1.6.0
    • Fix Version/s: 1.5.7, 1.6.1, 1.7.0
    • Component/s: tools
    • Labels: None

    Description

      I'm trying to understand how to deal with timestamps properly. I've created a CSV file containing some timestamps that I believe are crucial edge cases:

      2019-01-01 00:00:00.0000
      2015-01-01 00:00:00.0001
      2015-01-01 00:00:00.0000
      2014-12-31 23:59:59.9999
      1970-01-01 00:00:00.0001
      1970-01-01 00:00:00.0000
      1969-12-31 23:59:59.9999
      1969-12-31 23:59:59.0001
      1969-12-31 23:59:59.0000
      1969-12-31 23:59:58.9999

      I've created an ORC file using hive-1.1.0-cdh5.14.2. Hive reads this file back correctly; all timestamps appear to match. Reading the same file with orc-tools, however, gives different results:

       

      {"_col0":"2019-01-01 00:00:00.0"}
      {"_col0":"2015-01-01 00:00:00.0001"}
      {"_col0":"2015-01-01 00:00:00.0"}
      {"_col0":"2014-12-31 23:59:59.9999"}
      {"_col0":"1970-01-01 00:00:00.0001"}
      {"_col0":"1970-01-01 00:00:00.0"}
      {"_col0":"1969-12-31 23:59:58.9999"}
      {"_col0":"1969-12-31 23:59:59.0001"}
      {"_col0":"1969-12-31 23:59:59.0"}
      {"_col0":"1969-12-31 23:59:57.9999"}

       
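      The differences are in the last row and the fourth row from the bottom, which are each one second early: 1969-12-31 23:59:58.9999 is shown as 23:59:57.9999, and 1969-12-31 23:59:59.9999 is shown as 23:59:58.9999.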
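The two affected rows are exactly the pre-epoch values whose fractional second is at least one millisecond (23:59:59.0001 is pre-epoch too, but its fraction is below a millisecond and it comes out right). One arithmetic pitfall that produces this pattern, offered here as a plausible explanation rather than something this report confirms, is deriving whole seconds from epoch milliseconds with truncating division (what Java's `/` does on negative longs) instead of floor division (`Math.floorDiv`). The sketch below checks which of the test timestamps are affected; the helper names are hypothetical and the timestamps are assumed to be UTC.

```python
from datetime import datetime, timedelta, timezone

# Timestamps are treated as UTC so the arithmetic does not depend on the
# local time zone (an assumption; the report does not state the zone).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

TEST_TIMESTAMPS = [
    "2019-01-01 00:00:00.0000",
    "2015-01-01 00:00:00.0001",
    "2015-01-01 00:00:00.0000",
    "2014-12-31 23:59:59.9999",
    "1970-01-01 00:00:00.0001",
    "1970-01-01 00:00:00.0000",
    "1969-12-31 23:59:59.9999",
    "1969-12-31 23:59:59.0001",
    "1969-12-31 23:59:59.0000",
    "1969-12-31 23:59:58.9999",
]


def trunc_div(a: int, b: int) -> int:
    """Integer division rounding toward zero, like Java's `/` on longs (assumes b > 0)."""
    q = a // b  # Python's // rounds toward negative infinity (floor)
    if q < 0 and q * b != a:
        q += 1  # undo the extra step floor takes for a negative, inexact quotient
    return q


def epoch_millis(ts: str) -> int:
    """Floored milliseconds since the epoch (hypothetical helper)."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=timezone.utc)
    micros = (dt - EPOCH) // timedelta(microseconds=1)
    return micros // 1000


def rows_off_by_one(timestamps):
    """Inputs whose whole-second value differs between truncating and floor division."""
    result = []
    for ts in timestamps:
        millis = epoch_millis(ts)
        if trunc_div(millis, 1000) != millis // 1000:
            result.append(ts)
    return result


for ts in rows_off_by_one(TEST_TIMESTAMPS):
    print(ts)
# Prints:
# 1969-12-31 23:59:59.9999
# 1969-12-31 23:59:58.9999
```

The two rows it reports are the same two rows that orc-tools shows one second early, which is why truncation versus flooring is worth checking first; it does not by itself show where in Hive or ORC the division happens.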

      With some modifications, I got orc-tools to write a file with these timestamps itself using convert (see ORC-526). Reading that file back in hive-1.1.0-cdh5.14.2 gives:

      2019-01-01 00:00:00
      2015-01-01 00:00:00.0001
      2015-01-01 00:00:00
      2014-12-31 23:59:59.9999
      1970-01-01 00:00:00.0001
      1970-01-01 00:00:00
      1970-01-01 00:00:00.9999
      1969-12-31 23:59:59.0001
      1969-12-31 23:59:59
      1969-12-31 23:59:59.9999

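      This is also wrong: the fourth row from the bottom and the last row are off by one second, but this time in the other direction (one second late). When I read this file with orc-tools itself, the last row is correct (1969-12-31 23:59:58.9999), but the fourth row from the bottom is still wrong (1970-01-01 00:00:00.9999). I also noticed that orc-tools-1.2.0 cannot read the file written by 1.6.0. Version 1.3.4 can, but it also produces incorrect output; in addition to the fourth row from the bottom, it shows 1969-12-31 23:59:58.0001 instead of 23:59:59.0001 for the third row from the bottom.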
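The one-second-late values (1970-01-01 00:00:00.9999 for 1969-12-31 23:59:59.9999, and 1969-12-31 23:59:59.9999 for 23:59:58.9999) match what you get if the whole seconds are computed with truncating division while the fractional part is kept as a non-negative nanosecond count and simply added back. The Python sketch below illustrates that round trip; it is an assumption about the mechanism, not a reproduction of Hive's or ORC's actual code, and `split_timestamp` and `join_timestamp` are hypothetical helpers. Timestamps are assumed to be UTC.

```python
from datetime import datetime, timedelta, timezone

# UTC is assumed so the result does not depend on the local time zone.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def trunc_div(a: int, b: int) -> int:
    """Integer division rounding toward zero, like Java's `/` on longs (assumes b > 0)."""
    q = a // b
    if q < 0 and q * b != a:
        q += 1
    return q


def split_timestamp(ts: str, truncate: bool):
    """Split a timestamp into (whole seconds, nanos) since the epoch.

    The nanos are always the non-negative fractional part; only the way the
    whole seconds are derived from floored milliseconds depends on `truncate`.
    """
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=timezone.utc)
    micros = (dt - EPOCH) // timedelta(microseconds=1)
    millis = micros // 1000
    seconds = trunc_div(millis, 1000) if truncate else millis // 1000
    nanos = (micros % 1_000_000) * 1000  # always in [0, 999_999_999]
    return seconds, nanos


def join_timestamp(seconds: int, nanos: int) -> str:
    """Add the nanos back onto the whole seconds; trim trailing zeros like the orc-tools output."""
    dt = EPOCH + timedelta(seconds=seconds, microseconds=nanos // 1000)
    frac = f"{nanos:09d}".rstrip("0") or "0"
    return dt.strftime("%Y-%m-%d %H:%M:%S") + "." + frac


for ts in ["1969-12-31 23:59:59.9999", "1969-12-31 23:59:59.0001", "1969-12-31 23:59:58.9999"]:
    truncated = join_timestamp(*split_timestamp(ts, truncate=True))
    floored = join_timestamp(*split_timestamp(ts, truncate=False))
    print(f"{ts} -> truncating: {truncated}, floor: {floored}")
```

With truncation, the .9999 rows come back one second late while 23:59:59.0001 survives unchanged, which is the pattern Hive shows above; with floor division all three round-trip correctly.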

      orc-tools-1.6.0:

      {"mytime":"2019-01-01 00:00:00.0"}
      {"mytime":"2015-01-01 00:00:00.0001"}
      {"mytime":"2015-01-01 00:00:00.0"}
      {"mytime":"2014-12-31 23:59:59.9999"}
      {"mytime":"1970-01-01 00:00:00.0001"}
      {"mytime":"1970-01-01 00:00:00.0"}
      {"mytime":"1970-01-01 00:00:00.9999"}
      {"mytime":"1969-12-31 23:59:59.0001"}
      {"mytime":"1969-12-31 23:59:59.0"}
      {"mytime":"1969-12-31 23:59:58.9999"}

      orc-tools-1.3.4:

      {"mytime":"2019-01-01 00:00:00.0"}
      {"mytime":"2015-01-01 00:00:00.0001"}
      {"mytime":"2015-01-01 00:00:00.0"}
      {"mytime":"2014-12-31 23:59:59.9999"}
      {"mytime":"1970-01-01 00:00:00.0001"}
      {"mytime":"1970-01-01 00:00:00.0"}
      {"mytime":"1970-01-01 00:00:00.9999"}
      {"mytime":"1969-12-31 23:59:58.0001"}
      {"mytime":"1969-12-31 23:59:59.0"}
      {"mytime":"1969-12-31 23:59:58.9999"}

      I'm getting a bit lost as to what is right and what is wrong, but I have the feeling something doesn't add up here.


            People

              Assignee: Yukihiro Okada (yuokada)
              Reporter: Fabian Groffen (fgroffenorcl)
              Votes: 1
              Watchers: 3


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m