  1. Apache Arrow
  2. ARROW-7747

[Python] coerce_timestamps + allow_truncated_timestamps does not work as expected with nanoseconds


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.15.1
    • Fix Version/s: None
    • Component/s: Python
    • Labels: None

    Description

      Hi,

      I've encountered what seems to me to be a bug when using

      pyarrow==0.15.1
      pandas==0.25.3
      numpy==1.18.1

       
      I'm trying to write a table containing nanosecond timestamps to a millisecond schema. Here is a minimal example:

      import pyarrow as pa
      import pyarrow.parquet as pq
      import pandas as pd
      import numpy as np
      
      pyarrow_schema = pa.schema([pa.field("datetime_ms", pa.timestamp("ms"))])
      
      timestamp = np.datetime64("2019-06-21T22:13:02.901123")
      
      d = {"datetime_ms": timestamp}
      
      df = pd.DataFrame(d, index=range(1))
      
      table = pa.Table.from_pandas(df, schema=pyarrow_schema)
      
      pq.write_table(
          table,
          "test.parquet",
          coerce_timestamps="ms",
          allow_truncated_timestamps=True,
      )
      
      pyarrow.lib.ArrowInvalid: ('Casting from timestamp[ns] to timestamp[ms] would lose data: 1561155182901123000', 'Conversion failed for column datetime_ms with type datetime64[ns]')
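
The integer in the error message is the timestamp expressed as nanoseconds since the Unix epoch. A small sketch in plain Python (the constant and variable names are illustrative, not pyarrow identifiers) shows why a safe cast to milliseconds rejects it: the value is not a whole number of milliseconds.

```python
# Nanoseconds per millisecond.
NS_PER_MS = 1_000_000

# Value reported in the ArrowInvalid message above.
value_ns = 1561155182901123000

# A lossless cast to milliseconds requires this remainder to be zero.
remainder_ns = value_ns % NS_PER_MS

# What a truncating cast would keep: whole milliseconds only.
truncated_ms = value_ns // NS_PER_MS

print(remainder_ns)  # 123000
print(truncated_ms)  # 1561155182901
```

The non-zero remainder corresponds to the `.000123` seconds beyond the millisecond in `2019-06-21T22:13:02.901123`.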

      From my understanding, the expected behaviour should be that Arrow allows the conversion anyway, even if it loses some data.

      Related discussions:

      This test https://github.com/apache/arrow/blob/f70dbd1dbdb51a47e6a8a8aac8efd40ccf4d44f2/python/pyarrow/tests/test_parquet.py#L846 does not explicitly check for nanosecond timestamps.

      To be honest, I've not looked at the code yet, so let me know if I missed something. I'd be happy to fix it if it's really a bug.


          People

            Assignee: Unassigned
            Reporter: Théophile Chevalier (theophile)
            Votes: 0
            Watchers: 3
