Apache Arrow / ARROW-6520

[Python] Segmentation fault on writing tables with fixed size binary fields

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.14.1
    • Fix Version/s: 0.15.0
    • Component/s: Python
    • Environment: python(3.7.3), pyarrow(0.14.1), arrow-cpp(0.14.1), parquet-cpp(1.5.1), Arch Linux x86_64

    Description

      I'm not sure if this should be reported to Parquet or here.

      When I try to serialize a pyarrow table with a fixed-size binary field (holding 16-byte UUID4 values) to a Parquet file, a segmentation fault occurs.

      Here is a minimal example that reproduces it:

      import pyarrow as pa
      from pyarrow import parquet as pq
      data = {"col": pa.array([b"1234" for _ in range(10)])}
      fields = [("col", pa.binary(4))]
      schema = pa.schema(fields)
      table = pa.table(data, schema)
      pq.write_table(table, "test.parquet")
      # Shell output: segmentation fault (core dumped)  ipython

      However, it works if I don't specify the size of the binary field:

      import pyarrow as pa
      from pyarrow import parquet as pq
      data = {"col": pa.array([b"1234" for _ in range(10)])}
      fields = [("col", pa.binary())]
      schema = pa.schema(fields)
      table = pa.table(data, schema)
      pq.write_table(table, "test.parquet")

      Thanks,

            People

              Assignee: Joris Van den Bossche (jorisvandenbossche)
              Reporter: Furkan Tektas (ft)

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 2h 40m