Apache Arrow / ARROW-6642

[Python] chained access of ParquetDataset's metadata segfaults


Details

    Description

      Creating and reading a Parquet dataset:

      import pyarrow as pa
      import pyarrow.parquet as pq

      # Write a one-column table to a Parquet file, then open it as a dataset
      table = pa.table({'a': [1, 2, 3]})
      pq.write_table(table, '__test_statistics_segfault.parquet')
      dataset = pq.ParquetDataset('__test_statistics_segfault.parquet')
      dataset_piece = dataset.pieces[0]
      

      If you access the metadata and a column's statistics in steps, this works fine:

      meta = dataset_piece.get_metadata()
      row = meta.row_group(0)
      col = row.column(0)
      

      but chaining the same calls in a single expression segfaults:

      dataset_piece.get_metadata().row_group(0).column(0)
      

      dataset_piece.get_metadata().row_group(0) still works; the segfault occurs only once .column(0) is added to the chain.
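
Until this is fixed, one possible workaround is to always go through the stepwise path that the report shows working. The sketch below is a guess rather than a confirmed fix. It assumes the crash is related to the intermediate objects of the chained expression being released too early, which the report does not establish. `ColumnAccess` and `column_metadata_stepwise` are hypothetical names, not part of pyarrow. The only calls made on the piece are the ones used above (`get_metadata()`, `row_group()`, `column()`).

```python
from collections import namedtuple

# Hypothetical container (not part of pyarrow): it holds every intermediate
# object so that none of them is released while the column metadata is in use.
ColumnAccess = namedtuple("ColumnAccess", ["metadata", "row_group", "column"])


def column_metadata_stepwise(piece, row_group_index=0, column_index=0):
    """Reach a column chunk's metadata one step at a time.

    Assumption: keeping the file metadata and row group referenced avoids the
    crash seen with the chained expression. This is not confirmed by the report.
    """
    meta = piece.get_metadata()                   # file-level metadata
    row_group = meta.row_group(row_group_index)   # row-group metadata
    column = row_group.column(column_index)       # column-chunk metadata
    # Return all three so the caller keeps the parents alive as well.
    return ColumnAccess(meta, row_group, column)
```

With the objects from the example above, `access = column_metadata_stepwise(dataset_piece)` followed by `access.column` would take the place of the chained expression. Returning the parent objects together with the column is deliberate: if only the column were returned, the metadata and row group would be dropped when the function exits, much as in the chained call.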

            People

              jorisvandenbossche Joris Van den Bossche

