Apache Arrow / ARROW-9456

[Python] Dataset segfault when not importing pyarrow.parquet

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: Python
    • Labels: None

    Description

      To reproduce:

        import pyarrow.parquet  # if we skip this...
        import pyarrow as pa
        import pyarrow.dataset as ds
        import glob
        ds = pa.dataset.dataset('/data/taxi_parquet/data_0.parquet')
        ds.to_table()  # this will crash

        $ python pyarrow/crash.py dev
        terminate called after throwing an instance of 'parquet::ParquetException'
          what(): The file only has 19 columns, requested metadata for column: 1049198736
        [1] 1559395 abort (core dumped) python pyarrow/crash.py

      When the pyarrow.parquet import is present, the script works fine.
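
      For anyone trying this without the local taxi file, here is a minimal self-contained sketch. It assumes a small temporary Parquet file can stand in for /data/taxi_parquet/data_0.parquet; the helper name `read_with_parquet_import` and the sample column are hypothetical and not part of the original report. It only demonstrates the import ordering reported to work (pyarrow.parquet imported before the dataset is read) and does not explain the cause of the crash.

```python
import importlib.util
import os
import tempfile


def read_with_parquet_import():
    """Write a tiny Parquet file, then read it back through pyarrow.dataset.

    Returns the number of rows read, or None if pyarrow is not installed.
    """
    if importlib.util.find_spec("pyarrow") is None:
        return None  # pyarrow is not available in this environment

    # Import pyarrow.parquet first: the ordering the report says works.
    import pyarrow as pa
    import pyarrow.parquet as pq
    import pyarrow.dataset as ds

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data_0.parquet")
        # Hypothetical stand-in data; the reporter's file had 19 columns.
        pq.write_table(pa.table({"passenger_count": [1, 2, 3]}), path)
        dataset = ds.dataset(path, format="parquet")
        # Materialize the table while the temporary file still exists.
        return dataset.to_table().num_rows


if __name__ == "__main__":
    print(read_with_parquet_import())
```

      Commenting out the `pyarrow.parquet` import here would not be an equivalent test of the reported crash, because the sketch needs that module to write the file; to test the failing ordering, write the file in a separate process first.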
         

          People

            Assignee: Unassigned
            Reporter: Maarten Breddels (maartenbreddels)