Apache Arrow / ARROW-18370

[Python] `ds.write_dataset` doesn't allow feather compression

Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.0.0
    • Fix Version/s: None
    • Component/s: Python
    • Labels: None
    • Environment: Ubuntu 22.04

    Description

      `ds.write_dataset` allows specifying Parquet compression, for example:

      import pandas as pd
      import pyarrow as pa
      import pyarrow.dataset as ds
      
      df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
      
      df = pa.Table.from_pandas(df)
      
      ds.write_dataset(
          df,
          base_dir='test',
          format='parquet',
          file_options=ds.ParquetFileFormat().make_write_options(compression='snappy'))
      

      However, the same trick (passing the `file_options` argument) doesn't work for Feather; the following code gives me an error:

      import pandas as pd
      import pyarrow as pa
      import pyarrow.dataset as ds
      
      df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
      
      df = pa.Table.from_pandas(df)
      
      ds.write_dataset(
          df,
          base_dir='test',
          format='feather',
          file_options=ds.FeatherFileFormat().make_write_options(compression='uncompressed'))
      

      The error: `TypeError: FeatherFileFormat.make_write_options() takes no keyword arguments`

          People

            Assignee: Unassigned
            Reporter: Amiao Yu Zhu
            Votes: 0
            Watchers: 2
