Apache Arrow / ARROW-7885

[Python] pyarrow.serialize does not support dask dataframe

Details

    • Type: Wish
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Python

    Description

      Currently, pyarrow.serialize can serialize pandas dataframes but not dask dataframes. Passing a dask dataframe raises:

      SerializationCallbackError: pyarrow does not know how to serialize objects of type <class 'dask.dataframe.core.DataFrame'>.

      Pickling the dask dataframe instead forgoes the benefits of using pyarrow for the underlying pandas partitions.

      Supporting dask dataframes in pyarrow.serialize would allow dataframes to be stored efficiently in a database rather than on a file system (e.g. as Parquet files).

          People

            Assignee: Unassigned
            Reporter: Jonen Benjamin
            Votes: 0
            Watchers: 5