Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
None
None
Environment: pyarrow==0.15.1
Description
The DataFrame index is not saved when using pyarrow.parquet.write_to_dataset() with the partition_cols argument. Here is a minimal example that reproduces the issue:
    from pathlib import Path

    import pandas as pd
    from pyarrow import Table
    from pyarrow.parquet import read_table, write_to_dataset

    path = Path('/home/user/trials')
    file_name = 'local_database.parquet'

    # DataFrame with a named index 'idx'
    df = pd.DataFrame({"A": [1, 2, 3], "B": ['a', 'a', 'b']},
                      index=pd.Index(['a', 'b', 'c'], name='idx'))
    table = Table.from_pandas(df)

    # Write the dataset partitioned on column 'B'
    write_to_dataset(table, str(path / file_name), partition_cols=['B'])

    # Read it back: the 'idx' index is not restored
    df_read = read_table(str(path / file_name))
    df_read.to_pandas()
This issue is significant for pandas and Dask users, because the DataFrame index is lost when a partitioned dataset is written and read back.