Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 10.0.0
Description
The following example shows that pyarrow does not correctly handle a partition value that contains '/':
import pandas as pd
import pyarrow as pa
from pyarrow import dataset as ds

df = pd.DataFrame({
    'value': [1, 2],
    'instrument_id': ['A/Z', 'B'],
})

ds.write_dataset(
    data=pa.Table.from_pandas(df),
    base_dir='data',
    format='parquet',
    partitioning=['instrument_id'],
    partitioning_flavor='hive',
)

table = ds.dataset(
    source='data',
    format='parquet',
    partitioning='hive',
).to_table()

tables = [table]
df = pa.concat_tables(tables).to_pandas()
print(df.head())
Result:
   value instrument_id
0      1             A
1      2             B
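One plausible explanation for this result, offered as an assumption rather than a confirmed description of pyarrow's internals: a hive partition directory is named key=value, so an unescaped '/' in the value introduces an extra directory level, and a reader that splits the path on '/' sees only the text before the slash. The sketch below reproduces that effect with the standard library alone. The helper names hive_segment and parse_hive_path and the file name part-0.parquet are hypothetical and are not part of pyarrow.

```python
from pathlib import PurePosixPath


def hive_segment(key, value):
    # Naive hive-style directory name: key=value, with no escaping.
    return f"{key}={value}"


def parse_hive_path(path):
    # Split the path on '/' and keep only the key=value segments.
    parts = PurePosixPath(path).parts
    return dict(p.split("=", 1) for p in parts if "=" in p)


# 'A/Z' turns into two path levels: 'instrument_id=A' and 'Z'.
path = f"data/{hive_segment('instrument_id', 'A/Z')}/part-0.parquet"
print(path)
print(parse_hive_path(path))
```

The parsed value is 'A', matching the truncated instrument_id shown in the result.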
Expected behaviour:
Option 1: Result should be:
   value instrument_id
0      1           A/Z
1      2             B
Option 2: An error should be raised so that '/' is rejected in a partition value.
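Both options can be sketched outside pyarrow. The snippet below is a minimal illustration using only the Python standard library, not the fix that was actually applied: Option 1 percent-encodes the value so that '/' never appears in the directory name and decodes it again on read, and Option 2 rejects the value up front. The function names are hypothetical.

```python
from urllib.parse import quote, unquote


def encode_partition_value(value):
    # Option 1 (write side): percent-encode every reserved character,
    # including '/', so the value stays within one path segment.
    return quote(value, safe="")


def decode_partition_value(segment):
    # Option 1 (read side): reverse the percent-encoding.
    return unquote(segment)


def check_partition_value(value):
    # Option 2: refuse values that would split the partition directory.
    if "/" in value:
        raise ValueError(f"partition value {value!r} must not contain '/'")
    return value


encoded = encode_partition_value("A/Z")
print(encoded)                          # A%2FZ
print(decode_partition_value(encoded))  # A/Z
```

With Option 1 the round trip preserves 'A/Z'. With Option 2, calling check_partition_value("A/Z") raises ValueError before anything is written.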