Apache Arrow / ARROW-12124

[Rust] Parquet writer creates invalid parquet files


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Bug
    • Component: Rust

    Description

      I wrote a simple CSV to Parquet converter at https://github.com/domoritz/csv2parquet/blob/f53feb5bd995eab41dee09f2c4d722512052d7ca/src/main.rs.

      Running it (`csv2parquet test.txt test.parquet`) with a simple file such as

      ```
      a,b,c
      0,1,hello world
      0,1,hello world
      0,1,hello world
      0,1,hello world
      0,1,hello world
      0,1,hello world
      0,1,hello world
      ```

      and then trying to read it back in Python with

      ```
      import pandas as pd
      df = pd.read_parquet('test.parquet')
      df.to_csv('test2.csv')
      ```

      results in this error:

      ```
      OSError: Could not open parquet input source '<Buffer>': Invalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
      ```
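
      The failure should be reproducible without pyarrow: if the footer really is missing, the parquet crate's own SerializedFileReader ought to reject the file as well, since it parses the footer when it is constructed. A minimal sketch, assuming the same `test.parquet` path as in the command above:

      ```
      use std::fs::File;

      use parquet::file::reader::{FileReader, SerializedFileReader};

      fn main() -> Result<(), Box<dyn std::error::Error>> {
          // Constructing the reader parses the footer, so an invalid or
          // truncated file fails right here with a corrupt-footer error.
          let reader = SerializedFileReader::new(File::open("test.parquet")?)?;
          println!("rows: {}", reader.metadata().file_metadata().num_rows());
          Ok(())
      }
      ```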

      The schema seems to be inferred correctly:

      ```
      Inferred Schema:
      {
        "fields": [
          {
            "name": "a",
            "nullable": false,
            "type": { "name": "int", "bitWidth": 64, "isSigned": true },
            "children": []
          },
          {
            "name": "b",
            "nullable": false,
            "type": { "name": "int", "bitWidth": 64, "isSigned": true },
            "children": []
          },
          {
            "name": "c",
            "nullable": false,
            "type": { "name": "utf8" },
            "children": []
          }
        ],
        "metadata": {}
      }
      ```
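
      The error reported by pyarrow says the footer's PAR1 magic bytes are missing, i.e. the file was never finalized, rather than that the data itself is malformed. With the parquet crate's ArrowWriter the footer is only written when the writer is explicitly closed, so dropping the writer without calling close() produces exactly this kind of truncated file, which would be consistent with the "Not A Bug" resolution. A minimal sketch of a complete write path, assuming current arrow and parquet crates and hard-coding the sample rows instead of reading them from the CSV:

      ```
      use std::fs::File;
      use std::sync::Arc;

      use arrow::array::{ArrayRef, Int64Array, StringArray};
      use arrow::datatypes::{DataType, Field, Schema};
      use arrow::record_batch::RecordBatch;
      use parquet::arrow::ArrowWriter;

      fn main() -> Result<(), Box<dyn std::error::Error>> {
          // Schema matching the inferred one above: a, b as Int64, c as Utf8.
          let schema = Arc::new(Schema::new(vec![
              Field::new("a", DataType::Int64, false),
              Field::new("b", DataType::Int64, false),
              Field::new("c", DataType::Utf8, false),
          ]));

          // Seven rows of "0,1,hello world", as in the sample file.
          let a: ArrayRef = Arc::new(Int64Array::from(vec![0i64; 7]));
          let b: ArrayRef = Arc::new(Int64Array::from(vec![1i64; 7]));
          let c: ArrayRef = Arc::new(StringArray::from(vec!["hello world"; 7]));
          let batch = RecordBatch::try_new(schema.clone(), vec![a, b, c])?;

          let file = File::create("test.parquet")?;
          let mut writer = ArrowWriter::try_new(file, schema, None)?;
          writer.write(&batch)?;
          // The footer (including the magic bytes the reader looks for) is
          // only written here; skipping close() leaves a truncated file.
          writer.close()?;
          Ok(())
      }
      ```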


    People

    • Assignee: Unassigned
    • Reporter: Dominik Moritz (domoritz)