[ARROW-18388] [C++] Decide on duplicate column handling in scanner, add more tests - ASF JIRA

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: C++
Labels: None

External issue URL:
https://github.com/apache/arrow/issues/33552

Description

When a schema has duplicate column names, it is difficult to know how the default evolution strategy should map columns between the fragment schema and the dataset schema. The comments describing evolution do not make the current behavior clear. Suggestions so far:

- Grab the first column in the fragment schema with the same name.
- Always error if there are duplicate columns.
- Allow duplicate columns, but expect the same number of occurrences in both the fragment schema and the dataset schema, and assume the order is consistent.
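
To make the three suggestions easier to compare, here is a minimal sketch that works on plain lists of column names. It is not Arrow's API and does not reflect what the scanner currently does: `FieldNames`, `ColumnMapping`, `DuplicatePolicy`, and `MapColumns` are hypothetical names introduced only for illustration. It also assumes that a dataset column missing entirely from the fragment is reported as absent rather than treated as an error, which the issue does not specify.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical types for illustration only; none of this is Arrow's API.
using FieldNames = std::vector<std::string>;
// mapping[i] is the fragment column index for dataset column i,
// or std::nullopt when the fragment has no column with that name.
using ColumnMapping = std::vector<std::optional<std::size_t>>;

enum class DuplicatePolicy {
  kFirstMatch,         // suggestion 1: take the first fragment column with the name
  kError,              // suggestion 2: reject any duplicate name
  kMatchByOccurrence,  // suggestion 3: pair the k-th occurrence with the k-th occurrence
};

namespace {
// Collect, for each name, the positions at which it occurs (in schema order).
std::unordered_map<std::string, std::vector<std::size_t>> IndexByName(
    const FieldNames& names) {
  std::unordered_map<std::string, std::vector<std::size_t>> out;
  for (std::size_t i = 0; i < names.size(); ++i) out[names[i]].push_back(i);
  return out;
}
}  // namespace

ColumnMapping MapColumns(const FieldNames& dataset, const FieldNames& fragment,
                         DuplicatePolicy policy) {
  const auto dataset_index = IndexByName(dataset);
  const auto fragment_index = IndexByName(fragment);

  if (policy == DuplicatePolicy::kError) {
    // Any repeated name in either schema is an error.
    for (const auto* index : {&dataset_index, &fragment_index}) {
      for (const auto& [name, positions] : *index) {
        if (positions.size() > 1) {
          throw std::invalid_argument("duplicate column name: " + name);
        }
      }
    }
  }

  if (policy == DuplicatePolicy::kMatchByOccurrence) {
    // A name present in both schemas must occur the same number of times.
    // Assumption: a name absent from the fragment is "missing", not a mismatch.
    for (const auto& [name, positions] : dataset_index) {
      const auto it = fragment_index.find(name);
      if (it != fragment_index.end() && it->second.size() != positions.size()) {
        throw std::invalid_argument("occurrence count mismatch for: " + name);
      }
    }
  }

  ColumnMapping mapping;
  mapping.reserve(dataset.size());
  // How many times each name has been seen so far while walking the dataset schema.
  std::unordered_map<std::string, std::size_t> seen;
  for (const auto& name : dataset) {
    const std::size_t occurrence = seen[name]++;
    const auto it = fragment_index.find(name);
    if (it == fragment_index.end()) {
      mapping.push_back(std::nullopt);  // column missing from the fragment
      continue;
    }
    // Only kMatchByOccurrence looks past the first fragment column with this name.
    const std::size_t pick =
        policy == DuplicatePolicy::kMatchByOccurrence ? occurrence : 0;
    mapping.push_back(it->second[pick]);
  }
  return mapping;
}
```

For a dataset schema `{a, b, a}` and a fragment schema `{a, a, b}`, `kFirstMatch` maps both `a` columns to fragment index 0, `kMatchByOccurrence` maps them to indices 0 and 1, and `kError` throws. The first option silently reads the same fragment column twice, which is part of why the choice needs to be made explicitly and covered by tests.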

People

Assignee: Unassigned

Reporter: Weston Pace

Votes: 0

Watchers: 4

Dates

Created: 22/Nov/22 23:34

Updated: 11/Jan/23 11:59