Apache Arrow / ARROW-16072 [C++] Migrate scanner logic to ExecPlan, remove merged generator / ARROW-18388

[C++] Decide on duplicate column handling in scanner, add more tests

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: C++

    Description

      When a schema has duplicate column names, it is difficult to know how to map between the fragment schema and the dataset schema in the default evolution strategy. The comments describing evolution do not make the current behavior clear. Suggestions so far:

      • Grab the first column in the fragment schema with the same name
      • Always error if there are duplicate columns
      • Allow duplicate columns, but expect the same number of occurrences in both the fragment and dataset schema, and assume the order is consistent
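
      To make the trade-offs concrete, here is a minimal sketch of the three suggestions. It is not Arrow's evolution code: a schema is reduced to an ordered list of column names, and `ColumnNames`, `ColumnMapping`, `MapFirstMatch`, `MapStrict`, and `MapByOccurrence` are hypothetical names invented for illustration. It also assumes that a dataset column missing from the fragment maps to "no column" rather than raising an error, and that fragment columns absent from the dataset schema are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a schema: only the ordered column names.
using ColumnNames = std::vector<std::string>;
// For each dataset column: index of the fragment column to read,
// or std::nullopt if the fragment has no column with that name (assumption).
using ColumnMapping = std::vector<std::optional<std::size_t>>;

// Suggestion 1: grab the first fragment column with the same name.
ColumnMapping MapFirstMatch(const ColumnNames& dataset, const ColumnNames& fragment) {
  std::unordered_map<std::string, std::size_t> first_index;
  for (std::size_t i = 0; i < fragment.size(); ++i) {
    first_index.emplace(fragment[i], i);  // emplace does not overwrite, so the earliest index wins
  }
  ColumnMapping mapping;
  for (const auto& name : dataset) {
    auto it = first_index.find(name);
    if (it == first_index.end()) {
      mapping.push_back(std::nullopt);
    } else {
      mapping.push_back(it->second);
    }
  }
  return mapping;
}

// Helper for suggestion 2: throw if any name appears more than once.
void CheckNoDuplicates(const ColumnNames& names) {
  std::unordered_set<std::string> seen;
  for (const auto& name : names) {
    if (!seen.insert(name).second) {
      throw std::invalid_argument("duplicate column name: " + name);
    }
  }
}

// Suggestion 2: always error if either schema has duplicate columns.
ColumnMapping MapStrict(const ColumnNames& dataset, const ColumnNames& fragment) {
  CheckNoDuplicates(dataset);
  CheckNoDuplicates(fragment);
  return MapFirstMatch(dataset, fragment);
}

// Suggestion 3: the k-th occurrence of a name in the dataset schema maps to
// the k-th occurrence of that name in the fragment schema.
ColumnMapping MapByOccurrence(const ColumnNames& dataset, const ColumnNames& fragment) {
  std::unordered_map<std::string, std::vector<std::size_t>> fragment_positions;
  for (std::size_t i = 0; i < fragment.size(); ++i) {
    fragment_positions[fragment[i]].push_back(i);
  }
  std::unordered_map<std::string, std::size_t> dataset_counts;
  for (const auto& name : dataset) {
    ++dataset_counts[name];
  }
  // Names present in both schemas must occur the same number of times.
  for (const auto& entry : dataset_counts) {
    auto it = fragment_positions.find(entry.first);
    if (it != fragment_positions.end() && it->second.size() != entry.second) {
      throw std::invalid_argument("mismatched occurrence count for column: " + entry.first);
    }
  }
  std::unordered_map<std::string, std::size_t> next_occurrence;
  ColumnMapping mapping;
  for (const auto& name : dataset) {
    auto it = fragment_positions.find(name);
    if (it == fragment_positions.end()) {
      mapping.push_back(std::nullopt);  // column missing from the fragment
      continue;
    }
    mapping.push_back(it->second[next_occurrence[name]++]);
  }
  assert(mapping.size() == dataset.size());
  return mapping;
}
```

      With a dataset schema of {a, b, a} and a fragment schema of {a, a, b}, the first strategy in this sketch maps to fragment indices {0, 2, 0}, silently reading the first `a` twice; the second throws; the third maps to {0, 2, 1}.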

          People

            Assignee: Unassigned
            Reporter: Weston Pace (westonpace)