  Apache Arrow / ARROW-18091

[Ruby] Arrow::Table#join returns duplicated key columns


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Ruby
    • Labels: None

    Description

      `Arrow::Table#join` returns both key columns, one from each input table, so the result contains duplicate key columns. Arrow allows duplicate column names, but it would be preferable to keep a single key column.

      Also, with `type: :full_outer`, the data of the two key columns should be merged into a single column.

      table1
      => 
      #<Arrow::Table:0x7f9706109380 ptr=0x55a91a4cac10>
              KEY     X         
      0       A       1         
      1       B       2         
      2       C       3

      table2
      => 
      #<Arrow::Table:0x7f970415d2c0 ptr=0x55a91a348ce0>
              KEY     X
      0       A       4
      1       B       5
      2       D       6

       
      The right-hand `:KEY` column should be omitted:

      table1.join(table2, :KEY)
      => 
      #<Arrow::Table:0x7f96fd152548 ptr=0x55a91af21110>                   
              KEY     X       KEY     X                                   
      0       A       1       A       4                                   
      1       B       2       B       5
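To make the expected inner-join result concrete, here is a minimal plain-Ruby sketch that does not use red-arrow. It assumes a table is modeled as an Array of `[column_name, values]` pairs, so duplicate names such as two `:X` columns can coexist as they do in Arrow; `inner_join_single_key` is a hypothetical helper name. It keeps the left key column and drops the right one, and leaves the non-key `:X` columns alone, since the issue only concerns the key.

```ruby
# Plain-Ruby model of the proposed inner-join behaviour (NOT the red-arrow API).
# Assumption: a table is an Array of [column_name, values] pairs, so duplicate
# column names (e.g. two :X columns) are allowed, as in Arrow.

# Hypothetical helper: inner join on a single key column, emitting the key once.
def inner_join_single_key(left, right, key)
  left_keys  = left.assoc(key)[1]
  right_keys = right.assoc(key)[1]

  # Collect matching [left_row, right_row] index pairs.
  pairs = []
  left_keys.each_with_index do |k, i|
    right_keys.each_with_index do |rk, j|
      pairs << [i, j] if k == rk
    end
  end

  # Keep every left column; drop the right-hand key column.
  left_part = left.map { |name, values| [name, pairs.map { |i, _| values[i] }] }
  right_part = right.reject { |name, _| name == key }
                    .map { |name, values| [name, pairs.map { |_, j| values[j] }] }
  left_part + right_part
end

table1 = [[:KEY, %w[A B C]], [:X, [1, 2, 3]]]
table2 = [[:KEY, %w[A B D]], [:X, [4, 5, 6]]]

p inner_join_single_key(table1, table2, :KEY)
# => [[:KEY, ["A", "B"]], [:X, [1, 2]], [:X, [4, 5]]]
```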

       
      The two `:KEY` columns should be merged into one:

      table1.join(table2, :KEY, type: :full_outer)
      => 
      #<Arrow::Table:0x7f96fd0e1550 ptr=0x55a91a1a6410>                   
              KEY          X  KEY          X                              
      0       A            1  A            4                              
      1       B            2  B            5                              
      2       C            3  (null)  (null)                              
      3       (null)  (null)  D            6
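The full outer case can be modeled the same way. In this sketch (again plain Ruby, not red-arrow; `full_outer_join_merged_key` is a hypothetical name and the Array-of-pairs table model is an assumption), the merged key takes the left key when the row has a left side and the right key otherwise, which amounts to coalescing the two key columns. Unmatched cells in the non-key columns become `nil`, standing in for Arrow nulls.

```ruby
# Plain-Ruby model of the proposed full-outer-join behaviour (NOT the red-arrow
# API). Assumption: a table is an Array of [column_name, values] pairs and nil
# stands in for an Arrow null.

# Hypothetical helper: full outer join on one key, merging the two key columns.
def full_outer_join_merged_key(left, right, key)
  left_keys  = left.assoc(key)[1]
  right_keys = right.assoc(key)[1]

  # Build [left_row, right_row] index pairs; nil marks a missing side.
  pairs = []
  matched_right = []
  left_keys.each_with_index do |k, i|
    js = right_keys.each_index.select { |j| right_keys[j] == k }
    if js.empty?
      pairs << [i, nil] # left-only row
    else
      js.each do |j|
        pairs << [i, j]
        matched_right << j
      end
    end
  end
  right_keys.each_index do |j|
    pairs << [nil, j] unless matched_right.include?(j) # right-only row
  end

  # Merged key: the left key if the row has a left side, else the right key.
  merged_key = pairs.map { |i, j| i.nil? ? right_keys[j] : left_keys[i] }
  take = ->(values, idx) { idx.nil? ? nil : values[idx] }

  # The key stays in the left key's position; the right key column is dropped.
  left_part = left.map do |name, values|
    name == key ? [name, merged_key] : [name, pairs.map { |i, _| take.(values, i) }]
  end
  right_part = right.reject { |name, _| name == key }
                    .map { |name, values| [name, pairs.map { |_, j| take.(values, j) }] }
  left_part + right_part
end

table1 = [[:KEY, %w[A B C]], [:X, [1, 2, 3]]]
table2 = [[:KEY, %w[A B D]], [:X, [4, 5, 6]]]

p full_outer_join_merged_key(table1, table2, :KEY)
# => [[:KEY, ["A", "B", "C", "D"]], [:X, [1, 2, 3, nil]], [:X, [4, 5, nil, 6]]]
```

Keeping the merged key in the left key's position mirrors the column order of the current output; placing it elsewhere would be an equally reasonable choice.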

       

          People

            Assignee: Unassigned
            Reporter: Hirokazu SUZUKI (heronshoes)

            Dates

              Created:
              Updated:
