Apache Arrow / ARROW-18405

[Ruby] Raw table converter rebuilds chunked arrays


Details

    Description

      Consider the following Ruby script:

      require 'arrow'
      data = Arrow::ChunkedArray.new([Arrow::Int64Array.new([1])])
      table = Arrow::Table.new('column' => data)
      puts table['column'].data_type
      

      This prints "int64" with red-arrow 9.0.0 but "uint8" with red-arrow 10.0.0.

      As far as I can tell, this is caused by the following commit: https://github.com/apache/arrow/commit/913d9c0a9a1a4398ed5f56d713d586770b4f702c#diff-f7f19bbc3945ea30ba06d851705f2d58f7666507bb101c4e151014ca398bd635R42

      The old version did not call ArrayBuilder.build on a ChunkedArray, but the new version does, so the chunked array is rebuilt and its type changes from int64 to uint8. This is a problem for us because we need the column to stay int64.

      A workaround is to pass a schema and a list of arrays instead, which bypasses the raw table converter:

      table = Arrow::Table.new([{name: 'column', type: 'int64'}], [data])
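
The difference between rebuilding a column and passing it through can be sketched in plain Ruby. This is only an illustrative model, not red-arrow's code: `TypedColumn`, `infer_type`, `convert_rebuilding`, and `convert_preserving` are hypothetical names, and the narrowest-type inference is an assumption chosen to be consistent with the "uint8" result reported above.

```ruby
# Hypothetical stand-in for an already-typed column (not a red-arrow class).
TypedColumn = Struct.new(:data_type, :values)

# Assumed inference rule: pick the narrowest integer type that fits all
# values. Expects a non-empty list of integers.
def infer_type(values)
  min, max = values.minmax
  if min >= 0
    return "uint8" if max <= 0xFF
    return "uint16" if max <= 0xFFFF
    return "uint32" if max <= 0xFFFF_FFFF
    "uint64"
  else
    return "int8" if min >= -0x80 && max <= 0x7F
    return "int16" if min >= -0x8000 && max <= 0x7FFF
    return "int32" if min >= -0x8000_0000 && max <= 0x7FFF_FFFF
    "int64"
  end
end

# Models the reported behavior: every column is rebuilt from its values,
# so any type the caller declared is discarded.
def convert_rebuilding(columns)
  columns.transform_values do |column|
    values = column.is_a?(TypedColumn) ? column.values : column
    TypedColumn.new(infer_type(values), values)
  end
end

# Models the expected behavior: typed columns pass through unchanged and
# only raw Ruby values go through type inference.
def convert_preserving(columns)
  columns.transform_values do |column|
    column.is_a?(TypedColumn) ? column : TypedColumn.new(infer_type(column), column)
  end
end

data = TypedColumn.new("int64", [1])
rebuilt = convert_rebuilding({ "column" => data })
preserved = convert_preserving({ "column" => data })

puts rebuilt["column"].data_type    # prints "uint8"
puts preserved["column"].data_type  # prints "int64"
```

In this model the declared int64 type survives only when the converter checks for an already-typed column before inferring anything, which is the behavior the report expects for a ChunkedArray.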
      

            People

              Assignee: kou Kouhei Sutou
              Reporter: stenlarsson Sten Larsson
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 50m