Apache Arrow / ARROW-6570

[Python] Use MemoryPool to allocate memory for NumPy arrays in to_pandas calls

Details

    Description

      It occurred to me that we can likely improve the performance and scalability of Table.to_pandas and the other to_pandas methods by using the active MemoryPool to allocate memory for the resulting NumPy arrays, rather than letting NumPy use the system allocator. We would need to use the PyCapsule approach to set a shared_ptr<Buffer> as the base of the created NumPy arrays, so that the buffer stays alive for as long as an array references it.

      This has the additional benefit of tracking NumPy-related allocations in the MemoryPool, which would give us a more precise accounting of allocated memory.
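
      The ownership idea can be sketched with a toy model in pure Python. This is not Arrow's or NumPy's implementation: TrackingPool, PoolBuffer, ArrayView, and to_array are hypothetical names invented for illustration, with ArrayView.base standing in for the NumPy array's base object (the PyCapsule wrapping a shared_ptr<Buffer>).

```python
import weakref


class TrackingPool:
    """Hypothetical stand-in for a memory pool that counts live bytes."""

    def __init__(self):
        self.bytes_allocated = 0

    def allocate(self, nbytes):
        # The pool, not the array library, creates the storage.
        return PoolBuffer(self, nbytes)

    def _free(self, nbytes):
        self.bytes_allocated -= nbytes


class PoolBuffer:
    """Hypothetical buffer that reports its lifetime to the pool."""

    def __init__(self, pool, nbytes):
        self.data = bytearray(nbytes)
        pool.bytes_allocated += nbytes
        # Runs when this buffer is garbage collected, mirroring the
        # destructor that a PyCapsule would invoke on the shared_ptr.
        weakref.finalize(self, pool._free, nbytes)


class ArrayView:
    """Hypothetical array type; `base` plays the role of ndarray.base."""

    def __init__(self, base, typecode):
        self.base = base  # keeps the pool-owned buffer alive
        self.values = memoryview(base.data).cast(typecode)


def to_array(pool, length):
    # Allocate 8 bytes per float64 element from the pool, then wrap
    # the buffer without copying.
    return ArrayView(pool.allocate(length * 8), "d")
```

      In this sketch the pool's counter rises when to_array is called and falls again only once the last reference to the returned array is dropped, which is the accounting behavior the description asks for.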

            People

              Assignee: wesm Wes McKinney
              Reporter: wesm Wes McKinney
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h