[ARROW-6570] [Python] Use MemoryPool to allocate memory for NumPy arrays in to_pandas calls - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.15.0
Component/s: Python
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/22928

Description

It occurred to me that we can likely improve the performance and scalability of Table.to_pandas and the other to_pandas methods by using the active MemoryPool to allocate memory for the resulting NumPy arrays, rather than letting NumPy use the system allocator. We would need to use the PyCapsule approach to set a shared_ptr<Buffer> as the base of the created NumPy arrays.

This has the additional benefit of tracking NumPy-related allocations in the MemoryPool so we will have a more precise accounting of allocated memory.
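
The ownership and accounting idea described above can be illustrated with a toy pure-Python model. This is only a sketch and not Arrow's implementation: `TrackingPool`, `PoolBuffer`, and `ArrayView` are hypothetical names standing in for the MemoryPool, the shared_ptr<Buffer>, and a NumPy array whose base keeps the buffer alive.

```python
import gc
import weakref


class PoolBuffer:
    """Hypothetical stand-in for a pool-allocated Buffer."""

    def __init__(self, nbytes):
        self.data = bytearray(nbytes)


class TrackingPool:
    """Hypothetical stand-in for a MemoryPool that counts live bytes."""

    def __init__(self):
        self.bytes_allocated = 0

    def allocate(self, nbytes):
        self.bytes_allocated += nbytes
        buf = PoolBuffer(nbytes)
        # When the buffer is garbage collected, subtract its size again.
        weakref.finalize(buf, self._release, nbytes)
        return buf

    def _release(self, nbytes):
        self.bytes_allocated -= nbytes


class ArrayView:
    """Hypothetical stand-in for a NumPy array with a `base` owner."""

    def __init__(self, base):
        self.base = base                    # keeps the pool buffer alive
        self.values = memoryview(base.data)  # view over the pool's memory


pool = TrackingPool()
arr = ArrayView(pool.allocate(8 * 1000))
print(pool.bytes_allocated)  # 8000 while the array is alive

del arr
gc.collect()
print(pool.bytes_allocated)  # 0 once the array and its base are gone
```

Because the array holds its buffer as `base`, the memory stays valid for as long as the array exists, and the pool's counter reflects the allocation until the array is released.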

Issue Links

links to

GitHub Pull Request #5398

People

Assignee: Wes McKinney

Reporter: Wes McKinney

Votes: 0

Watchers: 3

Dates

Created: 16/Sep/19 12:17

Updated: 11/Jan/23 07:47

Resolved: 18/Sep/19 16:26

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 0.5h