GH-40720: [Python] Simplify and improve perf of creation of the column names in Table.to_pandas (#40721)

### Rationale for this change

Over the years, `pandas_compat.py` has grown quite complex and accumulated a lot of pandas compatibility code. Much of it can probably be simplified now that old pandas and Python versions are no longer supported.

One part of the code where this is the case is the reconstruction of the `.columns` Index object of the resulting DataFrame. Right now that always goes through a MultiIndex (even for simple column names), which adds considerable overhead in the simple case. It also contains some old Python/pandas compatibility code that can be removed.

### What changes are included in this PR?

Not going through a MultiIndex for the simple cases also gives a nice speed-up:

```python
In [1]: table = pa.table({'a': [1, 2, 3], 'b': [0.1, 0.2, 0.3], 'c': [3, 4, 5]})

In [2]: %timeit table.to_pandas()
251 µs ± 1.26 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)  # <-- main
68.1 µs ± 894 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)  # <-- PR
```

### Are these changes tested?

We should have extensive existing tests for this.

### Are there any user-facing changes?

That should not be the case.

* GitHub Issue: #40720

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
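For illustration, the idea of skipping the MultiIndex when column names are flat can be sketched in plain pandas. This is a minimal sketch, not the actual `pandas_compat.py` code: `build_column_index`, `build_column_index_via_multiindex`, and the `level_names` parameter are hypothetical names introduced here, and the second function only roughly approximates an always-MultiIndex route for comparison.

```python
import pandas as pd


def build_column_index(names, level_names=None):
    """Hypothetical helper (not pyarrow's code): build DataFrame columns.

    names:       one label per column; tuples when there are several levels.
    level_names: names of the column index levels; defaults to one unnamed level.
    """
    level_names = level_names or [None]
    if len(level_names) > 1:
        # Genuinely hierarchical columns: a MultiIndex is required.
        return pd.MultiIndex.from_tuples(list(names), names=level_names)
    # Common case: plain column names map directly onto a flat Index,
    # avoiding the cost of building (and then unwrapping) a MultiIndex.
    return pd.Index(list(names), name=level_names[0])


def build_column_index_via_multiindex(names):
    """Rough approximation of always going through a one-level MultiIndex."""
    mi = pd.MultiIndex.from_arrays([list(names)])
    return mi.get_level_values(0)


flat = build_column_index(["a", "b", "c"])
nested = build_column_index([("x", "a"), ("x", "b"), ("y", "c")], ["outer", "inner"])

print(type(flat).__name__, list(flat))  # Index ['a', 'b', 'c']
print(nested.nlevels)                   # 2
# Both routes yield the same labels for the flat case.
print(list(flat) == list(build_column_index_via_multiindex(["a", "b", "c"])))  # True
```

The branch is on the number of column levels: only hierarchical columns pay for a MultiIndex, while the common single-level case gets a flat Index with the same labels.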