Commit
### Rationale for this change

This is a long-standing [pandas ticket](pandas-dev/pandas#53011) with some fairly horrible workarounds: complex Arrow types do not serialise well to pandas because the pandas metadata string is not parseable. However, `types_mapper` always had highest priority as it overrode what was set before.

### What changes are included in this PR?

Switching the logical ordering means that `_pandas_api.pandas_dtype(dtype)` no longer needs to be called when using the pyarrow backend, which resolves the issue of a complex `dtype` containing `list` or `struct`. It will likely still fail if the numpy backend is used, but at least this gives a working solution rather than an inability to load files at all.

### Are these changes tested?

Existing tests should stay unchanged, and a new test for the complex type has been added.

### Are there any user-facing changes?

**This PR contains a "Critical Fix".**

This makes `pd.read_parquet(..., dtype_backend="pyarrow")` work with complex data types where the metadata added by pyarrow during `DataFrame.to_parquet` is not serialisable and currently throws an exception. This issue currently prevents the use of pyarrow as the default backend for pandas.

* GitHub Issue: #39914

Lead-authored-by: bretttully <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Brett Tully <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
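The reordering described above can be illustrated with a small, self-contained model. This is only a sketch, not pyarrow's actual code: `parse_pandas_dtype`, `resolve_dtypes`, and `arrow_backed` are hypothetical stand-ins, and types are represented as plain strings. It assumes the essence of the fix is that a `types_mapper` result is consulted first, so the unparseable metadata string for a nested type is never handed to the dtype parser.

```python
# Hypothetical, simplified model of the dtype-resolution order described above.
# None of these names are pyarrow internals; they are stand-ins for illustration.


def parse_pandas_dtype(dtype_string):
    """Stand-in for `_pandas_api.pandas_dtype`: rejects nested type strings."""
    if "<" in dtype_string:
        raise TypeError(f"data type {dtype_string!r} not understood")
    return f"parsed:{dtype_string}"


def resolve_dtypes(arrow_types, metadata_dtypes, types_mapper=None):
    """Pick a pandas dtype per column, consulting `types_mapper` first."""
    resolved = {}
    for name, arrow_type in arrow_types.items():
        if types_mapper is not None:
            mapped = types_mapper(arrow_type)
            if mapped is not None:
                # The mapper handled this column, so the metadata string
                # is never parsed for it.
                resolved[name] = mapped
                continue
        if name in metadata_dtypes:
            # Fallback: parse the dtype string stored in the pandas metadata.
            resolved[name] = parse_pandas_dtype(metadata_dtypes[name])
    return resolved


def arrow_backed(arrow_type):
    """Stand-in for a mapper that wraps every Arrow type (like `pd.ArrowDtype`)."""
    return f"ArrowDtype({arrow_type})"


arrow_types = {"a": "int64", "b": "list<item: struct<x: int64>>"}
metadata_dtypes = {
    "a": "int64[pyarrow]",
    "b": "list<item: struct<x: int64>>[pyarrow]",
}

# With a mapper, the nested column resolves without touching the parser.
with_mapper = resolve_dtypes(arrow_types, metadata_dtypes, types_mapper=arrow_backed)

# Without a mapper, the nested metadata string reaches the parser and fails.
try:
    resolve_dtypes(arrow_types, metadata_dtypes)
    metadata_only_failed = False
except TypeError:
    metadata_only_failed = True
```

In this model the failure without a mapper mirrors the commit message's caveat that the numpy backend will likely still fail on such columns.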