Deprecate Arrow2, update Polars version and keep all functionality #720
Replies: 3 comments 8 replies
-
Hi @EricFecteau, thanks for opening the discussion and proposing the solution! Indeed, arrow2 was originally introduced in connectorx for exporting data to polars. Since Polars has replaced arrow2 with polars_arrow, we need to add a new destination for polars, which can leverage the existing arrow destination to minimize the implementation effort. The solution looks reasonable to me. I'm currently busy with other work and may not be able to start on it soon. Please feel free to open a PR for it and let me know if you have any questions during implementation! I would suggest creating a new polars destination in connectorx following the above, keeping the arrow2 solution for now (for compatibility) and dropping it later.
-
Hello again @wangxiaoying! Thank you for the reply and recommendation! I have implemented the solution on my fork (here in the update_polars branch for arrow2 and arrow_rs_polars branch for arrow-rs), but have two design questions and one technical question before doing a PR.

Design 1: Polars version
Since the reason I encountered this issue was that Any concerns there - especially with minimal compatible version? Should I instead of `polars = { version = ">=0.43" }` as the minimal working version (tested). Polars changes frequently, so the minimal working version (0.43) is fairly recent. I would not be surprised if a future version also breaks this in the near future.

Design 2: No changes to Python version
If I understand correctly, this change has zero impact on the Python binding, because Polars is not a destination for ConnectorX-python (only Arrow-rs, Arrow2 and Pandas). Instead, Polars uses the arrow2 return type directly when they use ConnectorX. Long term, if I think the Polars could already move to

Technical 1: Arrow2 - Polars test failures
When importing FFI with

Issue technical details
Here is way more info in case it can help solve the issue. I am more than happy to try implementing something with some guidance. For When using import_array_from_c from polars_arrow, I then get an error. It tries to create the polars_arrow LargeUtf8 from the FFI, and when requesting a u8 buffer (index = 2) (and creating it), when the buffer pointer is fetched, it fails because index == array_n_buffers. This makes the error ( Not sure if this is a For

Pull request resolution
Here is a decision that needs to be made before I make this pull request - what should be the way forward:
I suspect that there are not many users relying on the arrays, given that they were never implemented for arrow-rs.
-
As an outside observer who is usually a big polars fan, I'd suggest you just leave it to polars to make their arrow interoperable with the official one. The other libraries they use (fastexcel, delta, iceberg, etc.) do their arrow-to-arrow conversion via pyarrow.
-
Issue & Roadblock
I have been trying to use ConnectorX to move data from PostgreSQL (or any other database) to Polars, in Rust. I have hit two specific roadblocks:
Since I need Polars version 0.45 (or near this version), I had to come up with another solution.
Attempted solution
One solution is to use Polars to convert the Arrow-rs ArrayData object (by iterating through the RecordBatch and the Columns) and build a Polars DataFrame (using the current version of Polars). This is a popular solution online for moving Arrow-rs to Polars. But unfortunately, the Arrow-rs support for Polars was removed in 0.44.
Solution found
Since both `arrow-rs` and `polars_arrow` implement exporting and importing Arrow arrays through FFI bindings via Arrow's C Data Interface, it's possible to zero-copy Arrow data between them. Ultimately, I was able to build this with this code:
Conclusion
From my understanding, the main reason the Arrow2 crate is supported is to move data to Polars in Rust exclusively. So my idea is that everything Arrow2-related can be dropped from this crate, and some interoperability between `arrow-rs` and `polars_arrow` can be built in as a "destination" using the FFI Arrow methods (and therefore `.polars()` can be built using that). This would significantly simplify the crate, as arrow2 would no longer need to be implemented as a transport.
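The zero-copy idea above can be sketched with the standard library alone. This is not the code from the original post and it does not call the real `arrow` or `polars_arrow` APIs: `ProducerArray`, `ConsumerArray`, `export_i32` and `handoff` are hypothetical names standing in for two crates that each declare their own copy of the C Data Interface `ArrowArray` struct. The assumption is that both declarations share the same `#[repr(C)]` layout, so the struct can be reinterpreted and the value buffer is never copied.

```rust
use std::ffi::c_void;
use std::{mem, ptr, slice};

/// Producer-side declaration of the C Data Interface `ArrowArray` struct (hypothetical type).
#[repr(C)]
#[allow(dead_code)] // most fields exist only to match the C layout
pub struct ProducerArray {
    length: i64,
    null_count: i64,
    offset: i64,
    n_buffers: i64,
    n_children: i64,
    buffers: *mut *const c_void,
    children: *mut *mut ProducerArray,
    dictionary: *mut ProducerArray,
    release: Option<unsafe extern "C" fn(*mut ProducerArray)>,
    private_data: *mut c_void,
}

/// Consumer-side declaration: a different Rust type with the same C layout (hypothetical type).
#[repr(C)]
#[allow(dead_code)]
pub struct ConsumerArray {
    length: i64,
    null_count: i64,
    offset: i64,
    n_buffers: i64,
    n_children: i64,
    buffers: *mut *const c_void,
    children: *mut *mut ConsumerArray,
    dictionary: *mut ConsumerArray,
    release: Option<unsafe extern "C" fn(*mut ConsumerArray)>,
    private_data: *mut c_void,
}

/// Owned state kept alive behind `private_data` until the release callback runs.
#[allow(dead_code)]
struct Private {
    values: Vec<i32>,
    buffers: [*const c_void; 2],
}

unsafe extern "C" fn release_private(array: *mut ProducerArray) {
    unsafe {
        // Reclaim the boxed state, then mark the struct as released.
        drop(Box::from_raw((*array).private_data as *mut Private));
        (*array).release = None;
    }
}

/// Export a non-nullable i32 column: buffer 0 is the (absent) validity bitmap, buffer 1 the values.
pub fn export_i32(values: Vec<i32>) -> ProducerArray {
    let length = values.len() as i64;
    let data = values.as_ptr() as *const c_void; // moving the Vec does not move its heap data
    let state = Box::into_raw(Box::new(Private { values, buffers: [ptr::null(), data] }));
    // SAFETY: `state` was just produced by `Box::into_raw`, so it is valid.
    let buffers = unsafe { ptr::addr_of_mut!((*state).buffers).cast::<*const c_void>() };
    ProducerArray {
        length, null_count: 0, offset: 0, n_buffers: 2, n_children: 0,
        buffers, children: ptr::null_mut(), dictionary: ptr::null_mut(),
        release: Some(release_private), private_data: state as *mut c_void,
    }
}

/// Hand the array to the consumer by reinterpreting the struct; no buffer is copied.
pub fn handoff(array: ProducerArray) -> ConsumerArray {
    // SAFETY: both structs are #[repr(C)] with identical field order and field types.
    unsafe { mem::transmute::<ProducerArray, ConsumerArray>(array) }
}

impl ConsumerArray {
    /// Borrow the values buffer, checking the index against `n_buffers` first.
    pub fn values(&self) -> Result<&[i32], String> {
        let index: i64 = 1;
        if index >= self.n_buffers {
            return Err(format!("buffer index {index} out of range for {} buffers", self.n_buffers));
        }
        // SAFETY: the producer guarantees `length` i32 values behind buffer 1.
        unsafe {
            let data = *self.buffers.add(index as usize) as *const i32;
            Ok(slice::from_raw_parts(data, self.length as usize))
        }
    }
}

impl Drop for ConsumerArray {
    fn drop(&mut self) {
        if let Some(release) = self.release {
            // SAFETY: the callback is invoked once, on the struct it was exported with.
            unsafe { release(self as *mut ConsumerArray) };
        }
    }
}
```

Calling `handoff(export_i32(vec![10, 20, 30]))` and then `.values()` yields a slice whose pointer is the original `Vec` allocation. The bounds check in `values` mirrors the kind of `index == n_buffers` failure mentioned earlier in the thread: a consumer expecting more buffers than the producer exported should get an error rather than read past the array.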