…#43269)
### Rationale for this change
See #43266. Note that LargeBinary and LargeString are still limited to 2 GiB buffers, and LargeList is limited to offsets that can be represented as int32.
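To make the offset limit concrete, here is a minimal sketch of the kind of check implied above, written in Python for brevity rather than against the actual C# API. Large types store 64-bit offsets, so an importer limited to 32-bit indexing has to verify that every offset fits in int32 before reading values. The function names `read_large_offsets` and `large_binary_value` are hypothetical, and the buffer is assumed to be little-endian.

```python
import struct

INT32_MAX = 2**31 - 1


def read_large_offsets(offsets_buffer: bytes, length: int) -> list[int]:
    """Decode length + 1 little-endian int64 offsets and check each fits in int32.

    Hypothetical helper for illustration; not part of any Arrow library.
    """
    count = length + 1
    expected_bytes = count * 8
    if len(offsets_buffer) < expected_bytes:
        raise ValueError(
            f"offsets buffer has {len(offsets_buffer)} bytes, "
            f"expected at least {expected_bytes}"
        )
    offsets = list(struct.unpack_from(f"<{count}q", offsets_buffer))
    for i, offset in enumerate(offsets):
        if offset < 0 or offset > INT32_MAX:
            # A descriptive error is more useful than a silent truncation.
            raise OverflowError(
                f"offset {offset} at index {i} cannot be represented as int32"
            )
    return offsets


def large_binary_value(offsets: list[int], values: bytes, index: int) -> bytes:
    """Return the bytes of element `index` using its start and end offsets."""
    return values[offsets[index]:offsets[index + 1]]


# Three values: b"ab", b"", b"cde" stored back to back in one values buffer.
offsets_buffer = struct.pack("<4q", 0, 2, 2, 5)
offsets = read_large_offsets(offsets_buffer, length=3)
print(large_binary_value(offsets, b"abcde", 2))
```

An array whose offsets pass this check can be read with ordinary 32-bit indexing, which is why the large types can round-trip even without large memory support.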
### What changes are included in this PR?
* Add new Array subtypes: LargeBinaryArray, LargeStringArray and LargeListArray
* Support round-tripping these array types via the IPC format
* Support round-tripping these array types via the C Data Interface
* Improve error messages when importing arrays that are too large via IPC or C Data Interface
* Enable integration tests for the new types
* Update documentation
### Are these changes tested?
Yes, I've added some basic tests specifically for the new array types, and added these types to the test data generator so they're covered by the existing tests for round-tripping via IPC and the C Data Interface.
### Are there any user-facing changes?
Yes, this is a new user-facing feature.
### Implementation notes
* I haven't added builders for these new array types. They're being added to help with interoperability with other libraries, so I wouldn't expect .NET users to build arrays of these types: they provide no benefit over the non-large types until we have proper large memory support. But I'm happy to add builders if they would be useful.
* The new array types share a lot of logic with the non-large types. I considered trying to consolidate this logic by adding a new `BinaryArrayBase<TOffset>` class for example, but I think this would require generic math support to work nicely, and would still complicate the code quite a bit and add extra virtual method call overhead. So I think it's fine to keep these new Array subtypes independent from the non-large types.
* I haven't included support for materializing a LargeStringArray (see #41048). I'm not sure whether there would be a use for this, but it could be added later if needed.
* GitHub Issue: #43266
Authored-by: Adam Reeve
Signed-off-by: Curt Hagenlocher
### Describe the enhancement requested
This is a subtask of #34736
These arrays will not be able to support value buffers > 2 GB until #23776 is addressed, but there's some value in supporting these types anyway, to simplify integration with other Arrow implementations.
See #34736 (comment) for some more context.
### Component(s)
C#