[C#] Support new data types #34736
Comments
Can you also add Tensors (#34746)?
Map type addressed here: #35243
Hi All, just wanted to check if there is any plan on adding support for LargeBinary and LargeList?
There are basically two scenarios here:
The first of these is probably pretty easy but I don't see much value in having it -- and it would be misleading to "support" LargeBinary but have it work only for smaller arrays. I have an idea for how to support the large buffers required for the second of these (see #38086) but don't expect to be able to work on it for a while. I also suspect it may require a bunch of Flatbuffers-related hackery. Is this something you're interested in implementing? :D
Hahh, I wish I could, unfortunately I am absolutely new to C#. I work on a Rust project where we use LargeList and LargeBinary values and now we need to pass those into a C# context. Right now I am just trying to figure out what's possible and what isn't, but this seems to be a major blocker unless we do some workarounds. Looking at your suggestions, we would def have issues with 1., as we have some heavy Lidar / Image datasets that we can only safely handle in LargeBinary arrays.
Are there also individual values in these arrays which are larger than 2GB or is it that the array itself exceeds that size?
The array itself can exceed 2GB. Individual records would never be close to 2GB, values are mostly under 1 MB.
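To make the distinction above concrete: a Binary or String array stores its value boundaries as 32-bit offsets, so the concatenated values of one array cannot exceed 2^31 - 1 bytes, even when every individual value is small. The sketch below is only an illustration of that arithmetic and of one possible workaround (splitting the data into several smaller arrays). The function names `fits_in_32bit_offsets` and `plan_chunks` are hypothetical and not part of any Arrow library, and chunking is an assumption here, not something proposed in this thread.

```python
INT32_MAX = 2**31 - 1  # largest offset a 32-bit Binary/String array can store


def fits_in_32bit_offsets(value_lengths):
    """Return True if the concatenated values are addressable with int32 offsets."""
    return sum(value_lengths) <= INT32_MAX


def plan_chunks(value_lengths, limit=INT32_MAX):
    """Group consecutive values into chunks whose total byte size stays <= limit.

    Hypothetical helper: returns a list of (start, stop) index ranges.
    Raises ValueError if a single value is larger than the limit.
    """
    chunks = []
    start = 0
    size = 0
    for i, n in enumerate(value_lengths):
        if n > limit:
            raise ValueError(f"value {i} is {n} bytes, larger than the {limit}-byte limit")
        if size + n > limit:
            # Close the current chunk before this value and start a new one.
            chunks.append((start, i))
            start = i
            size = 0
        size += n
    if start < len(value_lengths):
        chunks.append((start, len(value_lengths)))
    return chunks


# 3000 records of 1 MB each: about 3 GB in total, too big for one 32-bit-offset array.
lengths = [1_000_000] * 3000
print(fits_in_32bit_offsets(lengths))
print(plan_chunks(lengths))
```

With records of roughly 1 MB, each chunk holds a little over two thousand values, so a dataset like the one described would need only a handful of chunks. Whether chunking is acceptable depends on the consumer being able to handle multiple arrays per column.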
Hi @CurtHagenlocher, we've run into issues integrating with Polars, which always exports string data to Arrow as the LargeString type (see pola-rs/polars#15047). We can work around this by casting to String first via PyArrow, but it would simplify things if there was LargeString support in .NET Arrow, even if it didn't yet support values buffers that were actually > 2GB. Would you be open to accepting a PR to add LargeString, LargeBinary and LargeList arrays? I'm hopeful I might eventually be able to help with adding support for IPC record batches and buffers > 2 GB too, but I think there is some value in having support for LargeString etc even if they don't actually support large buffers yet, and it makes sense to me to split this work out from adding support for large buffers.
I think blocking integration with Polars is probably reason enough to add support for these types. I'd expect, though, that there's a reasonable error experience when the actual sizes do exceed what we can currently support.
👍 good point, I'll make sure there are helpful error messages if consuming data that's too large via IPC or the C Data Interface. I've opened #43266 for adding these array types.
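As a rough illustration of the "reasonable error experience" discussed above: LargeString, LargeBinary and LargeList use 64-bit offsets, so an implementation that can only hold buffers addressable with 32-bit offsets has to check the offsets on import and fail with a clear message. The sketch below shows that check in plain Python. The function name `narrow_offsets` and the error wording are hypothetical; how the C# implementation actually reports this is not stated in the thread.

```python
INT32_MAX = 2**31 - 1  # largest offset that fits in a signed 32-bit integer


def narrow_offsets(offsets64, type_name="LargeString"):
    """Validate that 64-bit offsets fit in 32 bits, or raise a descriptive error.

    Hypothetical helper. Offsets in an Arrow array are non-decreasing, so
    checking the last offset (the total size of the values buffer) is enough.
    """
    if not offsets64:
        return []
    total = offsets64[-1]
    if total > INT32_MAX:
        raise OverflowError(
            f"{type_name} values buffer is {total} bytes, which exceeds the "
            f"{INT32_MAX}-byte limit currently supported"
        )
    return list(offsets64)


# Small data passes through unchanged.
print(narrow_offsets([0, 3, 7]))

# Data past the limit fails with a message that names the type and the size.
try:
    narrow_offsets([0, 2**31])
except OverflowError as err:
    print(err)
```

Naming the array type and the actual size in the message lets a user tell at a glance that the data itself is fine and that the limit lies in the consumer.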
Describe the enhancement requested
The C# implementation still needs to add support for:
Component(s)
C#