Ability to append non contiguous strings to StringBuilder
#6347
Comments
@alamb I would love to take this on since it relates to some of the work I was doing on the DF concat UDF 🫡 I might not be able to start it until next week, but it doesn't seem super high priority :)
This is actually already supported https://docs.rs/arrow-array/latest/arrow_array/builder/type.GenericStringBuilder.html#example
This could be achieved by appending empty strings instead of nulls, and then modifying the output, either by using into_parts and reconstructing the array, or by using the nullif kernel.
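For illustration, the empty-string workaround can be sketched with a std-only toy column. Everything named here (ToyStringColumn, build_then_mask, the Vec<bool> validity mask) is a hypothetical stand-in, not arrow's actual buffers, into_parts, or the nullif kernel; it only shows the shape of the idea: write a placeholder for each null row, then attach the null information in one step at the end.

```rust
/// Toy string column (hypothetical layout, loosely modelled on a
/// values + offsets + validity representation; not arrow's types).
struct ToyStringColumn {
    values: String,
    offsets: Vec<usize>,
    /// `None` means "no nulls"; otherwise `validity[i] == false` marks row `i` null.
    validity: Option<Vec<bool>>,
}

impl ToyStringColumn {
    /// Return row `i`, or `None` if the validity mask marks it null.
    fn get(&self, i: usize) -> Option<&str> {
        let is_valid = self.validity.as_ref().map_or(true, |v| v[i]);
        is_valid.then(|| &self.values[self.offsets[i]..self.offsets[i + 1]])
    }
}

/// Build a column from optional inputs: null rows get an empty string while
/// building, and the validity mask is attached once at the end.
fn build_then_mask(rows: &[Option<&str>]) -> ToyStringColumn {
    let mut values = String::new();
    let mut offsets = vec![0];
    for row in rows {
        // Empty-string placeholder for nulls keeps the offsets consistent.
        values.push_str(row.unwrap_or(""));
        offsets.push(values.len());
    }
    let validity: Vec<bool> = rows.iter().map(|r| r.is_some()).collect();
    ToyStringColumn { values, offsets, validity: Some(validity) }
}
```

Calling build_then_mask(&[Some("a"), None, Some("bc")]) yields a column whose middle row reads back as None while occupying zero bytes in the values buffer.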
What I plan to do here is:
Filed #6373 to track adding the same support to StringView. The other thing this ticket currently discusses is the ability to provide a
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
DataFusion has an optimized version of concat(col1, ...) for strings (I believe contributed by @JasonLi-cn) that avoids:
To do this today, we added StringArrayBuilder, which is similar to, but not the same as, StringBuilder in arrow:
https://github.com/apache/datafusion/blob/4838cfbf453f3c21d9c5a84f9577329dd78aa763/datafusion/functions/src/string/common.rs#L354-L417
The major differences are:
StringArrayBuilder lets you call write to incrementally build up each string and then call append_offset to create each string. StringBuilder requires each input to be a single contiguous string in order to call append_value (https://docs.rs/arrow/latest/arrow/array/type.StringBuilder.html#method.append_value).
Describe the solution you'd like
I think it is worth figuring out how to create a similar API for StringBuilder.
Incrementally write values
Here is one ideal suggestion of how to write values that I think would be relatively easy to use:
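To make the idea concrete without depending on arrow, here is a std-only toy sketch of incremental writing. It is not the API proposed in this issue or anything arrow ships: ToyStringBuilder, finish_value, and concat_rows are hypothetical names. The toy keeps all bytes in one buffer, lets the caller push pieces through std::fmt::Write, and records an offset when a row is complete.

```rust
use std::fmt::Write;

/// Toy stand-in for a string array builder (hypothetical, not arrow's type).
/// All string bytes live in one buffer; `offsets[i]..offsets[i + 1]` is row `i`.
struct ToyStringBuilder {
    values: String,
    offsets: Vec<usize>,
}

impl ToyStringBuilder {
    fn new() -> Self {
        Self { values: String::new(), offsets: vec![0] }
    }

    /// Close the row in progress: everything written since the previous call
    /// becomes one string. (Hypothetical name.)
    fn finish_value(&mut self) {
        self.offsets.push(self.values.len());
    }

    /// Read row `i` back out of the shared buffer.
    fn value(&self, i: usize) -> &str {
        &self.values[self.offsets[i]..self.offsets[i + 1]]
    }

    /// Number of completed rows.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }
}

/// Implementing `std::fmt::Write` lets callers append pieces with
/// `write_str` or the `write!` macro.
impl Write for ToyStringBuilder {
    fn write_str(&mut self, s: &str) -> std::fmt::Result {
        self.values.push_str(s);
        Ok(())
    }
}

/// Concatenate the pieces of each row straight into the builder's buffer,
/// without allocating a temporary `String` per row.
fn concat_rows(rows: &[&[&str]]) -> ToyStringBuilder {
    let mut builder = ToyStringBuilder::new();
    for pieces in rows {
        for &piece in pieces.iter() {
            builder.write_str(piece).unwrap();
        }
        builder.finish_value();
    }
    builder
}
```

With rows [["foo", "bar", "baz"], ["v2"]], concat_rows produces two values, "foobarbaz" and "v2", and each piece is copied exactly once.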
Similarly, adding a finish_with_nulls(..) type function that took a NullBuffer would be beneficial if the caller already knew about nulls.
Describe alternatives you've considered
We could choose not to do this at all (or just keep the code downstream in DataFusion).
Additional context