-
Notifications
You must be signed in to change notification settings - Fork 832
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve zip
performance
#6812
improve zip
performance
#6812
Conversation
arrow-data/src/transform/mod.rs
Outdated
/// i.e. `index` >= the number of source arrays | ||
/// or `start` + `count` > the length of the `index`th array | ||
/// | ||
pub fn extend_scalar(&mut self, index: usize, scalar_index: usize, count: usize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub fn extend_scalar(&mut self, index: usize, scalar_index: usize, count: usize) { | |
pub fn extend_n(&mut self, index: usize, scalar_index: usize, count: usize) { |
Scalars don't really exist as a notion at this level
Perhaps you might be able to contribute a benchmark showing the performance difference, there is a lot going on in this kernel and it isn't immediately clear to me that the dyn dispatch in MutableArrayData won't dominate. |
I actually started to create a (WIP) PR to avoid this dyn dispatch: |
The dyn dispatch is unavoidable in the general case, as arrow supports arbitrarily nested data structures. Generally the approach is to have specialized kernel implementations for non-nested types where this yields better performance, and then fallback to type-erased versions where this isn't possible |
So I did local benchmarks and it did not improve the perf from what I saw, so closing |
Which issue does this PR close?
No issue
Rationale for this change
I saw that we do for every range of true check if the truthy values and falsy are scalar, which can be checked once
What changes are included in this PR?
zip
functionextend_scalar
function toMutableArrayData
to be optimized for single value that is being copy over and over againAre there any user-facing changes?
No (I think)
there are no
benchmarks
tozip
unfortunately