Optimize merging of partial case expression results #18152
macOS test failure is unrelated afaict. Looks like a DNS issue on the test runner.
@alamb could you run the benchmarks against this?
The test failure for f49d3ea seems unrelated. Pulling in changes from …
Looks pretty good to me. I left some comments; once they're resolved I can approve.
I don't think you need generics here; you could have a trait for the shared logic and require implementations only for the few differences (like evaluating the …
I think I got 'em all except for the
I think I'm going to leave this for a follow-up PR. There are some subtle differences wrt an owned …
fine by me
np
Just for kicks, I asked Claude Code to give it a try. It came up with a solution similar to what I had in mind myself, but it resolved the reference vs. owned problem by cloning (…). Leaving this challenge for later.
/// The more constrained indexing mechanism used by this algorithm makes it easier to copy values
/// in contiguous slices. In the example below, the two subsequent elements from array `2` can be
/// copied in a single operation from the source array instead of copying them one by one.
/// Long spans of null values are also especially cheap because they do not need to be represented
/// in an input array.
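To make the doc comment's point concrete, here is a minimal std-only sketch. It assumes partial results are plain `Vec`s holding only the rows each branch produced, plus one `Option<usize>` per output row naming the source (`None` for a null row). `merge_runs` is a hypothetical name; DataFusion's version works on Arrow data and is not reproduced here. Runs of identical indices are read from one contiguous slice, and null spans touch no input array at all.

```rust
/// Merge compact partial results into one output column.
///
/// `values[i]` holds only the rows produced by partial result `i`, in output order.
/// `indices[row]` names the partial result that supplies `row`, or `None` for null.
/// The row offset inside a source is implicit: each source is read front to back.
fn merge_runs<T: Clone>(values: &[Vec<T>], indices: &[Option<usize>]) -> Vec<Option<T>> {
    let mut out: Vec<Option<T>> = Vec::with_capacity(indices.len());
    // Next unread position in each partial result.
    let mut cursors = vec![0usize; values.len()];
    let mut row = 0;
    while row < indices.len() {
        // Extend the run while the following rows use the same index.
        let mut end = row + 1;
        while end < indices.len() && indices[end] == indices[row] {
            end += 1;
        }
        let run_len = end - row;
        match indices[row] {
            // A null span needs no input array; just append nulls.
            None => out.extend(std::iter::repeat(None).take(run_len)),
            // A run from one source is read from a single contiguous slice.
            Some(src) => {
                let start = cursors[src];
                out.extend(values[src][start..start + run_len].iter().cloned().map(Some));
                cursors[src] += run_len;
            }
        }
        row = end;
    }
    out
}

fn main() {
    let values = vec![vec!["x", "y", "z"], vec!["p"]];
    let indices = [Some(0), Some(0), None, None, None, Some(1), Some(0)];
    // Three slice reads and one null span instead of seven per-row copies.
    println!("{:?}", merge_runs(&values, &indices));
}
```

Because the offset within each source is implied by order, the index vector only has to store which source to use, which is what makes run detection a simple equality check.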
BTW, the check for contiguous items from the same array in consecutive order could be added to interleave as well.
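A minimal std-only sketch of that check, assuming interleave-style `(array, row)` index pairs over plain `Vec`s rather than Arrow arrays. `CopyOp`, `coalesce` and `interleave_runs` are hypothetical names; this does not reflect the current arrow-rs `interleave` kernel. Two pairs join the same run when they name the same array and the second row immediately follows the first.

```rust
/// A contiguous copy: `len` rows from `array`, starting at row `start`.
#[derive(Debug, PartialEq)]
struct CopyOp {
    array: usize,
    start: usize,
    len: usize,
}

/// Coalesce `(array, row)` pairs into contiguous copy operations.
fn coalesce(indices: &[(usize, usize)]) -> Vec<CopyOp> {
    let mut ops: Vec<CopyOp> = Vec::new();
    for &(array, row) in indices {
        if let Some(last) = ops.last_mut() {
            // Same array and the very next row: grow the current run.
            if last.array == array && last.start + last.len == row {
                last.len += 1;
                continue;
            }
        }
        ops.push(CopyOp { array, start: row, len: 1 });
    }
    ops
}

/// Execute the copy plan against plain vectors, one slice copy per run.
fn interleave_runs<T: Clone>(values: &[Vec<T>], indices: &[(usize, usize)]) -> Vec<T> {
    let mut out = Vec::with_capacity(indices.len());
    for op in coalesce(indices) {
        out.extend_from_slice(&values[op.array][op.start..op.start + op.len]);
    }
    out
}

fn main() {
    let values = vec![vec![10, 11, 12], vec![20, 21]];
    let indices = [(0, 0), (0, 1), (0, 2), (1, 1), (0, 0), (1, 0), (1, 1)];
    // Seven row references collapse into four slice copies.
    println!("{:?}", coalesce(&indices));
    println!("{:?}", interleave_runs(&values, &indices));
}
```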
Thanks for your help dotting the i's and crossing the t's, @rluvaton.
I'll run the benchmarks again before merging (waiting for @alamb's reply on whether he wants to continue his review).
Benchmark results: [environment info (neofetch, cpufetch) and comparison results not preserved]
Edit: these are both actually … Nice to see the improvements on the lookup benchmarks. Your hash table work will do even better, but for now this is a good proxy for more general case expressions.
A different PR sounds good. We should also start splitting the different optimizations into separate files to keep things manageable. We also need a way to check whether an expression can fail (e.g. a division can fail on divide by zero, while a column reference cannot, unless of course the column is missing). That opens up another optimization: when both branches are cheap and cannot fail, we can evaluate both and do a simple zip. For now, though, we can add another optimization for column/literal combinations and do a simple …
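A rough std-only sketch of the "evaluate both and zip" idea, assuming both branches are infallible and cheap (here a column reference and a literal). `zip_aligned` is a hypothetical helper in the spirit of arrow-rs's `zip` kernel, written on plain vectors; it treats a null mask entry as false, the way CASE treats a null WHEN result.

```rust
/// Pick `then_vals[i]` where `mask[i]` is true and `else_vals[i]` otherwise.
/// All inputs are aligned: index `i` refers to the same row in each slice.
/// A null (`None`) mask entry selects the ELSE value, as a null WHEN result does.
fn zip_aligned<T: Clone>(mask: &[Option<bool>], then_vals: &[T], else_vals: &[T]) -> Vec<T> {
    assert!(mask.len() == then_vals.len() && mask.len() == else_vals.len());
    mask.iter()
        .zip(then_vals.iter().zip(else_vals))
        .map(|(m, (t, e))| if *m == Some(true) { t.clone() } else { e.clone() })
        .collect()
}

fn main() {
    // CASE WHEN x > 10 THEN x ELSE 0 END over a single column `x`.
    let x = vec![5, 20, 15, 1];
    let mask: Vec<Option<bool>> = x.iter().map(|v| Some(*v > 10)).collect();
    // Neither branch can fail, so both are evaluated for every row up front.
    let then_vals = x.clone(); // column reference
    let else_vals = vec![0; x.len()]; // literal broadcast to the batch length
    println!("{:?}", zip_aligned(&mask, &then_vals, &else_vals));
}
```

Evaluating a branch on rows it will not contribute to is only safe when that branch cannot fail, which is why this shortcut depends on knowing whether an expression is fallible.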
Removing the scatter and using a custom 'unaligned' zip compared to bea4b68 |
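The comment does not define the 'unaligned' zip. One plausible reading, assumed here, is a zip whose branch inputs are compact: the THEN result covers only the rows where the mask is true and the ELSE result only the remaining rows, each in row order. Walking the mask with two cursors then places every value directly, so no separate scatter to full length is needed. `zip_unaligned` is a hypothetical std-only sketch, not the PR's code.

```rust
/// Combine two compacted branch results using the predicate mask.
///
/// `then_vals` holds only the rows where `mask` is true and `else_vals` only the
/// rows where it is false, each in row order. Two cursors recover the output
/// position, so neither input has to be scattered back to full length first.
fn zip_unaligned<T: Clone>(mask: &[bool], then_vals: &[T], else_vals: &[T]) -> Vec<T> {
    let mut then_iter = then_vals.iter();
    let mut else_iter = else_vals.iter();
    let out: Vec<T> = mask
        .iter()
        .map(|&m| {
            let next = if m { then_iter.next() } else { else_iter.next() };
            next.expect("branch result shorter than its share of the mask").clone()
        })
        .collect();
    // Every compacted value should have been consumed exactly once.
    debug_assert!(then_iter.next().is_none() && else_iter.next().is_none());
    out
}

fn main() {
    // Rows 0 and 3 matched the WHEN predicate; rows 1, 2 and 4 did not.
    let mask = [true, false, false, true, false];
    let then_vals = ["a", "b"]; // THEN evaluated on matched rows only
    let else_vals = ["x", "y", "z"]; // ELSE evaluated on the other rows only
    println!("{:?}", zip_unaligned(&mask, &then_vals, &else_vals));
}
```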
/// ││ C ││ ├─────────┤ ├─────────┤
/// │├─────────┤│ │ 2 │ │ C │
/// ││ D ││ ├─────────┤ ├─────────┤
/// │└─────────┘│ │ 2 │ │ D │
Shouldn't the result array be B, A, C, C (because index 2 means C)? I made a PR to fix this:
Answered in the PR, but for completeness, in case anyone else reads this: the diagram is correct. Merge works like interleave, except that interleave's second index (the row within the selected array) is implicit: it is the number of times the same first index has already occurred. So 2, 2 for merge is equivalent to (2, 0), (2, 1) for interleave.
In executable code, what's drawn is:
// Assumes `merge` and `PartialResultIndex` from the case expression
// implementation are in scope, e.g. inside that module's unit tests.
use arrow::array::{Array, AsArray, StringArray};

// The partial results from the diagram: [A], [B] and [C, D].
let a1 = StringArray::from(vec![Some("A")]).to_data();
let a2 = StringArray::from(vec![Some("B")]).to_data();
let a3 = StringArray::from(vec![Some("C"), Some("D")]).to_data();
// One index per output row; `none()` yields a null row.
let indices = vec![
    PartialResultIndex::none(),
    PartialResultIndex::try_new(1).unwrap(),
    PartialResultIndex::try_new(0).unwrap(),
    PartialResultIndex::none(),
    // Two references to array 2 take its first and then its second value.
    PartialResultIndex::try_new(2).unwrap(),
    PartialResultIndex::try_new(2).unwrap(),
];
let merged = merge(&vec![a1, a2, a3], &indices).unwrap();
let merged = merged.as_string::<i32>();
assert_eq!(merged.len(), indices.len());
assert!(!merged.is_valid(0));
assert!(merged.is_valid(1));
assert_eq!(merged.value(1), "B");
assert!(merged.is_valid(2));
assert_eq!(merged.value(2), "A");
assert!(!merged.is_valid(3));
assert!(merged.is_valid(4));
assert_eq!(merged.value(4), "C");
assert!(merged.is_valid(5));
assert_eq!(merged.value(5), "D");
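The merge/interleave equivalence can also be checked mechanically. This std-only sketch uses `Option<usize>` as a stand-in for `PartialResultIndex` (an assumption; the real type's representation is not shown here), and the hypothetical `merge_to_interleave` rewrites merge indices as interleave `(array, row)` pairs, reproducing the `2, 2` to `(2, 0), (2, 1)` example.

```rust
/// Rewrite merge-style indices as interleave-style `(array, row)` pairs.
///
/// `None` stands for a null output row and is passed through unchanged.
/// For `Some(array)`, the row within `array` is the number of times `array`
/// has already appeared earlier in `indices`.
fn merge_to_interleave(indices: &[Option<usize>], num_arrays: usize) -> Vec<Option<(usize, usize)>> {
    let mut seen = vec![0usize; num_arrays];
    indices
        .iter()
        .copied()
        .map(|idx| {
            idx.map(|array| {
                let row = seen[array];
                seen[array] += 1;
                (array, row)
            })
        })
        .collect()
}

fn main() {
    // The indices from the diagram: none, 1, 0, none, 2, 2.
    let merge_indices = [None, Some(1), Some(0), None, Some(2), Some(2)];
    // The two trailing `2`s become (2, 0) and (2, 1).
    println!("{:?}", merge_to_interleave(&merge_indices, 3));
}
```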
Added as a unit test in #18369.
## Which issue does this PR close?
- None, follow-up for #18152
## Rationale for this change
Add a unit test testing (and demonstrating) the merge function.
## What changes are included in this PR?
Adds an additional test case.
## Are these changes tested?
Who tests the tests?
## Are there any user-facing changes?
No

## Which issue does this PR close?
## Rationale for this change
Case evaluation currently uses `PhysicalExpr::evaluate_selection` for each branch of the case expression. This implementation is fine, but because `evaluate_selection` is not specific to the `case` logic, we're missing some optimisation opportunities. The main consequence is that too much work is being done filtering record batches and scattering results. This PR introduces specialised filtering logic and result interleaving for `case`.
A more detailed description and diagrams are available at #18075 (comment)
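For contrast, the per-branch pattern described above (filter the batch by the branch's selection, evaluate, then scatter the compact result back to full length) can be sketched on plain vectors. `filter`, `scatter` and `eval_branch_baseline` are hypothetical helpers and a single `Vec` stands in for a record batch; the sketch only illustrates why every branch pays for a full-length filter and a full-length scatter.

```rust
/// Keep only the values whose mask entry is true (the "filter" step).
fn filter<T: Clone>(values: &[T], mask: &[bool]) -> Vec<T> {
    values
        .iter()
        .zip(mask)
        .filter(|(_, m)| **m)
        .map(|(v, _)| v.clone())
        .collect()
}

/// Spread a compact result back to full length, with `None` where `mask` is
/// false (the "scatter" step).
fn scatter<T: Clone>(compact: &[T], mask: &[bool]) -> Vec<Option<T>> {
    let mut it = compact.iter();
    mask.iter().map(|&m| if m { it.next().cloned() } else { None }).collect()
}

/// Evaluate one branch the generic way: filter the whole batch, evaluate the
/// branch on the surviving rows, then scatter back to batch length.
fn eval_branch_baseline<T: Clone, U: Clone>(
    batch: &[T],
    selected: &[bool],
    then_fn: impl Fn(&T) -> U,
) -> Vec<Option<U>> {
    let filtered = filter(batch, selected);
    let evaluated: Vec<U> = filtered.iter().map(then_fn).collect();
    scatter(&evaluated, selected)
}

fn main() {
    let batch = [1, 2, 3, 4];
    // Rows selected by this branch's WHEN predicate.
    let selected = [false, true, false, true];
    println!("{:?}", eval_branch_baseline(&batch, &selected, |v| v * 10));
}
```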
## What changes are included in this PR?
Rewrite the `case_when_no_expr` and `case_when_with_expr` evaluation loops to avoid as much unnecessary work as possible. In particular, the remaining rows to be evaluated are retained across loop iterations. This allows the record batch that needs to be filtered to shrink as the loop is evaluated, which reduces the number of rows that need to be refiltered. If a `WHEN` predicate does not match any rows at all, filtering is avoided entirely.
The final result is also no longer merged on every loop iteration. Instead, an index vector is constructed and used to compose the final result once, using a custom 'multi zip'/'interleave'-like operation.
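The loop described above can be sketched on plain vectors. This is a minimal single-column model under stated assumptions: `Branch`, `eval_case` and the function-pointer predicates are hypothetical stand-ins for DataFusion's physical expressions, and a list of row numbers stands in for the shrinking filtered record batch. It shows three ideas from the description: undecided rows are carried between iterations, a `WHEN` that matches nothing skips `THEN` evaluation, and the output is composed once from an index vector.

```rust
/// A CASE branch over a single i64 column: `WHEN when(x) THEN then(x)`.
struct Branch {
    when: fn(i64) -> bool,
    then: fn(i64) -> i64,
}

/// Evaluate `CASE WHEN .. THEN .. [ELSE ..] END` over `batch`.
///
/// `remaining` holds the row numbers no earlier WHEN has matched. Each branch's
/// results are stored compactly in `partials`, and `indices[row]` records which
/// partial supplies `row` (`None`: no branch matched and there is no ELSE).
fn eval_case(batch: &[i64], branches: &[Branch], else_fn: Option<fn(i64) -> i64>) -> Vec<Option<i64>> {
    let mut remaining: Vec<usize> = (0..batch.len()).collect();
    let mut partials: Vec<Vec<i64>> = Vec::new();
    let mut indices: Vec<Option<usize>> = vec![None; batch.len()];

    for branch in branches {
        if remaining.is_empty() {
            break;
        }
        // Split the undecided rows into matched and still-unmatched, keeping order.
        let (matched, rest): (Vec<usize>, Vec<usize>) =
            remaining.iter().partition(|&&row| (branch.when)(batch[row]));
        if matched.is_empty() {
            // No match: skip THEN evaluation and keep `remaining` unchanged.
            continue;
        }
        let p = partials.len();
        partials.push(matched.iter().map(|&row| (branch.then)(batch[row])).collect());
        for &row in &matched {
            indices[row] = Some(p);
        }
        // Only unmatched rows are carried into the next iteration.
        remaining = rest;
    }

    if let Some(else_fn) = else_fn {
        if !remaining.is_empty() {
            let p = partials.len();
            partials.push(remaining.iter().map(|&row| else_fn(batch[row])).collect());
            for &row in &remaining {
                indices[row] = Some(p);
            }
        }
    }

    // Compose the result once; each partial is consumed in row order.
    let mut cursors = vec![0usize; partials.len()];
    indices
        .iter()
        .copied()
        .map(|idx| {
            idx.map(|p| {
                let value = partials[p][cursors[p]];
                cursors[p] += 1;
                value
            })
        })
        .collect()
}

fn main() {
    let branches = [
        Branch { when: |x| x < 0, then: |x| -x },
        Branch { when: |x| x > 100, then: |_| 100 },
    ];
    let else_fn: fn(i64) -> i64 = |x| x + 1;
    println!("{:?}", eval_case(&[5, -3, 200], &branches, Some(else_fn)));
}
```

Because every partial result is produced from rows in ascending order, a per-partial cursor is enough to place values during the final composition; the index vector never needs explicit row offsets.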
## Are these changes tested?
Covered by existing unit tests and SLTs
## Are there any user-facing changes?
No