Commit e9431fc

Optimize merging of partial case expression results (#18152)
## Which issue does this PR close?

- Improvement in the context of #18075
- Continues on #17898

## Rationale for this change

Case evaluation currently uses `PhysicalExpr::evaluate_selection` for each branch of the case expression. This implementation is fine, but because `evaluate_selection` is not specific to the `case` logic we're missing some optimisation opportunities. The main consequence is that too much work is being done filtering record batches and scattering results. This PR introduces specialised filtering logic and result interleaving for case.

A more detailed description and diagrams are available at #18075 (comment)

## What changes are included in this PR?

Rewrite the `case_when_no_expr` and `case_when_with_expr` evaluation loops to avoid as much unnecessary work as possible.

In particular the remaining rows to be evaluated are retained across loop iterations. This allows the record batch that needs to be filtered to shrink as the loop is being evaluated which reduces the number of rows that needs to be refiltered. If a when predicate does not match any rows at all, filtering is avoided entirely.

The final result is also not merged every loop iteration. Instead an index vector is constructed which is used to compose the final result once using a custom 'multi zip'/'interleave' like operation.

## Are these changes tested?

Covered by existing unit tests and SLTs

## Are there any user-facing changes?

No
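
The evaluation strategy described under "What changes are included in this PR?" can be illustrated with a small, self-contained sketch. This is not the code from the commit: it models a column as a plain `&[i64]` and branches as function pointers instead of Arrow arrays and `PhysicalExpr`s, and the names `Pred`, `Expr`, `emit`, `eval_case`, and `example` are hypothetical. It only aims to show the three ideas: the set of remaining rows shrinks across iterations, a branch that matches nothing is skipped, and the final result is assembled once from an index vector.

```rust
// Simplified stand-ins (assumptions, not DataFusion types): a "column" is a
// slice of i64 and an expression is a plain function pointer.
type Pred = fn(i64) -> bool;
type Expr = fn(i64) -> i64;

/// Evaluates `expr` for `rows` only, stores the values as one new partial
/// result, and records for each row where its value can be found later.
fn emit(
    input: &[i64],
    rows: &[usize],
    expr: Expr,
    partials: &mut Vec<Vec<i64>>,
    source: &mut [Option<(usize, usize)>],
) {
    let slot = partials.len();
    let mut values = Vec::with_capacity(rows.len());
    for (offset, &row) in rows.iter().enumerate() {
        values.push(expr(input[row]));
        source[row] = Some((slot, offset));
    }
    partials.push(values);
}

fn eval_case(
    input: &[i64],
    branches: &[(Pred, Expr)],
    else_expr: Option<Expr>,
) -> Vec<Option<i64>> {
    // Rows that no branch has claimed yet; this shrinks as the loop runs.
    let mut remaining: Vec<usize> = (0..input.len()).collect();
    // Index vector: for each input row, (partial result, offset within it).
    let mut source: Vec<Option<(usize, usize)>> = vec![None; input.len()];
    let mut partials: Vec<Vec<i64>> = Vec::new();

    for &(when, then) in branches {
        if remaining.is_empty() {
            break; // every row already has a result
        }
        // The predicate only sees rows that are still remaining.
        let (matched, unmatched): (Vec<usize>, Vec<usize>) =
            remaining.iter().copied().partition(|&row| when(input[row]));
        if matched.is_empty() {
            continue; // nothing matched: keep `remaining` as it is
        }
        emit(input, &matched, then, &mut partials, &mut source);
        remaining = unmatched;
    }

    if let Some(otherwise) = else_expr {
        if !remaining.is_empty() {
            emit(input, &remaining, otherwise, &mut partials, &mut source);
        }
    }

    // Single merge at the end: gather each row's value through the index
    // vector. Rows without a match and without an ELSE stay NULL (None).
    source
        .into_iter()
        .map(|s| s.map(|(slot, offset)| partials[slot][offset]))
        .collect()
}

/// CASE WHEN x < 0 THEN -x WHEN x > 100 THEN 0 WHEN x >= 10 THEN x * 10 END
fn example() -> Vec<Option<i64>> {
    fn negative(x: i64) -> bool { x < 0 }
    fn huge(x: i64) -> bool { x > 100 }
    fn at_least_ten(x: i64) -> bool { x >= 10 }
    fn negate(x: i64) -> i64 { -x }
    fn zero(_: i64) -> i64 { 0 }
    fn times_ten(x: i64) -> i64 { x * 10 }

    let branches: [(Pred, Expr); 3] =
        [(negative, negate), (huge, zero), (at_least_ten, times_ten)];
    eval_case(&[1, -5, 20, 0, 7], &branches, None)
}
```

In `example`, the second branch matches no rows, so it produces no partial result and leaves the remaining rows unchanged. The third branch is evaluated against four rows rather than five, because the first branch already claimed one.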
1 parent 5814c7e commit e9431fc

2 files changed, +636 -118 lines changed