Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Simplify is ins to an OR chain of eqs #3800

Merged
merged 2 commits into from
Feb 14, 2025
Merged

Conversation

colin-ho
Copy link
Contributor

@colin-ho colin-ho commented Feb 12, 2025

Closes #3662
Implement simplify expr rule to transform is_in into an OR chain of eq, e.g. e IN (1, 2, 3) -> e = 1 OR e = 2 OR e = 3.

This can be faster than using the is_in kernel, which has to hash the items and does a lookup per row in the base column.

Results on Q12 and Q19 of TPCH SF 10:

Q12: 410.57s -> 313.44s
Q19: 489.13s -> 401.24s

I configured the threshold for max number of items to transform into the OR chain is 5. I chose the number 5 because of the significant symbolism it has in nature and culture. Humans have 5 fingers on each hand and 5 toes on each foot. Most flowering plants have 5 petals. It's the number of Platonic solids (regular polyhedra). Many animals have pentameral (5-fold) symmetry, like starfish. In Chinese culture, there are five blessings (longevity, happiness, wealth, luck and prosperity), as well as five elements (wood, water, fire, Earth and metal).

@github-actions github-actions bot added the feat label Feb 12, 2025
@colin-ho colin-ho marked this pull request as ready for review February 12, 2025 20:49
@colin-ho colin-ho requested a review from kevinzwang February 12, 2025 20:49
Copy link

codspeed-hq bot commented Feb 12, 2025

CodSpeed Performance Report

Merging #3800 will degrade performances by 12.76%

Comparing colin/chain-is-ins (c442202) with main (e6db43b)

Summary

❌ 1 regressions
✅ 26 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark BASE HEAD Change
test_iter_rows_first_row[100 Small Files] 152.5 ms 174.8 ms -12.76%

Copy link

codecov bot commented Feb 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.87%. Comparing base (ca36593) to head (c442202).
Report is 9 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3800      +/-   ##
==========================================
+ Coverage   77.82%   77.87%   +0.04%     
==========================================
  Files         745      750       +5     
  Lines       94416    94796     +380     
==========================================
+ Hits        73480    73819     +339     
- Misses      20936    20977      +41     
Files with missing lines Coverage Δ
daft/expressions/expressions.py 93.84% <100.00%> (+0.02%) ⬆️
src/daft-algebra/src/simplify/mod.rs 100.00% <100.00%> (ø)
src/daft-recordbatch/src/lib.rs 85.27% <100.00%> (+0.97%) ⬆️

... and 40 files with indirect coverage changes

Copy link
Member

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

let chain_of_eqs = list
.iter()
.map(|item| e.clone().eq(item.clone()));
Transformed::yes(chain_of_eqs.reduce(|a, b| a.or(b)).unwrap())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit: you can use combine_disjunction here instead of the reduce

@colin-ho colin-ho enabled auto-merge (squash) February 14, 2025 20:02
@colin-ho colin-ho merged commit fba938d into main Feb 14, 2025
41 of 42 checks passed
@colin-ho colin-ho deleted the colin/chain-is-ins branch February 14, 2025 20:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimizer: Rewrite IsIn expressions
2 participants