What's New
Features
- Added an inequality join operation, which matches rows where one column's value is less than or greater than the other column's value.
- Parallelized Theta join
- Changed `theta_join()` arguments (and documentation) to use the term "condition" instead of "relation".
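
For readers new to the terminology: a theta join pairs rows from two inputs whenever a condition on their values holds, and an inequality join is the special case where that condition is a comparison such as `<` or `>`. The sketch below is a generic Python illustration of those semantics only. It is not this library's API, and `naive_theta_join` and its parameters are hypothetical names.

```python
import operator
from typing import Callable, Iterable, List, Tuple


def naive_theta_join(
    left: Iterable[float],
    right: Iterable[float],
    condition: Callable[[float, float], bool],
) -> List[Tuple[float, float]]:
    """Return every (l, r) pair for which condition(l, r) is true.

    Hypothetical helper for illustration only. It checks all
    len(left) * len(right) pairs, so it shows the semantics of the join,
    not an efficient way to compute it.
    """
    right_values = list(right)  # materialise once so it can be re-scanned
    return [(l, r) for l in left for r in right_values if condition(l, r)]


# Inequality join: keep pairs where the left value is less than the right value.
pairs = naive_theta_join([1, 5, 9], [4, 6], operator.lt)
print(pairs)  # [(1, 4), (1, 6), (5, 6)]
```

Passing a different predicate (for example `operator.ge`) changes the condition without changing the join itself.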
Performance
- Large performance improvements for `theta_join()`: a 25x speedup on the benchmark, and it avoids an intermediate Cartesian join that can quickly consume all memory for larger inputs.
- Slight performance improvements for `fuzzy_join()`
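
One common way to evaluate a `<` condition without materialising the full Cartesian product is to sort one side once and binary-search it for each value on the other side, so only matching pairs are ever produced. The sketch below shows that general technique in Python. It is an assumption about how such a join can be done, not a description of this library's implementation, and `less_than_join` is a hypothetical name.

```python
from bisect import bisect_right
from typing import Iterator, Sequence, Tuple


def less_than_join(
    left: Sequence[float], right: Sequence[float]
) -> Iterator[Tuple[int, int]]:
    """Yield index pairs (i, j) such that left[i] < right[j].

    Hypothetical illustration: the right side is sorted once, then each
    left value is located with a binary search. No intermediate table of
    all len(left) * len(right) pairs is built; work beyond the sort and
    searches is proportional to the number of matches.
    """
    # Indices of `right`, ordered by the values they point to.
    order = sorted(range(len(right)), key=lambda j: right[j])
    sorted_right = [right[j] for j in order]

    for i, value in enumerate(left):
        # First sorted position whose value is strictly greater than `value`.
        start = bisect_right(sorted_right, value)
        for k in range(start, len(order)):
            yield i, order[k]


matches = list(less_than_join([1, 5, 9], [6, 4]))
print(matches)  # [(0, 1), (0, 0), (1, 0)]
```

The result can still be large when most pairs satisfy the condition, since every match has to be emitted; what this approach avoids is building and then filtering the pairs that do not match.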
Documentation
- Clarified time complexity and worst case for Fuzzy join