Commit
Optimize LIKE with custom escape char (#7730)
Summary: Currently we optimize the LIKE operation only if no escape char is specified. This PR applies the optimization even when the user specifies an escape char. We introduce a PatternStringIterator which handles escaping transparently, so the existing optimizations (kPrefix, kSuffix, kSubstring, etc.) now work for patterns with an escape char, and future optimizations will apply to escaped patterns as well.

The benchmark result before this optimization:

```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
like_generic##like_generic                                   4.14s   241.44m
----------------------------------------------------------------------------
----------------------------------------------------------------------------
like_prefix##like_prefix                                     1.20s   833.70m
like_prefix##starts_with                                    2.92ms    342.44
like_substring##like_substring                               4.22s   236.77m
like_substring##strpos                                      6.98ms    143.27
like_suffix##like_suffix                                     3.09s   323.90m
like_suffix##ends_with                                      3.02ms    331.11
```

After:

```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
like_generic##like_generic                                   3.86s   258.97m
----------------------------------------------------------------------------
----------------------------------------------------------------------------
like_prefix##like_prefix                                    4.18ms    239.24
like_prefix##starts_with                                    2.76ms    362.05
like_substring##like_substring                              7.71ms    129.75
like_substring##strpos                                      6.67ms    149.90
like_suffix##like_suffix                                    4.20ms    237.85
like_suffix##ends_with                                      2.90ms    344.93
```

In summary:
- Speedup of kSubstring is about 500x.
- Speedup of kPrefix is about 250x.
- Speedup of kSuffix is about 700x.

Why is the speedup so large?
There are two reasons:
- RE2 is really slow compared to the optimizations we made. Even if the input string is short (10 bytes), RE2 is 100x slower than our optimizations.
- When the input strings get longer (10 bytes -> 1000 bytes), the performance of our optimizations does not change much, but RE2 becomes 10x slower.

We can confirm the speedup is reasonable by comparing our optimizations with the simple scalar functions strpos, starts_with, and ends_with: the performance numbers are quite close (see the ##strpos, ##starts_with, and ##ends_with entries in the benchmark result for more details).

Pull Request resolved: #7730
Reviewed By: pedroerp
Differential Revision: D52077250
Pulled By: mbasmanova
fbshipit-source-id: 39703ddcc7f4f2044460d93866670f730e139120
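For illustration, the escape-transparent iteration described in the summary could look roughly like the sketch below. It is not the code from this PR: the names `EscapeAwareIterator`, `classify`, `matchFast`, and `PatternInfo` are hypothetical, and the sketch assumes that any character following the escape char is a literal and that a dangling escape char falls back to the generic (regex) path. The point is only that once the iterator resolves escaping, the pattern classifier never needs to know an escape char exists.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical names throughout; a sketch, not the implementation in this PR.
enum class PatternKind { kExact, kPrefix, kSuffix, kSubstring, kGeneric };

struct PatternInfo {
  PatternKind kind;
  std::string literal; // Unescaped fixed part of the pattern.
};

// Walks a LIKE pattern one logical character at a time, resolving escapes.
class EscapeAwareIterator {
 public:
  enum class CharKind { kLiteral, kAnyChars, kSingleChar };

  EscapeAwareIterator(std::string_view pattern, std::optional<char> escape)
      : pattern_(pattern), escape_(escape) {}

  // Returns false at the end of the pattern or on a dangling escape char.
  bool next() {
    if (pos_ >= pattern_.size()) {
      return false;
    }
    char c = pattern_[pos_++];
    if (escape_.has_value() && c == *escape_) {
      if (pos_ >= pattern_.size()) {
        ok_ = false; // Assumption: a dangling escape is treated as malformed.
        return false;
      }
      current_ = pattern_[pos_++]; // Escaped char is always a literal here.
      kind_ = CharKind::kLiteral;
      return true;
    }
    current_ = c;
    kind_ = (c == '%') ? CharKind::kAnyChars
        : (c == '_')   ? CharKind::kSingleChar
                       : CharKind::kLiteral;
    return true;
  }

  CharKind kind() const { return kind_; }
  char current() const { return current_; }
  bool ok() const { return ok_; }

 private:
  std::string_view pattern_;
  std::optional<char> escape_;
  size_t pos_ = 0;
  char current_ = '\0';
  CharKind kind_ = CharKind::kLiteral;
  bool ok_ = true;
};

// Classifies a pattern without ever looking at the escape char directly.
PatternInfo classify(std::string_view pattern, std::optional<char> escape) {
  using CharKind = EscapeAwareIterator::CharKind;
  EscapeAwareIterator it(pattern, escape);
  std::string literal;
  bool leading = false, trailing = false, generic = false;
  while (it.next()) {
    if (it.kind() == CharKind::kLiteral) {
      generic = generic || trailing; // Literal after a wildcard, e.g. a%b.
      literal.push_back(it.current());
    } else if (it.kind() == CharKind::kAnyChars) {
      (literal.empty() ? leading : trailing) = true;
    } else {
      generic = true; // '_' is left to the generic path in this sketch.
    }
  }
  if (!it.ok() || generic) {
    return {PatternKind::kGeneric, {}};
  }
  PatternKind kind = leading && trailing ? PatternKind::kSubstring
      : leading                          ? PatternKind::kSuffix
      : trailing                         ? PatternKind::kPrefix
                                         : PatternKind::kExact;
  return {kind, literal};
}

// Fast-path match; std::nullopt means "fall back to the regex engine".
std::optional<bool> matchFast(std::string_view input, const PatternInfo& p) {
  std::string_view lit = p.literal;
  switch (p.kind) {
    case PatternKind::kExact:
      return input == lit;
    case PatternKind::kPrefix:
      return input.size() >= lit.size() &&
          input.compare(0, lit.size(), lit) == 0;
    case PatternKind::kSuffix:
      return input.size() >= lit.size() &&
          input.compare(input.size() - lit.size(), lit.size(), lit) == 0;
    case PatternKind::kSubstring:
      return input.find(lit) != std::string_view::npos;
    case PatternKind::kGeneric:
      return std::nullopt;
  }
  return std::nullopt;
}
```

Under these assumptions, a pattern such as `50\%%` with escape char `\` is classified as kPrefix with the literal `50%`, so matching reduces to a plain prefix comparison, which is consistent with the like_prefix numbers landing close to starts_with in the benchmark.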