Parallel distinct hash aggregate #4881
Benchmark Result
Master commit hash:
Benchmarks (adapted from ClickBench, on 128 threads 2xEPYC 7551):
*This last query is substantially different from the SQL version since we don't have an exact equivalent of the |
Looks great to me! Thanks!
I have one question and several very minor comments for you to take a look at.
src/include/processor/operator/aggregate/aggregate_hash_table.h
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4881      +/-   ##
==========================================
- Coverage   86.53%   86.53%   -0.01%
==========================================
  Files        1403     1403
  Lines       60536    60665     +129
  Branches     7442     7460      +18
==========================================
+ Hits        52385    52494     +109
- Misses       7982     8002      +20
  Partials      169      169
Extends the partitioning done in the hash aggregate operator to apply to distinct hash tables as well as the main hash table. The aggregation is then disabled in the thread-local hash tables for distinct keys, and computed from scratch when combining the data into the global tables.
This still doesn't parallelize the simple distinct aggregate, which I'll do in a later PR.
This also fixes support for nested types in the hash aggregate generally (I realised that I had missed implementing the row data versions of the comparison functions for structs and lists).
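To illustrate the partition-then-combine scheme described above, here is a minimal sketch, not the code from this PR. It assumes a COUNT(DISTINCT value) aggregate, integer keys and values, and partitioning by the hash of the group key so that each group lives entirely in one partition. All names (`LocalDistinctTable`, `combinePartition`, `countDistinct`, `NUM_PARTITIONS`) are hypothetical, and `std::set`/`std::map` stand in for the real hash tables.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is an illustration, not the PR's code.
constexpr std::size_t NUM_PARTITIONS = 4;

using GroupKey = std::int64_t;
using DistinctValue = std::int64_t;
using DistinctEntry = std::pair<GroupKey, DistinctValue>;

// Assumption: partition on the group key only, so every (group, value)
// pair for a given group ends up in the same partition on every thread.
inline std::size_t partitionOf(GroupKey key) {
    return std::hash<GroupKey>{}(key) % NUM_PARTITIONS;
}

// Thread-local table: it only deduplicates (group, value) pairs.
// No aggregate state is maintained here, mirroring "aggregation is
// disabled in the thread-local hash tables for distinct keys".
struct LocalDistinctTable {
    std::vector<std::set<DistinctEntry>> partitions;

    LocalDistinctTable() : partitions(NUM_PARTITIONS) {}

    void insert(GroupKey key, DistinctValue value) {
        partitions[partitionOf(key)].insert({key, value});
    }
};

// Merge one partition from every thread-local table into a global set,
// then compute COUNT(DISTINCT value) per group from scratch.
std::map<GroupKey, std::uint64_t> combinePartition(
    const std::vector<LocalDistinctTable>& locals, std::size_t partitionIdx) {
    std::set<DistinctEntry> global;
    for (const auto& local : locals) {
        const auto& part = local.partitions[partitionIdx];
        global.insert(part.begin(), part.end());
    }
    std::map<GroupKey, std::uint64_t> counts;
    for (const auto& entry : global) {
        ++counts[entry.first];
    }
    return counts;
}

// Sequential driver. In a parallel operator each partition could be
// combined by a different thread, since partitions share no groups.
std::map<GroupKey, std::uint64_t> countDistinct(
    const std::vector<LocalDistinctTable>& locals) {
    std::map<GroupKey, std::uint64_t> result;
    for (std::size_t p = 0; p < NUM_PARTITIONS; ++p) {
        auto counts = combinePartition(locals, p);
        result.insert(counts.begin(), counts.end());
    }
    return result;
}
```

The aggregate is deferred to the combine step because a value seen by two different threads is a duplicate globally but looks new to each thread locally; counting in the local tables would therefore overcount.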