Introduces the new partitioner to implement the reduction StreamK kernel. #3107
Pull Request Overview
This PR implements the classic StreamK reduction strategy for GEMM operations, extending the existing atomic reduction approach. The implementation follows the StreamK algorithm from the referenced paper (https://arxiv.org/abs/2301.03598).
Key changes:
- Added support for the `Reduction` strategy alongside the existing `Atomic` strategy
- Implemented partial result accumulation across workgroups with synchronization mechanisms
- Modified kernel APIs to return intermediate results for flexible epilogue execution
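To make the partial-accumulation idea concrete, here is a minimal host-side sketch in plain C++. It is not the kernel code from this PR: `IterRange`, `streamk_range`, `streamk_reduce`, and the one-partial-slot-per-workgroup workspace are all hypothetical names and assumptions chosen for illustration, and the "workgroups" run sequentially in a loop rather than concurrently on a GPU. It only shows the general StreamK shape: the flattened iteration space is split evenly across workgroups, a workgroup that starts in the middle of a tile writes a partial result and sets a flag, and the partials are then added to the tile that the owning workgroup started.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical: one workgroup's contiguous slice [begin, end) of the
// flattened iteration space [0, num_tiles * iters_per_tile).
struct IterRange {
    int begin;
    int end;
};

// Split total_iters as evenly as possible; the first (total_iters % num_wgs)
// workgroups receive one extra iteration.
IterRange streamk_range(int wg, int num_wgs, int total_iters) {
    assert(num_wgs > 0 && wg >= 0 && wg < num_wgs);
    const int base  = total_iters / num_wgs;
    const int rem   = total_iters % num_wgs;
    const int begin = wg * base + std::min(wg, rem);
    const int end   = begin + base + (wg < rem ? 1 : 0);
    return {begin, end};
}

// Sequential simulation. contrib[i] stands in for the result of one
// K-iteration; iteration i belongs to tile i / iters_per_tile.
std::vector<double> streamk_reduce(const std::vector<double>& contrib,
                                   int num_tiles, int iters_per_tile,
                                   int num_wgs) {
    const int total = num_tiles * iters_per_tile;
    assert(static_cast<int>(contrib.size()) == total);

    std::vector<double> out(num_tiles, 0.0);
    // Assumed workspace layout: one partial slot and one flag per workgroup.
    // Only a workgroup's first segment can start mid-tile, so one slot suffices.
    std::vector<double> partial(num_wgs, 0.0);
    std::vector<int> flag(num_wgs, 0);

    // Pass 1: each workgroup walks its range one tile segment at a time.
    for (int wg = 0; wg < num_wgs; ++wg) {
        const IterRange r = streamk_range(wg, num_wgs, total);
        int it = r.begin;
        while (it < r.end) {
            const int tile       = it / iters_per_tile;
            const int tile_begin = tile * iters_per_tile;
            const int seg_end    = std::min(r.end, tile_begin + iters_per_tile);
            double acc = 0.0;
            for (int i = it; i < seg_end; ++i) acc += contrib[i];
            if (it == tile_begin) {
                out[tile] = acc;    // this workgroup started the tile: it owns it
            } else {
                partial[wg] = acc;  // started mid-tile: publish a partial
                flag[wg]    = 1;    // and signal that it is ready
            }
            it = seg_end;
        }
    }

    // Pass 2: fold every published partial into its tile. In a real kernel the
    // owner would wait on each flag before reading; here pass 1 has already
    // finished, so every flag that will be set is set.
    for (int wg = 0; wg < num_wgs; ++wg) {
        if (!flag[wg]) continue;
        const IterRange r = streamk_range(wg, num_wgs, total);
        out[r.begin / iters_per_tile] += partial[wg];
    }
    return out;
}
```

For example, with 3 tiles of 4 iterations each and 5 workgroups, the ranges are [0,3), [3,6), [6,8), [8,10), and [10,12), so workgroups 1, 2, and 4 each publish one partial and workgroups 0, 1, and 3 own tiles 0, 1, and 2. Unlike the `Atomic` strategy, no output element is updated by concurrent atomic adds; the cost is the extra workspace and the flag handshake.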
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| streamk_gemm_tile_partitioner.hpp | Changed buffer size calculation methods from host-only to host-device accessible |
| streamk_gemm_tile_partitioner_impl.hpp | Updated function qualifiers to enable device-side workspace size calculations |
| streamk_gemm_kernel.hpp | Core reduction logic implementation with partial storage, accumulation, and synchronization |
| kernel_launch.hpp | Added preprocessing support in cold iteration loops |
| streamk_gemm_basic.cpp | Updated to use new partitioner and handle workspace initialization for reduction strategy |
| run_gemm_example.inc | Removed unused num_sk_blocks parameter and updated args structure |
| gemm_utils.hpp | Increased tile sizes for improved performance characteristics |
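The first two rows of the table describe making the buffer size calculations callable from both host and device. As a rough illustration of what that kind of change looks like, the sketch below defines a size helper that compiles as plain C++ and picks up `__host__ __device__` qualifiers under a HIP or CUDA compiler. The macro `SK_HOST_DEVICE`, the struct `WorkspaceLayout`, the function `workspace_layout`, and the assumed layout (one accumulator tile plus one 32-bit flag per workgroup) are all hypothetical and are not taken from the partitioner in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical qualifier macro: expands to nothing in a plain host build.
#if defined(__HIPCC__) || defined(__CUDACC__)
#define SK_HOST_DEVICE __host__ __device__
#else
#define SK_HOST_DEVICE
#endif

// Hypothetical workspace description: a region for partial accumulator tiles
// followed by a region for per-workgroup ready flags.
struct WorkspaceLayout {
    std::size_t partials_bytes;
    std::size_t flags_bytes;

    SK_HOST_DEVICE constexpr std::size_t total_bytes() const {
        return partials_bytes + flags_bytes;
    }
};

// Assumed sizing rule: one tile_m x tile_n accumulator tile and one 32-bit
// flag per workgroup. A real partitioner may size these differently.
SK_HOST_DEVICE constexpr WorkspaceLayout workspace_layout(std::size_t num_wgs,
                                                          std::size_t tile_m,
                                                          std::size_t tile_n,
                                                          std::size_t acc_bytes) {
    return WorkspaceLayout{num_wgs * tile_m * tile_n * acc_bytes,
                           num_wgs * sizeof(std::uint32_t)};
}
```

With such a helper, the host can size and zero the workspace before launch, and device code can use the same function to find where the flag region begins, so the two sides cannot drift apart.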
LGTM! Great job!
LGTM
Proposed changes
Implemented the classic StreamK reduction strategy.
StreamK paper: https://arxiv.org/abs/2301.03598
Checklist
Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.
- `clang-format` on all changed files
Discussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered