forked from triton-lang/triton
LHS Registers Part 2 - Pipelining #19
Closed
This was referenced Sep 23, 2024
ggengnv force-pushed the lhs-reg-pipeline branch from 3adec01 to 995c0b8 (September 23, 2024 23:36)
The one comment from the original PR that is relevant to part 2 has been addressed in this PR instead.
ggengnv force-pushed the lhs-reg-pipeline branch 2 times, most recently from b4d61b8 to deefac7 (September 24, 2024 22:52)
Moerafaat force-pushed the llvm-head branch from 3596dc5 to 10d3305 (September 25, 2024 14:11)
ggengnv force-pushed the lhs-reg-pipeline branch from deefac7 to 1cb92c3 (September 25, 2024 22:13)
ggengnv force-pushed the lhs-reg-pipeline branch from 1cb92c3 to 56eefde (October 9, 2024 22:19)
Moerafaat force-pushed the llvm-head branch 2 times, most recently from da8895b to c8f89a6 (October 30, 2024 09:27)
ThomasRaoux pushed a commit to triton-lang/triton that referenced this pull request (Nov 15, 2024)

…for SMEM-to-MMAv3 DotOp Copy (#5003)

Hopper has two kinds of WGMMAs, "SS" (both operands in shmem) and "RS" (LHS operand A in registers). In cases where we apply elementwise operations on A before WGMMA, Triton previously copied A from global memory (GMEM) into registers (RF), performed the elementwise ops, and then copied the result to shared memory (SMEM) to perform SS WGMMA. This PR adds an optimization for the case above to use RS GEMM. This requires the following changes:

- In the TritonGPU OptimizeDotOperands pass, add optimizations to change SS GEMM into RS GEMM.
- Add TritonGPU -> LLVM lowering for copying from SMEM to RF in MMA v3 dotOperand layout.

NOTE: This may not see perf gain, and may even see perf loss, for certain shapes (e.g. small-K), and additional optimizations are in a separate [PR](openxla#19) (still more optimizations are WIP). Please advise on the merging strategy.
hmalgewatta pushed a commit to hmalgewatta/triton that referenced this pull request (Nov 15, 2024)

…for SMEM-to-MMAv3 DotOp Copy (triton-lang#5003), with the same commit message as the triton-lang/triton commit above.
Merged into upstream Triton.
(Part 1: #18)
Part 2 of the "WGMMA with LHS operand in registers" feature.
Commits from 0f4faac ("Initial changes for pipelining") onward are relevant to this part; the previous commits were for part 1.

This PR enables SMEM pipelining for WGMMA operand A when it is in RF. This change is needed on top of those in part 1 to actually achieve better performance.
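
To make the idea of SMEM pipelining for a register-resident A operand concrete, here is a minimal pure-Python sketch. It is not Triton code and does not reflect the actual compiler pass in this PR: the function name `pipelined_dot`, the `num_stages` parameter, and the event log are all hypothetical, chosen only to illustrate the schedule. The sketch assumes that tiles of A and B are prefetched into a small queue of SMEM stages, and that each A tile is then copied from SMEM to registers, transformed elementwise, and fed to the MMA while later loads are already in flight.

```python
from collections import deque


def matmul(a, b):
    """Plain nested-list matrix multiply (stands in for one WGMMA)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(c, d):
    """Elementwise sum of two equally shaped nested-list matrices."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(c, d)]


def pipelined_dot(a_tiles, b_tiles, elementwise, num_stages=2):
    """Toy model of a software-pipelined K loop (hypothetical helper).

    a_tiles[k] and b_tiles[k] are the k-th K-slices of A and B.
    Returns (sum over k of elementwise(A_k) @ B_k, event log).
    """
    assert len(a_tiles) == len(b_tiles) and num_stages >= 1
    n = len(a_tiles)
    smem = deque()  # tiles modelled as resident in SMEM stages
    log = []
    next_load = 0

    def issue_load():
        # Models a GMEM -> SMEM copy of the next K-slice, if any remain.
        nonlocal next_load
        if next_load < n:
            smem.append((next_load, a_tiles[next_load], b_tiles[next_load]))
            log.append(f"load {next_load}")
            next_load += 1

    # Prologue: prefetch num_stages - 1 slices before any compute starts.
    for _ in range(num_stages - 1):
        issue_load()

    acc = None
    for _ in range(n):
        issue_load()  # keep the queue num_stages deep while computing
        idx, a_smem, b_smem = smem.popleft()
        # Models the SMEM -> RF copy of A followed by the elementwise op.
        a_reg = [[elementwise(x) for x in row] for row in a_smem]
        part = matmul(a_reg, b_smem)  # "RS" step: A from registers, B from SMEM
        acc = part if acc is None else add(acc, part)
        log.append(f"mma {idx}")
    return acc, log


if __name__ == "__main__":
    a = [[[1, 2]], [[3, 4]]]      # two 1x2 K-slices of A
    b = [[[1], [1]], [[2], [2]]]  # two 2x1 K-slices of B
    result, events = pipelined_dot(a, b, lambda x: 2 * x, num_stages=2)
    print(result)
    print(events)
```

With `num_stages=1` the model degenerates to a load-then-compute loop; with more stages, loads for later slices are issued before the MMA on the current slice, which is the overlap that pipelining is meant to provide. The numerical result is the same either way.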