[TKW] Add CDNA2 + CDNA3 Int8 intrinsics and refactor intrinsic enums #279
996c40f
to
1a1c8fc
Compare
Couple more comments, but otherwise LGTM
This looks great and I am very surprised that you were able to reuse all the other intrinsics. One final ask - can you add a lit test showing the IR form of the mfmas?
3660dd8
to
fa61434
Compare
.github/workflows/perf.yaml
Outdated
@@ -61,3 +61,10 @@ jobs:
export WAVE_RUN_E2E_TESTS=1
export TEST_PARAMS_PATH="tests/kernel/wave/test_param.json"
pytest -n 1 --capture=tee-sys -vv ./tests/kernel/wave/

- name: Run e2e tests on MI250
You can remove this step completely and just keep the one before it. Just change its name from "Run e2e tests on MI300" to "Run e2e tests on AMD GPU" or something, and remove the if: "contains(matrix.os, 'mi300') && !cancelled()" line.
Can you explain further? If I understand correctly, you want to have only one test? We need to run on both MI250 and MI300, though.
actually I see what you mean! that's a great idea, thanks!
Done! Thank you again for the tip! :)
For correctness check probably should add to |
Makes sense! thanks! |
a92a16a
to
9b7d04d
Compare
9b7d04d
to
5a5c76f
Compare
tests/kernel/wave/wave_gemm_test.py
Outdated
@pytest.mark.parametrize(
    "mfma_variant",
    [
        MMAType.F32_16x16x32_F8,
Shouldn't this be I8?
Ahh, good catch! It was passing because, at the MLIR vector level, the instructions generated for the two variants are technically the same, haha. Will fix that. :)
done!
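Since an F8 variant in an int8 test can pass silently, a cheap guard on the parametrize list could catch the mix-up before any kernel is compiled. The following is only a sketch: the MMAType class here is a hypothetical stand-in for the project's enum, the I8 member names are assumptions for illustration, and non_int8_variants is not an existing helper.

```python
from enum import Enum


class MMAType(Enum):
    # Hypothetical stand-in for the project's enum. Only F32_16x16x32_F8
    # appears in the diff above; the I8 member names are assumed.
    F32_16x16x32_F8 = "f32_16x16x32_f8"
    I32_16x16x32_I8 = "i32_16x16x32_i8"
    I32_32x32x16_I8 = "i32_32x32x16_i8"


def non_int8_variants(variants):
    """Return the variants whose name lacks the _I8 operand suffix."""
    return [v for v in variants if not v.name.endswith("_I8")]


# A list containing the F8 variant is flagged...
assert non_int8_variants([MMAType.F32_16x16x32_F8]) == [MMAType.F32_16x16x32_F8]
# ...while an all-int8 list comes back clean.
assert non_int8_variants([MMAType.I32_16x16x32_I8, MMAType.I32_32x32x16_I8]) == []
```

A check like this relies purely on the naming convention of the enum members, so it is only as reliable as that convention.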
tests/kernel/wave/wave_gemm_test.py
Outdated
    dynamic_symbols_map=dynamic_symbols_map,
):
    randint_hi = 4
    a = device_randint(randint_hi, (shape[0], shape[2]), dtype=torch.int16)
Would it be possible to pass here directly a torch.int8?
yeap, done!
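For context on why small int8 inputs are safe here, a pure-Python sketch of the arithmetic may help. It is not the test's code: randint_matrix stands in for device_randint without torch, int_gemm_ref is a hypothetical reference, and the N x K layout of b is an assumption (only the M x K shape of a is visible in the diff).

```python
import random

INT8_MAX = 127
INT32_MAX = 2**31 - 1


def randint_matrix(hi, rows, cols, rng):
    """Matrix of integers drawn uniformly from [0, hi); hi must fit in int8."""
    assert 0 < hi <= INT8_MAX + 1
    return [[rng.randrange(hi) for _ in range(cols)] for _ in range(rows)]


def int_gemm_ref(a, b):
    """Reference c = a @ b^T, with a as M x K and b as N x K (assumed layout)."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def max_accumulator(hi, k):
    """Largest possible dot product for inputs in [0, hi) and reduction size k."""
    return (hi - 1) ** 2 * k


rng = random.Random(0)
m, n, k = 4, 3, 16
a = randint_matrix(4, m, k, rng)  # randint_hi = 4, as in the test
b = randint_matrix(4, n, k, rng)
c = int_gemm_ref(a, b)

# Every output is bounded by 9 * k, far below the int32 limit.
assert all(0 <= v <= max_accumulator(4, k) for row in c for v in row)
assert max_accumulator(4, k) < INT32_MAX
```

With randint_hi = 4 every product is at most 9, so the accumulator stays tiny even for large reduction dimensions, and generating the inputs directly as int8 loses nothing.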
- Added CDNA2 int8 intrinsic layouts
- Modified iree_ref to handle int gemms
- Modified certain e2e test to require certain GPU arch to be available
- Modified enum for easy handling in the future
- Get default architecture function
- Borrowed device_randint from Ivan

Signed-off-by: Stanley Winata <[email protected]>
Co-authored-by: Ivan Butygin <[email protected]>
Signed-off-by: Stanley Winata <[email protected]>
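The arch gating mentioned in the commit message could be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: arch_family and should_run are hypothetical names, and the table covers only the two targets relevant here (gfx90a for MI250, gfx942 for MI300).

```python
def arch_family(target):
    """Map a gfx target name to its CDNA generation (two targets only)."""
    families = {
        "gfx90a": "CDNA2",  # MI250
        "gfx942": "CDNA3",  # MI300
    }
    return families.get(target)


def should_run(required_families, target):
    """True if a test restricted to required_families may run on target."""
    return arch_family(target) in required_families


# A test limited to CDNA3 is skipped on MI250 but runs on MI300.
assert not should_run({"CDNA3"}, "gfx90a")
assert should_run({"CDNA3"}, "gfx942")
# A test valid on both generations runs on either machine.
assert should_run({"CDNA2", "CDNA3"}, "gfx90a")
```

In a pytest suite, a predicate like this would typically feed pytest.mark.skipif, with the target string coming from whatever default-architecture query the project provides.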
5a5c76f
to
3a3f9f3
Compare
Signed-off-by: Stanley Winata <[email protected]>
thanks! this looks great!
@@ -21,7 +21,7 @@ jobs:
 fail-fast: false
 matrix:
 version: [3.11]
-os: [ubuntu-latest, nodai-amdgpu-mi300-x86-64]
+os: [ubuntu-latest, nodai-amdgpu-mi300-x86-64, nodai-amdgpu-mi250-x86-64]
very nice :)
Manually tested that the generated iree_ref for int gemms works as expected!