[TKW] Add CDNA2 + CDNA3 Int8 intrinsics and refactor intrinsic enums #279

Merged
merged 11 commits into from
Nov 20, 2024

Conversation

raikonenfnu
Contributor

@raikonenfnu raikonenfnu commented Nov 19, 2024

  • Added CDNA2 int8 intrinsic layouts
  • Modified iree_ref to handle int gemms
  • Modified certain e2e tests to require a specific GPU arch to be available
  • Modified enum for easy handling in the future
  • Get default architecture function
  • Borrowed device_randint from Ivan
  • Turn on CDNA2 runner for TK-CI

Manually tested that the generated iree_ref for int gemms is working as expected!
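
As a rough illustration of the arch-gating idea in the list above, here is a minimal Python sketch. The helper names (`normalize_arch`, `arch_family`, `should_run`) and the lookup table are hypothetical and are not taken from this PR. The only facts assumed are that MI250 reports a `gfx90a` target (CDNA2) and MI300 reports `gfx942` (CDNA3).

```python
from typing import Optional

# Hypothetical sketch: the names below are illustrative, not the PR's API.

# Mapping from AMDGPU target name to architecture family.
ARCH_FAMILY = {
    "gfx90a": "CDNA2",  # e.g. MI250
    "gfx942": "CDNA3",  # e.g. MI300
}


def normalize_arch(raw_name: str) -> str:
    """Strip feature suffixes such as ':sramecc+:xnack-' from a target name."""
    return raw_name.split(":", 1)[0].strip().lower()


def arch_family(raw_name: str) -> Optional[str]:
    """Return 'CDNA2' or 'CDNA3' for known targets, else None."""
    return ARCH_FAMILY.get(normalize_arch(raw_name))


def should_run(required_families: set, raw_name: str) -> bool:
    """Decide whether a test needing one of `required_families` may run."""
    return arch_family(raw_name) in required_families
```

In a pytest suite, the boolean from a helper like `should_run` could feed a `pytest.mark.skipif` condition so that int8 tests are skipped on machines without a matching GPU.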

@raikonenfnu raikonenfnu changed the title [TKW] Add CDNA2 Int8 intrinsics and refactor intrinsic enums [TKW] Add CDNA2 + CDNA3 Int8 intrinsics and refactor intrinsic enums Nov 19, 2024
Contributor

@Hardcode84 Hardcode84 left a comment

Couple more comments, but otherwise LGTM

iree/turbine/kernel/wave/utils.py (resolved)
tests/kernel/wave/wave_gemm_test.py (outdated, resolved)
Contributor

@harsh-nod harsh-nod left a comment

This looks great and I am very surprised that you were able to reuse all the other intrinsics. One final ask - can you add a lit test showing the IR form of the mfmas?

iree/turbine/kernel/wave/constraints.py (resolved)
iree/turbine/kernel/wave/constraints.py (resolved)
tests/kernel/wave/wave_gemm_test.py (resolved)
@@ -61,3 +61,10 @@ jobs:
export WAVE_RUN_E2E_TESTS=1
export TEST_PARAMS_PATH="tests/kernel/wave/test_param.json"
pytest -n 1 --capture=tee-sys -vv ./tests/kernel/wave/

- name: Run e2e tests on MI250
Contributor

You can remove this step completely and just keep the one before it. Just change its name from Run e2e tests on MI300 to Run e2e tests on AMD GPU or something, and remove the if: "contains(matrix.os, 'mi300') && !cancelled()" line.

Contributor Author

Can you explain further? If I understand correctly, you want to have only one test? But we need to run on both MI250 and MI300.

Contributor Author

@raikonenfnu raikonenfnu Nov 20, 2024

actually I see what you mean! that's a great idea, thanks!

Contributor Author

Done! Thank you again for the tip! :)

@Hardcode84
Contributor

Hardcode84 commented Nov 20, 2024

For correctness check probably should add to ci-tk.yaml instead. perf.yaml was meant for performance measurement, but this work was never finished.

@raikonenfnu
Contributor Author

> For correctness check probably should add to ci-tk.yaml instead. perf.yaml was meant for performance measurement, but this work was never finished.

Makes sense! thanks!

@pytest.mark.parametrize(
"mfma_variant",
[
MMAType.F32_16x16x32_F8,

Shouldn't this be I8?

Contributor Author

ahh, good catch! It was passing because the generated instructions for the two are technically the same at the MLIR vector level, haha. Will fix that. :)

Contributor Author

done!
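
The mix-up in this thread is easy to make because the two variant names would differ only in their type tokens. Below is a small sketch, assuming variant names follow the `<acc>_<M>x<N>x<K>_<input>` pattern seen in `MMAType.F32_16x16x32_F8`. The `I32_16x16x32_I8` name used in the usage note and the `parse_variant` helper are assumptions for illustration and are not confirmed by this conversation.

```python
import re
from typing import NamedTuple


class VariantInfo(NamedTuple):
    acc_type: str    # accumulator element type, e.g. "F32" or "I32"
    m: int
    n: int
    k: int
    input_type: str  # operand element type, e.g. "F8" or "I8"


# Matches names shaped like F32_16x16x32_F8 (assumed naming scheme).
_PATTERN = re.compile(r"^([A-Z]+\d+)_(\d+)x(\d+)x(\d+)_([A-Z]+\d+)$")


def parse_variant(name: str) -> VariantInfo:
    """Split a variant name into accumulator type, tile shape and input type."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized variant name: {name!r}")
    acc, m, n, k, inp = match.groups()
    return VariantInfo(acc, int(m), int(n), int(k), inp)


def same_tile_shape(a: str, b: str) -> bool:
    """True when two variants share the same M x N x K tile."""
    ia, ib = parse_variant(a), parse_variant(b)
    return (ia.m, ia.n, ia.k) == (ib.m, ib.n, ib.k)


def is_integer_variant(name: str) -> bool:
    """True when the operand element type is an integer type."""
    return parse_variant(name).input_type.startswith("I")
```

With these helpers, `same_tile_shape("F32_16x16x32_F8", "I32_16x16x32_I8")` holds while `is_integer_variant` tells the two apart, so a test parametrization could assert the latter to catch a float variant slipping into an integer GEMM test.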

dynamic_symbols_map=dynamic_symbols_map,
):
randint_hi = 4
a = device_randint(randint_hi, (shape[0], shape[2]), dtype=torch.int16)

Would it be possible to pass torch.int8 here directly?

Contributor Author

@raikonenfnu raikonenfnu Nov 20, 2024

yeap, done!
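
To illustrate why small random integers are convenient inputs for an int8 GEMM test: with values drawn from `[0, randint_hi)` and `randint_hi = 4`, every product is at most 9, so a K-length dot product is bounded by `9 * K`, well inside the int32 range. The pure-Python sketch below uses hypothetical stand-ins (`randint_matrix`, `int_gemm_ref`, `max_accumulator`). They are not the `device_randint` or `iree_ref` helpers from the PR, and storing the B operand as N x K is an assumption.

```python
import random

INT8_MAX = 127
INT32_MAX = 2**31 - 1


def randint_matrix(high, rows, cols, rng):
    """Return a rows x cols matrix of ints in [0, high); values must fit int8."""
    if not 0 < high <= INT8_MAX + 1:
        raise ValueError("high must keep values inside the int8 range")
    return [[rng.randrange(high) for _ in range(cols)] for _ in range(rows)]


def int_gemm_ref(a, b):
    """Integer reference: C[i][j] = sum_k A[i][k] * B[j][k] (B assumed N x K)."""
    return [
        [sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
        for row_a in a
    ]


def max_accumulator(high, k):
    """Largest possible dot-product value for inputs in [0, high)."""
    return (high - 1) ** 2 * k


rng = random.Random(0)          # fixed seed for reproducibility
a = randint_matrix(4, 3, 8, rng)  # M x K
b = randint_matrix(4, 5, 8, rng)  # N x K
c = int_gemm_ref(a, b)            # M x N, exact integer result
```

Because the reference is computed in exact integer arithmetic, a comparison against the kernel output can use strict equality rather than a floating-point tolerance.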

raikonenfnu and others added 10 commits November 20, 2024 08:44
- Added CDNA2 int8 intrinsic layouts
- Modified iree_ref to handle int gemms
- Modified certain e2e test to require certain GPU arch to be available
- Modified enum for easy handling in the future
- Get default architecture function
- Borrowed device_randint from Ivan

Signed-off-by: Stanley Winata <[email protected]>
Co-authored-by: Ivan Butygin <[email protected]>
Contributor

@harsh-nod harsh-nod left a comment

thanks! this looks great!

@@ -21,7 +21,7 @@ jobs:
fail-fast: false
matrix:
version: [3.11]
-os: [ubuntu-latest, nodai-amdgpu-mi300-x86-64]
+os: [ubuntu-latest, nodai-amdgpu-mi300-x86-64, nodai-amdgpu-mi250-x86-64]
Contributor

very nice :)

@raikonenfnu raikonenfnu merged commit f8e0cbb into iree-org:main Nov 20, 2024
8 checks passed
5 participants