[AMD] Improved error estimate for the matmul example #5571

Open: wants to merge 1 commit into base: main


ravil-mobile
Contributor

@ravil-mobile ravil-mobile commented Jan 10, 2025

This PR addresses the numerical accuracy issue in the matmul verification test in 03-matrix-multiplication.py. Instead of hard-coded atol and rtol values, the absolute tolerance is set to the forward error, computed as:

$$\mathrm{atol} = \left\| C_{\mathrm{Double}} - C_{\mathrm{Half}} \right\|_F$$

where $\| \cdot \|_F$ is the Frobenius norm, $C$ is the output matrix, $\mathrm{Double}$ denotes float64, and $\mathrm{Half}$ denotes float16.
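A minimal sketch of the forward error described above, written with NumPy so that it runs without a GPU. It is not the code from this PR: the helper name `forward_error_atol`, the 64x64 matrix size, and the use of NumPy's float16 matmul as a stand-in for the GPU kernel are all assumptions made for illustration.

```python
import numpy as np


def forward_error_atol(a_half, b_half):
    """Return (atol, c_double, c_half) for float16 inputs (hypothetical helper)."""
    # Quasi-accurate reference: promote the inputs to float64 before multiplying.
    c_double = a_half.astype(np.float64) @ b_half.astype(np.float64)
    # Low-precision result: a float16 product, standing in for the kernel output.
    c_half = (a_half @ b_half).astype(np.float64)
    # Frobenius norm of the difference between the two results.
    atol = np.linalg.norm(c_double - c_half, ord="fro")
    return atol, c_double, c_half


rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64)).astype(np.float16)
b = rng.standard_normal((64, 64)).astype(np.float16)

atol, c_double, c_half = forward_error_atol(a, b)
max_abs_diff = np.max(np.abs(c_double - c_half))
print(f"atol (Frobenius norm) = {atol:.4f}, max elementwise diff = {max_abs_diff:.4f}")
```

Because the Frobenius norm is at least as large as the largest absolute entry of the difference matrix, no single element of the float16 result can deviate from the float64 reference by more than `atol`.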

Regarding issue #5283: the accuracy test no longer fails when the reference solution is computed with a CPU GEMM implementation (e.g., numpy). Previously, this case failed.

The forward error stays the same on a given system because the same seed is set before the random matrices are generated (see here). In my case, it equals 2.4012, which may seem quite high. However, this value is an upper bound on the error relative to the quasi-accurate (double-precision) solution.
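The reproducibility argument can be illustrated with a small NumPy sketch. This is an assumption-laden stand-in rather than the tutorial's code: the helper `forward_error`, the seeds, and the matrix size are hypothetical, and `np.random.default_rng` is used here in place of whatever seeding call the tutorial makes.

```python
import numpy as np


def forward_error(seed, n=64):
    """Frobenius-norm forward error for matrices drawn with a fixed seed (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n)).astype(np.float16)
    b = rng.standard_normal((n, n)).astype(np.float16)
    # float64 reference versus a float16 product.
    c_double = a.astype(np.float64) @ b.astype(np.float64)
    c_half = (a @ b).astype(np.float64)
    # For a 2-D array, np.linalg.norm defaults to the Frobenius norm.
    return float(np.linalg.norm(c_double - c_half))


first = forward_error(seed=0)
second = forward_error(seed=0)
print(first == second)  # same seed on the same system gives the same inputs and error
```

With a fixed seed the inputs are identical from run to run on the same system, so the derived tolerance is deterministic there; a different seed or matrix size would generally give a different value.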

The new approach was tested on MI300, MI200 and H100 GPUs and passed (✅) in all cases.

Closes #5283

@ravil-mobile
Contributor Author

@antiagainst, @zhanglx13, @sjw36, please have a look at the PR.

@antiagainst antiagainst marked this pull request as ready for review January 13, 2025 19:25
@antiagainst antiagainst requested a review from ptillet as a code owner January 13, 2025 19:25
Successfully merging this pull request may close these issues.

Numerical accuracy test in 03-matrix-multiplication.py is failing; atol and rtol values