[AMD] Improved error estimate for the matmul example #5571

ravil-mobile · 2025-01-10T10:40:16Z

This PR addresses the numerical-accuracy issue in the matmul verification test in 03-matrix-multiplication.py. Instead of hard-coded atol and rtol values, the absolute tolerance is set to the forward error, computed as:

$\mathrm{atol} = \| C_{\mathrm{Double}} - C_{\mathrm{Half}} \|$

where $\| \cdot \|$ is the Frobenius norm, $C$ is the output matrix, $\mathrm{Double}$ denotes float64, and $\mathrm{Half}$ denotes float16.
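As a rough illustration of the formula, here is a minimal NumPy sketch of the forward-error computation. It is not the code from this PR: the helper names `forward_error_atol` and `make_inputs`, the 64x64 shapes, the seed value, and the choice to emulate the half-precision product with float32 accumulation followed by a cast to float16 are all assumptions made for the example.

```python
import numpy as np


def forward_error_atol(a_half: np.ndarray, b_half: np.ndarray) -> float:
    """Return ||C_Double - C_Half|| (Frobenius norm) for float16 inputs."""
    # Quasi-accurate reference: the same float16 inputs, multiplied in float64.
    c_double = a_half.astype(np.float64) @ b_half.astype(np.float64)
    # Half-precision result. Assumption: float32 accumulation followed by a
    # cast of the output to float16 (a stand-in for the kernel under test).
    c_half = (a_half.astype(np.float32) @ b_half.astype(np.float32)).astype(np.float16)
    # ord="fro" selects the Frobenius norm of the difference matrix.
    return float(np.linalg.norm(c_double - c_half.astype(np.float64), ord="fro"))


def make_inputs(seed: int, m: int = 64, k: int = 64, n: int = 64):
    """Generate reproducible float16 test matrices from a fixed seed (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((m, k)).astype(np.float16)
    b = rng.standard_normal((k, n)).astype(np.float16)
    return a, b


# Same seed, same inputs, hence the same forward error on a given system.
atol_first = forward_error_atol(*make_inputs(seed=0))
atol_second = forward_error_atol(*make_inputs(seed=0))

# Sanity check: a product that is exactly representable in float16 has zero error.
exact_atol = forward_error_atol(np.ones((4, 4), np.float16), np.ones((4, 4), np.float16))
```

The value depends on the matrix sizes, the seed, and the platform, so the numbers produced by this sketch are not expected to match the figure quoted in the description.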

Regarding issue #5283: the accuracy test no longer fails when the reference solution is computed using a CPU GEMM implementation (e.g., numpy). Previously, this case failed.

The forward error stays the same on a given system because the same seed is set before the random matrices are generated (see here). In my case it equals 2.4012, which may seem quite high. However, this value is an upper bound on the error relative to the quasi-accurate (double-precision) solution.
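One way to read the upper-bound statement, assuming the tolerance is applied to individual output elements (the description does not spell this out): the Frobenius norm dominates every single entry of the difference matrix, so no element of the half-precision result can deviate from the double-precision one by more than atol.

$$
\left| (C_{\mathrm{Double}})_{ij} - (C_{\mathrm{Half}})_{ij} \right| \le \sqrt{\sum_{k,l} \left| (C_{\mathrm{Double}})_{kl} - (C_{\mathrm{Half}})_{kl} \right|^2} = \| C_{\mathrm{Double}} - C_{\mathrm{Half}} \| = \mathrm{atol}
$$

Because the sum runs over all entries, the bound grows with the matrix size, which is consistent with a value that looks large next to a typical per-element tolerance.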

The new approach was tested on MI300, MI200 and H100 GPUs with ✅ in all cases.

Closes #5283

ravil-mobile · 2025-01-10T10:42:00Z

@antiagainst, @zhanglx13, @sjw36, please have a look at this PR.

ravil-mobile force-pushed the ravil/matmul-expl branch from c0b9cd7 to c10f4c8 · January 10, 2025 11:18

[AMD] Improved error estimate for the matmul example


a59dbfa

ravil-mobile force-pushed the ravil/matmul-expl branch from c10f4c8 to a59dbfa · January 10, 2025 11:28

antiagainst marked this pull request as ready for review January 13, 2025 19:25

antiagainst requested a review from ptillet as a code owner January 13, 2025 19:25

antiagainst requested a review from peterbell10 January 13, 2025 19:25
