fix for DEFAULT TUNING_TARGET on AMD and NVIDIA GPUs #517

s-Nick · 2024-05-08T07:45:09Z

This PR fixes most of the tests that fail on AMD and NVIDIA GPUs when using the DEFAULT configuration.
It fixes all of them for AMD and leaves only the trsm operator to be fixed for NVIDIA.

In particular it fixes:

iamax
iamin
trsv
tbsv
tpsv

iamax/iamin:
The sycl::shift_group_left API requires every work-item in the group (sub_group) to take part in the operation, so removing the if-condition around the call solves the problem.
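To illustrate why every lane has to take part, here is a minimal host-side model of a sub-group reduction built on a left shift. It is plain C++, not the kernel in IndexMaxMin.hpp: `shift_left_all_lanes`, `pick_max` and `subgroup_iamax` are hypothetical names, and padding idle lanes with a neutral candidate is an assumption of this sketch rather than something confirmed by the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// (|value|, index) pair held by one lane of the modelled sub-group.
using Candidate = std::pair<double, std::size_t>;

// Models the data movement of a left shift across a sub-group: lane i
// receives the value held by lane i + delta. Lanes whose source would fall
// outside the sub-group keep their own value here (a choice of this model;
// real code should not rely on what such lanes receive).
std::vector<Candidate> shift_left_all_lanes(const std::vector<Candidate>& lanes,
                                            std::size_t delta) {
  std::vector<Candidate> out(lanes);
  for (std::size_t i = 0; i + delta < lanes.size(); ++i) {
    out[i] = lanes[i + delta];
  }
  return out;
}

// Keeps the larger magnitude; ties go to the smaller index.
Candidate pick_max(const Candidate& a, const Candidate& b) {
  if (a.first > b.first) return a;
  if (b.first > a.first) return b;
  return a.second <= b.second ? a : b;
}

// Tree reduction in which ALL lanes execute every shift step. Lanes without
// data are not skipped with an if-condition; they hold a neutral candidate
// (magnitude -1) that can never win against real data.
std::size_t subgroup_iamax(const std::vector<double>& values,
                           std::size_t subgroup_size) {
  assert(subgroup_size > 0 && (subgroup_size & (subgroup_size - 1)) == 0);
  assert(values.size() <= subgroup_size);

  std::vector<Candidate> lanes(subgroup_size, Candidate{-1.0, subgroup_size});
  for (std::size_t i = 0; i < values.size(); ++i) {
    lanes[i] = Candidate{std::abs(values[i]), i};
  }

  for (std::size_t delta = subgroup_size / 2; delta > 0; delta /= 2) {
    const std::vector<Candidate> shifted = shift_left_all_lanes(lanes, delta);
    for (std::size_t i = 0; i < subgroup_size; ++i) {
      lanes[i] = pick_max(lanes[i], shifted[i]);
    }
  }
  return lanes[0].second;  // lane 0 ends up holding the overall winner
}
```

In this model, five values in a sub-group of eight lanes still run all three shift steps on all eight lanes. Guarding the shift with a condition so that only some lanes call it is the pattern the fix removes.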

txsv operators:
Broadcast operations inside the kernel require a specific group and sub-group size, so calling the kernel implementation from the default backend is not enough because of hardware differences. This solution uses runtime checks to select the correct template parameters. As a result more kernels are compiled than before, but in my tests this does not significantly affect compilation time.
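The runtime selection described above can be sketched as follows. This is plain C++ with hypothetical names (`launch_txsv_sketch`, `dispatch_txsv_sketch`). The template values and the vendor substrings are illustrative assumptions, not the ones used in default.hpp. In SYCL code the vendor string would typically be obtained with `device.get_info<sycl::info::device::vendor>()`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a txsv kernel launch. The real kernels take more
// template parameters; this one only reports the work-group size that the
// chosen instantiation would use.
template <std::size_t SubgroupSize, std::size_t SubgroupsPerGroup>
std::size_t launch_txsv_sketch() {
  return SubgroupSize * SubgroupsPerGroup;
}

// Runtime check on the device vendor that picks a compile-time configuration.
// Every branch instantiates its own template, so all variants are compiled
// even though only one of them runs on a given device.
std::size_t dispatch_txsv_sketch(const std::string& vendor) {
  if (vendor.find("NVIDIA") != std::string::npos) {
    return launch_txsv_sketch<32, 4>();  // assumed: warp of 32 work-items
  }
  if (vendor.find("AMD") != std::string::npos ||
      vendor.find("Advanced Micro Devices") != std::string::npos) {
    return launch_txsv_sketch<64, 4>();  // assumed: wavefront of 64 work-items
  }
  return launch_txsv_sketch<8, 4>();     // assumed generic default
}
```

The cost of this design is the one noted above: each vendor branch adds one more kernel instantiation to the build, in exchange for a single DEFAULT binary that picks a matching configuration at run time.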

This patch fixes a synchronization issue caused by the group_broadcast API inside this kernel. Calling that API without the whole warp taking part causes the program to hang. Changing the group size to match NVIDIA architectures and the tuned configuration fixes the issue for all the mentioned operators. Re-enable the tests. Signed-off-by: nscipione <[email protected]>

s-Nick force-pushed the txsv_iamax_iamin_default_fixes branch 2 times, most recently from ff25df6 to 6452589 on May 10, 2024 09:10

s-Nick marked this pull request as ready for review May 13, 2024 09:08

hjabird reviewed May 13, 2024

Rbiessy reviewed May 14, 2024

src/operations/blas1/IndexMaxMin.hpp (review thread resolved)

s-Nick and others added 10 commits May 15, 2024 10:04

Fix iamax/iamin operators for default configuration on NVIDIA GPUs

10ad1f6

The current implementation fails due to incorrect usage of the shift_group_left API. This patch fixes it and re-enables the tests.

Fix solvers implementation

c4165ad

Add a compile definition when using DEFAULT as TUNING_TARGET but targeting NVIDIA GPUs. Use this new definition to select the correct template implementation for the trsv/tbsv/tpsv kernels.

Apply changes also to AMD GPUs

7d9e887

The fixes provided for iamax, iamin, tbsv, tpsv and trsv on NVIDIA target an architectural feature that the two GPU vendors have in common, so applying the same changes to the AMD GPU target also fixes the failing tests there. Signed-off-by: nscipione <[email protected]>

Change from compile pragma to runtime checks

755d9bd

Instead of using a pragma to select which kernel to launch in default mode, switch to a runtime check of the device in use.

Remove unused definition introduced before

ab4200f

Fix typo

b13aea1

Update test warning for disabled tests

dbc5002

Update src/interface/blas2/backend/default.hpp

6a91a46

Co-authored-by: HJA Bird <[email protected]>

Address PR comments

a626b95

Add an exception to properly handle unsupported combinations of hardware and operator. Add the `cl` namespace to keep compatibility with AdaptiveCpp in the current implementation.

s-Nick force-pushed the txsv_iamax_iamin_default_fixes branch from a779a44 to a626b95 on May 15, 2024 09:04

Rbiessy approved these changes May 15, 2024

hjabird reviewed May 17, 2024

src/interface/blas2/backend/default.hpp (two outdated review threads, resolved)

Address PR comments

98f5314

Move the retrieval of vendor info inside the proper if-statement for tbsv and tpsv.

hjabird approved these changes May 20, 2024

s-Nick merged commit c6d3cad into codeplaysoftware:master May 20, 2024
3 checks passed
