Skip to content
This repository has been archived by the owner on Jan 13, 2025. It is now read-only.

fix for DEFAULT TUNING_TARGET on AMD and NVIDIA GPUs #517

Merged
merged 11 commits into from
May 20, 2024

Conversation

s-Nick
Copy link
Collaborator

@s-Nick s-Nick commented May 8, 2024

This PR fixes most of the tests that fails on AMD and NVIDIA GPUs using DEFAULT configuration.
It fixes all of them for AMD and let only trsm operator to be fixed for NVIDIA.

In particular it fixes:

  • iamax
  • iamin
  • trsv
  • tbsv
  • tpsv

iamax/iamin:
The sycl:shift_group_left api requires all group(sub_group) takes part to the operation, removing the if-condition solves the problem.

txsv operators:
broadcast operations inside the kernel require a specific size of group and subgroup, so calling the kernel implementation from default is not enough due to hardware differences. This solution uses runtime checks to select the correct template parameters. This leads to compile more kernels than before but from my tests it doesn't affect significantly compilation time.

@s-Nick s-Nick force-pushed the txsv_iamax_iamin_default_fixes branch 2 times, most recently from ff25df6 to 6452589 Compare May 10, 2024 09:10
@s-Nick s-Nick marked this pull request as ready for review May 13, 2024 09:08
src/interface/blas2/backend/default.hpp Outdated Show resolved Hide resolved
src/interface/blas2/backend/default.hpp Show resolved Hide resolved
src/interface/blas2/backend/default.hpp Outdated Show resolved Hide resolved
src/interface/blas2/backend/default.hpp Outdated Show resolved Hide resolved
src/interface/blas2/backend/default.hpp Show resolved Hide resolved
src/operations/blas1/IndexMaxMin.hpp Show resolved Hide resolved
s-Nick and others added 10 commits May 15, 2024 10:04
The current implementation fails due to a wrong usage of the
shift_group_left api. This patch fix it and re-enable the tests.
This patch fixes a synchronization issue caused by the group_broadcast
api inside this kernel. Calling such api without using the whole warp
cause the program to hang. Changing the group size matching NVIDIA
architecures and tuned configuration fixes the issue for all mentioned
operators.
Re-enable tests.

Signed-off-by: nscipione <[email protected]>
Add definition to compilation when using DEFAULT as TUNING_TARGET but
targetting NVIDIA GPUs. Use this new definition to select the correct
template implementation for trsv/tbsv/ptsv kernels.
The fixes provided for iamax, iamin, tbsv, tpsv and trsv for NVIDIA
target architectural feature in common between the two GPUs. So
implementing the same changes also for AMD GPUs target fix failing
tests.

Signed-off-by: nscipione <[email protected]>
Instead of using pragma to select which kernel launch in default mode,
switch to runtime check of device in use
Add exception to properly handle not supported combination of hardware
and operator.
Add `cl` namespace to keep compatibility with adaptiveCpp in the current
implementation.
@s-Nick s-Nick force-pushed the txsv_iamax_iamin_default_fixes branch from a779a44 to a626b95 Compare May 15, 2024 09:04
src/interface/blas2/backend/default.hpp Outdated Show resolved Hide resolved
src/interface/blas2/backend/default.hpp Outdated Show resolved Hide resolved
Moving retrieve of vendor info inside proper if-statement for tbsv and
tpsv.
@s-Nick s-Nick merged commit c6d3cad into codeplaysoftware:master May 20, 2024
3 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants