This repository has been archived by the owner on Jan 13, 2025. It is now read-only.
fix for DEFAULT TUNING_TARGET on AMD and NVIDIA GPUs #517
Merged: s-Nick merged 11 commits into codeplaysoftware:master from s-Nick:txsv_iamax_iamin_default_fixes on May 20, 2024
s-Nick force-pushed the txsv_iamax_iamin_default_fixes branch 2 times, most recently from ff25df6 to 6452589 on May 10, 2024 09:10
hjabird reviewed May 13, 2024
Rbiessy reviewed May 14, 2024
The current implementation fails because it misuses the shift_group_left API. This patch fixes that and re-enables the tests.
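To illustrate why every lane has to take part in the shift, here is a minimal host-side sketch in plain C++ (not SYCL, and not the code from this patch). It simulates an iamax-style tree reduction in which all lanes call the shift at every step, and lanes with no data contribute a neutral padding value instead of skipping the call. The names `Candidate`, `better`, `shift_left_all_lanes` and `simulated_iamax` are hypothetical, and the sketch assumes the sub-group size is a power of two that is at least as large as the input.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// One (absolute value, index) pair per lane, as in an iamax-style reduction.
// Hypothetical type, introduced only for this sketch.
struct Candidate {
  float value;
  std::size_t index;
};

// Keep the larger value; on ties keep the smaller index (first occurrence).
Candidate better(const Candidate& a, const Candidate& b) {
  if (a.value != b.value) return a.value > b.value ? a : b;
  return a.index <= b.index ? a : b;
}

// Host-side stand-in for a sub-group left shift: lane i receives the element
// held by lane i + delta. Every lane takes part. Lanes that would read past
// the end keep their own element here (an assumption of this simulation).
std::vector<Candidate> shift_left_all_lanes(const std::vector<Candidate>& lanes,
                                            std::size_t delta) {
  std::vector<Candidate> out(lanes);
  for (std::size_t lane = 0; lane + delta < lanes.size(); ++lane) {
    out[lane] = lanes[lane + delta];
  }
  return out;
}

// Simulated iamax over one sub-group. Assumes sub_group_size is a power of
// two, is at least 1, and is not smaller than data.size().
std::size_t simulated_iamax(const std::vector<float>& data,
                            std::size_t sub_group_size) {
  // Lanes without data hold a padding candidate that can never win, so they
  // can still execute every shift instead of being guarded by an if.
  const Candidate padding{-1.0f, std::numeric_limits<std::size_t>::max()};
  std::vector<Candidate> lanes(sub_group_size, padding);
  for (std::size_t lane = 0; lane < data.size() && lane < sub_group_size; ++lane) {
    lanes[lane] = Candidate{std::fabs(data[lane]), lane};
  }
  // Tree reduction: all lanes shift at every step; lane 0 ends with the result.
  for (std::size_t delta = sub_group_size / 2; delta > 0; delta /= 2) {
    const std::vector<Candidate> shifted = shift_left_all_lanes(lanes, delta);
    for (std::size_t lane = 0; lane < lanes.size(); ++lane) {
      lanes[lane] = better(lanes[lane], shifted[lane]);
    }
  }
  return lanes[0].index;
}
```

For example, `simulated_iamax({1.0f, -7.0f, 3.0f, 7.0f, 2.0f}, 8)` yields index 1, the first position holding the largest absolute value. In a real kernel the shift would be the SYCL group function, which must be reached by all work-items of the sub-group; the padding trick shown here is one possible way to keep the call unconditional, not necessarily the one used in this patch.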
This patch fixes a synchronization issue caused by the group_broadcast API inside this kernel. Calling this API without the whole warp taking part causes the program to hang. Changing the group size to match NVIDIA architectures and the tuned configuration fixes the issue for all the mentioned operators. Re-enable the tests. Signed-off-by: nscipione <[email protected]>
Add a compile definition for the case where DEFAULT is used as TUNING_TARGET but the target is an NVIDIA GPU. Use this new definition to select the correct template implementation for the trsv/tbsv/tpsv kernels.
The fixes provided for iamax, iamin, tbsv, tpsv and trsv on NVIDIA target architectural features that the two GPU vendors have in common, so applying the same changes to the AMD GPU target also fixes the failing tests there. Signed-off-by: nscipione <[email protected]>
Instead of using a pragma to select which kernel to launch in default mode, switch to a runtime check of the device in use.
Co-authored-by: HJA Bird <[email protected]>
Add an exception to properly handle unsupported combinations of hardware and operator. Add the `cl` namespace to keep compatibility with AdaptiveCpp in the current implementation.
s-Nick force-pushed the txsv_iamax_iamin_default_fixes branch from a779a44 to a626b95 on May 15, 2024 09:04
Rbiessy approved these changes May 15, 2024
hjabird reviewed May 17, 2024
Move the retrieval of vendor info inside the proper if-statement for tbsv and tpsv.
hjabird approved these changes May 20, 2024
This PR fixes most of the tests that fail on AMD and NVIDIA GPUs when using the DEFAULT configuration. It fixes all of them for AMD and leaves only the trsm operator still to be fixed for NVIDIA. In particular, it fixes:
iamax/iamin: the `sycl::shift_group_left` API requires every work-item of the group (sub_group) to take part in the operation, so removing the if-condition solves the problem.
txsv operators: the broadcast operations inside the kernel require a specific group and sub-group size, so calling the kernel implementation from `default` is not enough because of hardware differences. This solution uses runtime checks to select the correct template parameters. As a result more kernels are compiled than before, but in my tests this does not significantly affect compilation time.
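The runtime selection described for the txsv operators can be sketched roughly as follows. This is a minimal plain C++ illustration, not the code from this PR: `launch_txsv`, `dispatch_txsv` and `to_lower` are hypothetical names, the sub-group and work-group sizes are placeholder values, and the vendor string is passed in as an argument, whereas a SYCL program would typically read it from the device with `get_info<sycl::info::device::vendor>()`.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a kernel launcher templated on the sub-group and
// work-group sizes. It only reports which instantiation was selected.
template <int SubGroupSize, int WorkGroupSize>
std::string launch_txsv() {
  return "txsv<" + std::to_string(SubGroupSize) + "," +
         std::to_string(WorkGroupSize) + ">";
}

// Lower-case copy of a string, so the vendor match is case-insensitive.
std::string to_lower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Choose the template parameters at runtime from the vendor name. Every
// branch is instantiated at compile time, which is why this approach builds
// more kernels than a compile-time selection would.
std::string dispatch_txsv(const std::string& vendor_name) {
  const std::string vendor = to_lower(vendor_name);
  if (vendor.find("nvidia") != std::string::npos) {
    return launch_txsv<32, 128>();  // placeholder sizes, assumed for the sketch
  }
  if (vendor.find("amd") != std::string::npos ||
      vendor.find("advanced micro devices") != std::string::npos) {
    return launch_txsv<64, 256>();  // placeholder sizes, assumed for the sketch
  }
  // Unsupported hardware/operator combination: fail loudly instead of
  // launching a kernel whose group sizes may not match the device.
  throw std::runtime_error("txsv: unsupported device vendor '" + vendor_name +
                           "' in DEFAULT mode");
}
```

With this sketch, `dispatch_txsv("NVIDIA Corporation")` selects the `<32,128>` instantiation, and an unrecognized vendor name raises `std::runtime_error`. The exception type and message are assumptions; the PR only states that an exception is added for unsupported combinations.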