Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace volatile by proper memory ordering in HIP #1472

Merged
merged 5 commits into from
Nov 30, 2023
Merged

Conversation

upsj
Copy link
Member

@upsj upsj commented Nov 25, 2023

rocm-clang supports the GCC __atomic intrinsics, which we can use to implement the atomic operations instead of using volatile and memory fences.

This is a follow-up to #1344, so I requested reviews from the same reviewers

@upsj upsj added the 1:ST:ready-for-review This PR is ready for review label Nov 25, 2023
@upsj upsj requested review from thoasm and yhmtsai November 25, 2023 15:20
@upsj upsj self-assigned this Nov 25, 2023
@ginkgo-bot ginkgo-bot added mod:cuda This is related to the CUDA module. mod:hip This is related to the HIP module. reg:helper-scripts This issue/PR is related to the helper scripts mainly concerned with development of Ginkgo. labels Nov 25, 2023
@upsj upsj requested a review from MarcelKoch November 25, 2023 15:21
Copy link
Member

@yhmtsai yhmtsai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have the documentation about rocm-clang support gcc atomic on GPU?
Is there only one compiler rocm-clang from hipcc on AMD GPU?

dev_tools/scripts/generate_cuda_memory_ptx.py Outdated Show resolved Hide resolved
@upsj
Copy link
Member Author

upsj commented Nov 27, 2023

This is based on communications with an AMD engineer, and the fact that clang has to be able to compile libstdc++, which relies on these intrinsics. We are not using any other compiler, and I'm not aware of any that we should be looking at. But thanks for the hint, I looked at what rocSPARSE is doing, and they are using __hip_atomic_load, which gives us the same kind of control over scope and ordering, but isn't documented, so I'm not sure we can rely on it too much

@upsj
Copy link
Member Author

upsj commented Nov 28, 2023

Looks like the intrinsics are not supported by older HIP versions yet. I'll fall back on the GCC versions then

@upsj upsj added the 1:ST:no-changelog-entry Skip the wiki check for changelog update label Nov 28, 2023
@upsj upsj requested a review from yhmtsai November 29, 2023 07:18
Copy link
Member

@yhmtsai yhmtsai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the code self, LGTM.
I am still not sure whether gcc intrinsics work on GPU address. compiling to libstdc++ only mean for CPU side not GPU, right?

hip/components/memory.hip.hpp Outdated Show resolved Hide resolved
}


template <typename ValueType>
__device__ __forceinline__ ValueType load_relaxed_shared(const ValueType* ptr)
__device__ __forceinline__ thrust::complex<ValueType> load_relaxed(
Copy link
Member

@yhmtsai yhmtsai Nov 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe only for thrust::complex<double>?
thrust::complex<float> can be done with int64

Copy link
Member Author

@upsj upsj Nov 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow, we only use thrust::complex with float value types Formatting issues with templates. We don't need full atomicity with thrust::complex, only element-wise, so this allows a more efficient code generation. Maybe the compiler even combines them? Doesn't matter much

@@ -6,6 +6,7 @@
#define GKO_HIP_COMPONENTS_MEMORY_HIP_HPP_


#include <cstring>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memcpy is defined in string.h, not sure if this is actually necessary though

Copy link
Member

@MarcelKoch MarcelKoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs quite a bit of documentation, since this relies on functionality that is not in the official documentation. It needs at least links to where you found the__hip_atomic_load|store, and the different scopes. If the references themselves don't have much documentation, then also properly documenting them would be necessary.

hip/components/memory.hip.hpp Outdated Show resolved Hide resolved
hip/components/memory.hip.hpp Outdated Show resolved Hide resolved
cuda/components/memory.cuh Show resolved Hide resolved
@upsj
Copy link
Member Author

upsj commented Nov 29, 2023

@yhmtsai The tests run fine, and this was a suggestion by an AMD engineer, so I'm confident we can use them. If the intrinsics weren't supported, it would fail to compile instead.

Copy link

sonarcloud bot commented Nov 30, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

warning The version of Java (11.0.3) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
Read more here

@upsj upsj added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review 1:ST:run-full-test labels Nov 30, 2023
@upsj upsj merged commit f2e0449 into develop Nov 30, 2023
12 of 15 checks passed
@upsj upsj deleted the hip_memory_order branch November 30, 2023 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:no-changelog-entry Skip the wiki check for changelog update 1:ST:ready-to-merge This PR is ready to merge. mod:cuda This is related to the CUDA module. mod:hip This is related to the HIP module. reg:helper-scripts This issue/PR is related to the helper scripts mainly concerned with development of Ginkgo.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants