Improve ROCm 5 support #23480
Also note that we currently error out (in Chapel) when users use atomicMax/Min with 64-bit signed integers, as this wasn't supported by HIP until ROCm 5.7 (ROCm/clr#2). Once we're able to support 5.7 and on, we'll need to decide which of these we'll want to do:
I'm leaning towards just adding our own check and only erroring when using an incompatible version.
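As a concrete illustration (my sketch, not from the issue; the kernel and the version-macro guard are assumptions), this is the kind of device code affected: a signed 64-bit atomicMax, which HIP only accepts from ROCm 5.7 on, with a compile-time check in the spirit of "adding our own check":

```cpp
// Sketch only: a signed 64-bit atomicMax in HIP C++. The long long overload
// is the one HIP lacked before ROCm 5.7; the guard errors out at compile time
// on older toolchains, mirroring the "check and error" idea.
#include <hip/hip_runtime.h>   // provides HIP_VERSION_MAJOR / HIP_VERSION_MINOR

__global__ void maxReduce(long long *result, const long long *vals, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
#if HIP_VERSION_MAJOR > 5 || (HIP_VERSION_MAJOR == 5 && HIP_VERSION_MINOR >= 7)
    atomicMax(result, vals[i]);   // signed 64-bit overload, ROCm >= 5.7
#else
#error "signed 64-bit atomicMax/atomicMin needs ROCm 5.7 or newer"
#endif
  }
}
```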
I am wrestling with some ROCm version issues, and wanted to record the following table to summarize ROCm versions and which Clang version they correspond to. The list is not exhaustive and contains ROCm installations that I have easy access to.
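One hedged way to rebuild such a table (my addition, not part of the comment): compile the snippet below with each ROCm installation's hipcc, and it prints that installation's HIP version macros next to the Clang version hipcc is built on.

```cpp
// Prints the HIP version of the ROCm installation whose hipcc compiled this
// file, plus the underlying Clang version, to pair ROCm and Clang versions.
#include <hip/hip_runtime.h>  // pulls in hip_version.h for HIP_VERSION_*
#include <cstdio>

int main() {
  std::printf("HIP   %d.%d.%d\n",
              HIP_VERSION_MAJOR, HIP_VERSION_MINOR, HIP_VERSION_PATCH);
#if defined(__clang__)
  std::printf("Clang %d.%d.%d\n",
              __clang_major__, __clang_minor__, __clang_patchlevel__);
#endif
  return 0;
}
```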
Resolves Cray/chapel-private#5509

Since the beginning of AMD support, we have been using `clang-offload-bundler` from the ROCm installation instead of the one from our backend. This probably goes back to a decision made in the prototyping phase that we never revisited. Recently, we've observed that this results in some deprecation warnings because of the flags we use while launching `clang-offload-bundler`: #23480. If we stuck with ROCm's bundler, we'd have to add some wiring to determine its version so that we can use the right arguments. We already have that wiring laid out for the backend's clang version. So, with this PR, we start using the backend's bundler. On a more abstract level, it also makes sense to prefer that one, since the binary the bundler works with is generated by our backend and not by ROCm's clang. This allows choosing the right arguments for passing inputs and outputs to and from `clang-offload-bundler`. While there, this PR also adjusts how we generate the `targets` argument to make the implementation more consistent.

[Reviewed by @DanilaFe]

Test:
- [x] amd with rocm 4.4
- [x] spot-check amd with rocm 5.2, 5.3, 5.4
- [x] spot-check nvidia
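To make "choosing the right arguments" concrete, here is a hedged sketch (my illustration, not the compiler's actual code; the `bundlerArgs` helper and the version cutoff constant are assumptions) of switching between the older comma-separated `-inputs=`/`-outputs=` spelling and the newer repeated `-input`/`-output` spelling that replaced it:

```cpp
// Hedged sketch, not Chapel compiler code: assemble a clang-offload-bundler
// command line, using the repeated -input/-output flags for newer bundlers and
// falling back to the deprecated comma-separated -inputs=/-outputs= flags for
// older ones. The cutoff and the -targets= string are caller-supplied here.
#include <string>
#include <vector>

std::vector<std::string> bundlerArgs(int llvmMajor,
                                     const std::vector<std::string> &inputs,
                                     const std::string &output,
                                     const std::string &targets) {
  std::vector<std::string> args = {"clang-offload-bundler", "-type=o",
                                   "-targets=" + targets};
  const int kRepeatedInputFlagSince = 14;  // assumption: adjust per bundler
  if (llvmMajor >= kRepeatedInputFlagSince) {
    for (const auto &in : inputs) args.push_back("-input=" + in);
    args.push_back("-output=" + output);
  } else {
    std::string joined;                    // older bundlers: comma-separated
    for (const auto &in : inputs) joined += (joined.empty() ? "" : ",") + in;
    args.push_back("-inputs=" + joined);
    args.push_back("-outputs=" + output);
  }
  return args;
}
```

Keying the choice off the backend's LLVM version, which the compiler already knows, is what lets the PR avoid the extra wiring that detecting ROCm's own bundler version would require.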
On main today we support ROCm 6.0-6.2 and it seems stable, so the need to expand our ROCm 5 support beyond 5.4 is lessened. We do still support ROCm 4, but that support will likely be removed soon. For these reasons, I am closing this issue.
... and consider deprecating ROCm 4.x support.
#23461 added some patches to be able to run with ROCm <5.5. But there are some hiccups with ROCm 5.x support. This issue captures them and discusses the paths forward:
Versions 5.0 and 5.1
Versions 5.2, 5.3 and 5.4
Deprecation warnings like the following are expected:
Resolving these would take:
1. adjusting the `inputs` argument we have so that we don't pass a comma-separated list to `input`, but instead pass multiple `input` args, and more importantly
2. determining the bundler's version so that we can use the right argument spellings (older bundlers only accept the deprecated ones).

Neither is too difficult, and we should address this soon after the 1.32 release. Note that (2) may depend on the future of our 4.x support.
Versions 5.5 and on
We saw
`error: linking module flags 'PIC Level': IDs have conflicting behaviors in '/.../rocm/amdgcn/bitcode/hip.bc' and 'root'`
and blocked using 5.5 and on in our environment scripts today. A potential solution is to see if we can `setPICLevel` of the GPU module to match the modules that were compiled with clang 16 (starting with ROCm 5.5, `hipcc` et al. are clang 16-based). The ideal solution is to wait until we upgrade to LLVM 16 (tentatively scheduled for 1.33), which I hope will make the issue go away.
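As a hedged illustration of the `setPICLevel` idea (my sketch, not code from the issue or the compiler; assuming hip.bc was built with PIC level 2, i.e. -fPIC), LLVM's C++ API lets the GPU module's PIC level be forced before ROCm's device bitcode is linked in:

```cpp
// Sketch: align the generated GPU module's 'PIC Level' module flag with the
// ROCm device bitcode (hip.bc) before linking, so the IR linker doesn't see
// conflicting values. Assumes the bitcode uses PIC level 2 (-fPIC).
#include "llvm/IR/Module.h"
#include "llvm/Support/CodeGen.h"

void alignPICLevelWithDeviceBitcode(llvm::Module &gpuModule) {
  // PICLevel::BigPIC corresponds to -fPIC (level 2); adjust if hip.bc differs.
  if (gpuModule.getPICLevel() == llvm::PICLevel::NotPIC)
    gpuModule.setPICLevel(llvm::PICLevel::BigPIC);
}
```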
How long should we keep supporting 4.x?
ROCm Release History starts with 5.0. We have an old testing system with MI60s that doesn't have ROCm 5.0. Our dedicated cluster has ROCm 4.4, but we can install a newer version on it. Frontier and an internal EX system don't have any 4.x modules installed. We should look/ask around some more. If any AMD GPU user has any input here, we'd appreciate it.