Support for Hopper (H100, GH200) GPUs #1846
Hopper should work fine on current toolchains; the missing LLVM support only prevents us from using its specific features (which we don't have wrappers for anyway). Or are you running into specific issues?
I just saw your Discourse post, https://discourse.julialang.org/t/sm90-h100-support-for-cuda-jl/96809. That suggests there is a compatibility issue; you should have included that in your issue 🙂
Thanks for linking it for me 🙂 I'm currently working on building Julia with a custom flavor of LLVM to see if that solves the issue.
So based on how many breaking changes there are from 14->15, I'm assuming it will be a lot of work to jump from 14->16... |
Be sure to disable opaque pointers; that isn't supported by the GPU stack yet. LLVM 15 should work with LLVM.jl 5 which CUDA.jl will support later today (I'm working on a PR). |
But doesn't the LLVM 15 PR in
Yes, and we'll cross that bridge when we get there. We just added the necessary bits to LLVM.jl (maleadt/LLVM.jl#326) and updated APIs to be compatible with the opaque pointer world (maleadt/LLVM.jl#340), but we still need to make some updates to the code in GPUCompiler and CUDA.jl. With JuliaLang/julia#49128 though, it should be possible to both upgrade to LLVM 15 and not enable opaque pointers, either by disabling it using a command-line flag ( |
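For reference, a minimal sketch of how one might check whether a given LLVM context is still in typed-pointer mode. This assumes LLVM.jl exposes a `supports_typed_pointers` query (along the lines of the PRs linked above); treat the exact name as an assumption, not confirmed API.

```julia
# Hypothetical check, assuming LLVM.jl provides `supports_typed_pointers`.
using LLVM

ctx = Context()
if supports_typed_pointers(ctx)
    println("typed pointers: compatible with the current GPU stack")
else
    println("opaque pointers: not yet supported by GPUCompiler/CUDA.jl")
end
dispose(ctx)
```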
How is this progressing? I can confirm that there is still an issue with H100:
No progress yet. I guess we'll have to backport llvm/llvm-project@9a01cca, which will make it possible to generate code targeting

Or, we do something hacky and just bump the
Can you try #1931? |
@maleadt are there no LLVM v16-specific features required to backport? Looking through that diff, it looks like they just added
As long as we don't rely on them, I don't think it's likely that we need other changes. |
Finally got to test on an H100, and things generally work now. The only exception is sorting with the quicksort algorithm, because we are using the legacy dynamic parallelism API which is unsupported on Hopper. |
I have access to a GH200 (and an H100) if you need help debugging. Would love to see this work!
Just to be clear, 99% of CUDA.jl works perfectly fine on Hopper, only dynamic parallelism (as needed by |
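Until the dynamic-parallelism-based quicksort is replaced, a hedged workaround sketch is to detect Hopper via its compute capability and round-trip the sort through the host; `capability` and `device` are CUDA.jl API, while the v"9.0" cutoff is the assumption here.

```julia
# Workaround sketch: GPU quicksort relies on the legacy dynamic
# parallelism API, which is unsupported on Hopper (compute capability 9.0).
using CUDA

x = CUDA.rand(10_000)
if capability(device()) >= v"9.0"
    y = CuArray(sort(Array(x)))   # sort on the host, copy back
else
    y = sort(x)                   # GPU quicksort works pre-Hopper
end
```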
In order to support Hopper (H100) GPUs, the Julia toolchain needs to also support LLVM v16. Currently, the latest pre-release (1.9) builds with LLVM v14.
One could always build Julia from source with LLVM v16 (although this is considered "experimental"). It would be nice to raise this issue with the larger Julia dev community sooner rather than later, so that this step isn't needed.
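In the meantime, a quick way to check which LLVM a given Julia build links against:

```julia
# `Base.libllvm_version` reports the LLVM version Julia was built with;
# Hopper (sm_90) codegen needs LLVM v16 (or a backported patch).
println(Base.libllvm_version)   # e.g. v"14.0.6" on Julia 1.9
```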