[RFC] Adding AMD GPU support via HIP/ROCM #7838

Open
GMNGeoffrey opened this issue Nov 26, 2024 · 0 comments

We are interested in adding support for running DGL on AMD GPUs. This was previously requested by users in #2659. We can make use of the hipify tooling, which translates CUDA source code to HIP.

I've already created a prototype that passes almost all the C++ unit tests (some functionality is missing in HIP/ROCM upstream and will need to be addressed) and runs through all the blitz tutorials. So far I have only experimented with the PyTorch backend. PyTorch already calls all GPUs "cuda", so existing torch Python code doesn't need to be modified. Within DGL, the prototype follows the same pattern of overloading the CUDA types.
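
To illustrate the "cuda" naming point: on a ROCm build of PyTorch, user code still asks for the "cuda" device, and the build flavor is only visible through `torch.version.hip` (a version string on ROCm builds, `None` otherwise). Below is a minimal sketch of telling the two apart. The helper name `gpu_flavor` is hypothetical, and the version strings are made-up stand-ins so the snippet runs without PyTorch installed.

```python
from types import SimpleNamespace


def gpu_flavor(version_info) -> str:
    """Classify a PyTorch build from an object shaped like `torch.version`.

    Hypothetical helper for illustration; not part of DGL or PyTorch.
    """
    # ROCm builds of PyTorch set `hip` to a version string.
    if getattr(version_info, "hip", None):
        return "rocm"
    # CUDA builds set `cuda` to a version string instead.
    if getattr(version_info, "cuda", None):
        return "cuda"
    # Neither is set on CPU-only builds.
    return "cpu"


# Stand-ins for `torch.version`; the version strings are illustrative only.
rocm_build = SimpleNamespace(cuda=None, hip="6.2.0")
cuda_build = SimpleNamespace(cuda="12.4", hip=None)
cpu_build = SimpleNamespace(cuda=None, hip=None)
```

With PyTorch installed, the real call would be `gpu_flavor(torch.version)`. Either way, device strings such as `"cuda:0"` stay the same in user code.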

Structure

The prototype simply converts the code in place, modifying it to use HIP instead of CUDA. Presumably you don't want to do that, so the options would be:

  1. the HIP version lives on a separate branch, constructed by "hipifying" the default branch
  2. the HIP version is checked in to a parallel directory structure (mostly generated by hipifying the main source code)
  3. the HIP version is generated dynamically by a build script (this is what PyTorch does)
  4. HIP is added as a fully supported device type separate from CUDA (this would be a lot more work, I think, but could be a good option long term if this gains traction)

Barring strong reasons to the contrary, I think following PyTorch's example (option 3) probably makes the most sense.
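
For a rough sense of what a build-time hipify step does, here is a toy sketch that rewrites a handful of CUDA runtime names to their HIP equivalents. It is not the real hipify tooling and not the prototype's build script. The mapping is a tiny illustrative subset, and `hipify_source` is a hypothetical name.

```python
import re

# A tiny, illustrative subset of CUDA-to-HIP renames. The real hipify tools
# cover far more of the API than this.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaStream_t": "hipStream_t",
    "cudaError_t": "hipError_t",
    "cudaSuccess": "hipSuccess",
}

# Try longer names first so "cudaMemcpyHostToDevice" wins over "cudaMemcpy".
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(CUDA_TO_HIP, key=len, reverse=True))
)


def hipify_source(text: str) -> str:
    """Return `text` with the known CUDA names replaced by HIP names."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(0)], text)


cuda_snippet = """#include <cuda_runtime.h>
cudaError_t err = cudaMalloc(&ptr, n);
cudaMemcpy(ptr, host, n, cudaMemcpyHostToDevice);
cudaFree(ptr);
"""

hip_snippet = hipify_source(cuda_snippet)
```

Under option 3, a step like this would run as part of the build and write its output to a generated directory, so the checked-in sources stay CUDA-only.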

In addition to threading through the appropriate build options, the prototype makes a few changes to the source code prior to hipification. I think they are (or can be made to be) relatively unobjectionable, or at least hidden behind macros so they can't affect the normal build. If those changes aren't acceptable, though, that makes structure option 3 above trickier. It's also possible that achieving high performance (as opposed to just correctness) on AMD GPUs would require more invasive modifications. I think it's probably best to address those as they come up, but I want to acknowledge that this isn't zero-cost from a maintainability perspective and might end up creating conflicting pressures.
