[RFC] Adding AMD GPU support via HIP/ROCM #7838

GMNGeoffrey · 2024-11-26T18:14:40Z

We are interested in adding support for DGL to run on AMD GPUs. Users have previously requested this in #2659 and on the forum, both recently and some time ago.

We can make use of the hipify tooling. I've already created a prototype that passes almost all the C++ unit tests (there's some missing functionality in HIP/ROCM upstream that will need to be addressed) and runs through all the blitz tutorials. I have only experimented with the PyTorch backend. PyTorch already calls all GPUs "cuda", so existing torch python code doesn't need to be modified. Within DGL, the prototype follows this same pattern of overloading the cuda types.

Structure

The prototype just converts the code in-place, modifying it to use HIP instead of CUDA. Presumably, you don't want to do that. So options would be:

1. the HIP version lives on a separate branch, constructed by "hipifying" the default branch
2. the HIP version is checked-in to a parallel directory structure (mostly generated by hipifying the main source code)
3. the HIP version is generated dynamically by a build script (this is what PyTorch does).
4. HIP is added as a fully-supported device type separate from CUDA. This would be a lot more work, I think, but could be a good option long term if this gains traction.

Barring strong reasons to the contrary, I think following PyTorch's example (3) probably makes the most sense.
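
To make option 3 a bit more concrete, here is a toy Python sketch of a build step that reads the CUDA sources and writes a hipified copy into a separate output directory, leaving the checked-in tree untouched. Everything in it is an assumption for illustration: the names `hipify_text` and `hipify_tree`, the file suffixes, and the tiny rename table are made up for this example. It is not how `hipify-perl`, `hipify-clang`, or PyTorch's build scripts work; those use far larger mapping tables and handle many more cases than a plain identifier rename.

```python
import re
from pathlib import Path

# Illustrative subset of CUDA -> HIP renames (assumption: a real build would
# call an actual hipify tool rather than maintain a table like this).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaError_t": "hipError_t",
    "cudaSuccess": "hipSuccess",
    "cudaStream_t": "hipStream_t",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
}

# Match whole identifiers only, so names that merely contain a known token
# (for example a project-local `my_cudaFree`) are left alone.
_PATTERN = re.compile(
    r"(?<![A-Za-z0-9_])("
    + "|".join(re.escape(name) for name in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")(?![A-Za-z0-9_])"
)


def hipify_text(source: str) -> str:
    """Return `source` with the known CUDA identifiers renamed to HIP ones."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(1)], source)


def hipify_tree(src_root: Path, out_root: Path, suffixes=(".cu", ".cuh", ".h", ".cc")) -> list:
    """Write hipified copies of matching files under `out_root`.

    The directory layout is mirrored and the original files are never
    modified, which is the point of generating the HIP version at build time.
    """
    written = []
    for path in sorted(src_root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            target = out_root / path.relative_to(src_root)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(hipify_text(path.read_text()))
            written.append(target)
    return written
```

A build script would call something like `hipify_tree(Path("src"), Path("build/hip_src"))` before compiling, then point the HIP compiler at the generated directory. The appeal over options 1 and 2 is that no generated code is checked in, so the CUDA sources stay the single source of truth.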

In addition to threading through the appropriate build options, the prototype makes a few changes to the source code prior to hipification. I think they are (or can be made to be) relatively unobjectionable, or at least hidden behind macros so they can't affect the normal build. If those changes aren't acceptable though, then it makes structure option 3 above trickier. It's also possible that achieving high performance (as opposed to just correctness) on AMD GPUs would require more invasive modifications. I think it's probably best to address those as they come up, but want to acknowledge that this isn't zero-cost from a maintainability perspective and might end up creating conflicting pressures.

github-actions · 2025-01-02T01:33:48Z

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you

github-actions bot added the stale-issue label Jan 2, 2025
