Add HIP backend #2008
This is necessary when targeting C++ (as with HIP), because C++ does not allow us to 'goto' past variable declarations, which we do when handling bounds checks. This could also be implemented in GenericC itself, but it is slightly simpler this way.
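To illustrate the C++ restriction mentioned above, here is a minimal sketch. It is not the code the backend generates: the function `read_checked` and the label `cleanup` are hypothetical names, and hoisting the declaration is only one possible workaround, assumed here for illustration. C++ rejects a jump that skips a declaration with an initializer while the variable is still in scope at the jump target, so the sketch declares the variables without initializers before the `goto` and assigns them later.

```cpp
// Hypothetical example: a bounds-checked read that jumps to a shared
// exit label on failure, in the style of generated C code.
int read_checked(const int *arr, int n, int i) {
    // Declared up front, without initializers, so that no jump below
    // crosses an initialized declaration.
    int result;
    int value;

    if (i < 0 || i >= n) {
        result = -1;   // signal a failed bounds check
        goto cleanup;
    }

    // Writing `int value = arr[i];` at this point instead would be valid C
    // but ill-formed C++, because the goto above would jump past the
    // initialization while `value` is still in scope at `cleanup`.
    value = arr[i];
    result = value * 2;

cleanup:
    return result;
}
```

Calling `read_checked` with an in-range index returns twice the element, and an out-of-range index takes the `goto` path and returns -1.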
The entire test and benchmark suites now work with the HIP backend. (The CI errors are due to some of the NVIDIA GPUs in the cluster being unavailable, unrelated to the backend.) The performance compared to the OpenCL backend, on the MI100 GPU, is as follows:
Some things are substantially faster (e.g. FFT). I think this is because the HIP backend allows up to 1024 threads in a thread block (compared to only 256 for AMD's OpenCL implementation), which allows intragroup parallelism to apply. Everything that depends on scans is also faster, as the HIP backend uses the highly tuned single-pass scan code generation.

Some things are strangely slower (e.g. mandelbrot). I'll have to look into it. It may be something simple, like not properly querying for how many threads to launch. I also see that e.g. sgemm is a lot slower than with OpenCL, which merits further investigation too.

But overall, this backend looks pretty operational to me. It is certainly worth using for some programs, and with some tweaks we can probably make it superior to the OpenCL backend in all cases on AMD hardware.