Support enzyme in KA #2260
Conversation
@vchuravy We'll have to merge this in order to merge JuliaGPU/KernelAbstractions.jl#454.
@@ -243,4 +243,10 @@ function KA.priority!(::CUDABackend, prio::Symbol)
    return nothing
end

KA.supports_enzyme(::CUDABackend) = true
It seems weird/wrong to add an Enzyme-specific API to the KA.jl interface while the Enzyme support itself lives entirely in an extension package.
function KA.__fake_compiler_job(::CUDABackend)
    mi = CUDA.methodinstance(typeof(() -> return), Tuple{})
    return CUDA.CompilerJob(mi, CUDA.compiler_config(CUDA.device()))
end
This is very sketchy...
So the core challenge we have is that we need to allocate memory of a certain element type for the tape. This allocation has to happen on the outside (on the host), and the resulting array is then passed into the kernel.
What we came up with is a reflection function, tape_type, that, given a compilation job, returns the element type of the array to allocate.
The crux is that we can't use the host job for this reflection, since in the end this is a deferred compilation and the reflection has to take the CUDA method table into account. So we need the parent job, but it doesn't have to be the real one as long as the method table matches; besides, the real parent kernel takes the allocated array as an argument, so we can't even construct its job yet.
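To make the flow concrete, here is a minimal sketch of the pattern, not the actual extension code: tape_eltype, kernel_with_tape, and launch_with_tape are hypothetical names, with tape_eltype standing in for the real tape_type reflection on a (fake) compiler job. The element type is obtained by reflection, the array is allocated on the host, and the kernel receives it as an ordinary argument.

using CUDA

# Hypothetical stand-in for the tape_type reflection described above; the real
# function inspects a compiler job, here we just fix the element type so the
# sketch runs.
tape_eltype(job) = Float32

# Device side: the (augmented) kernel receives the pre-allocated tape as an
# ordinary argument and records values into it.
function kernel_with_tape(tape)
    i = threadIdx().x
    tape[i] = Float32(i)
    return nothing
end

# Host side: reflect to get the element type, allocate on the outside, then
# pass the array into the kernel.
function launch_with_tape(n)
    job = nothing                 # stands in for the "fake" parent compiler job
    T = tape_eltype(job)
    tape = CUDA.zeros(T, n)       # allocation happens before the launch
    @cuda threads=n kernel_with_tape(tape)
    return Array(tape)
end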
Part of the challenge here is that only the backend packages know how to construct an appropriate job.
We also need to do something similar to support reverse mode for CUDA.jl directly.
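As a rough illustration of that split (a sketch under assumptions, not the actual KernelAbstractions.jl source; the KASketch module and everything in it is hypothetical), the generic package would only declare the hooks, and each backend extension would overload them with a job that only it knows how to construct:

# Illustrative sketch only, not the real KA.jl code.
module KASketch

abstract type Backend end

# Fallback: the generic package cannot build a compiler job itself;
# backend packages overload this for their own backend type.
__fake_compiler_job(b::Backend) =
    error("no compiler job available for $(typeof(b)); load a backend package")

# Backends that provide Enzyme support opt in explicitly.
supports_enzyme(::Backend) = false

end # module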