
Support enzyme in KA #2260

Closed · vchuravy wants to merge 1 commit

Conversation

vchuravy (Member) commented Feb 7, 2024

No description provided.

michel2323 commented Feb 27, 2024

@vchuravy We'll have to merge this in order to merge JuliaGPU/KernelAbstractions.jl#454.

@@ -243,4 +243,10 @@ function KA.priority!(::CUDABackend, prio::Symbol)
     return nothing
 end
 
+KA.supports_enzyme(::CUDABackend) = true
Member commented:

It seems weird/wrong to add an Enzyme-specific API to the KA.jl interface while the Enzyme support is all in an extension package.
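
For reference, the interface split in question looks roughly like the sketch below. The conservative fallback is an assumption about what KernelAbstractions.jl would have to define; only the CUDABackend method is actually part of this PR.

# In KernelAbstractions.jl (assumed fallback, not part of this PR):
supports_enzyme(::Backend) = false

# In CUDA.jl (this PR): the backend opts in.
KA.supports_enzyme(::CUDABackend) = true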

Comment on lines +247 to +250
+function KA.__fake_compiler_job(::CUDABackend)
+    mi = CUDA.methodinstance(typeof(()->return), Tuple{})
+    return CUDA.CompilerJob(mi, CUDA.compiler_config(CUDA.device()))
+end
Member commented:

This is very sketchy...

vchuravy (Member, Author) commented:

The core challenge is that we need to allocate memory of a certain type for the tape. This allocation has to happen on the outside and then be passed into the kernel.

What we came up with is a reflection function, tape_type, which given a compilation job returns the element type of the array to allocate.

The crux is that we can't use the host job for this reflection, since in the end this is a deferred compilation: the CUDA method table has to be taken into account.

So the reflection needs the parent job. It doesn't have to be the real one, as long as the method table matches; in fact we can't construct the real one yet anyway, since the parent kernel will take the allocated array as an argument.
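
In code, the intended flow looks roughly like the sketch below. `prepare_tape` is a hypothetical helper, and the `tape_type` call is illustrative only; Enzyme's real reflection entry point takes additional arguments describing the kernel.

# Hypothetical sketch of the flow described above; `prepare_tape` and the
# exact `tape_type` call are placeholders, not the actual implementation.
using CUDA, Enzyme
import KernelAbstractions as KA

function prepare_tape(backend::CUDABackend, kernel_tt::Type, nthreads::Int)
    # A job that matches the CUDA method table, standing in for the parent
    # kernel job that cannot be constructed yet.
    job = KA.__fake_compiler_job(backend)
    # Reflection: which element type does the reverse-mode tape need?
    # (Illustrative call; Enzyme's real signature differs.)
    TapeType = Enzyme.Compiler.tape_type(job, kernel_tt)
    # Allocate one tape slot per work-item on the device; the result is then
    # passed into the augmented forward kernel as an argument.
    return KA.allocate(backend, TapeType, nthreads)
end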

vchuravy (Member, Author) commented Mar 1, 2024

Part of the challenge here is that only the backend packages know how to construct an appropriate job.

We also need to do something similar to support reverse mode for CUDA.jl directly.
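
A sketch of that split: KernelAbstractions.jl would only declare the function, and each backend package supplies the method. The empty generic function below is an assumption; only the CUDABackend method is in this PR.

# In KernelAbstractions.jl: declare the function without a generic method,
# so only backends that know how to build a job can participate (assumed setup).
function __fake_compiler_job end

# In CUDA.jl (this PR): construct a job against CUDA's method table.
KA.__fake_compiler_job(::CUDABackend) =
    CUDA.CompilerJob(CUDA.methodinstance(typeof(() -> return), Tuple{}),
                     CUDA.compiler_config(CUDA.device()))

# AMDGPU.jl, oneAPI.jl, etc. would add analogous methods for their backends.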
