Support Enzyme in KA #2260

Closed. Wants to merge 1 commit.
2 changes: 1 addition & 1 deletion in Project.toml

```diff
@@ -57,7 +57,7 @@ DataFrames = "1"
 ExprTools = "0.1"
 GPUArrays = "10.0.1"
 GPUCompiler = "0.24, 0.25"
-KernelAbstractions = "0.9.2"
+KernelAbstractions = "0.9.17"
 LLVM = "6"
 LLVMLoopInfo = "1"
 LazyArtifacts = "1"
```
6 changes: 6 additions & 0 deletions in src/CUDAKernels.jl

```diff
@@ -243,4 +243,10 @@ function KA.priority!(::CUDABackend, prio::Symbol)
     return nothing
 end

+KA.supports_enzyme(::CUDABackend) = true
```
Member commented:

It seems weird/wrong to add an Enzyme-specific API to the KA.jl interface while the Enzyme support lives entirely in an extension package.
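
For illustration, one way to keep the method itself behind a package extension, so the core package never references Enzyme, might look like the following sketch; the extension name, trigger package, and layout are hypothetical, not CUDA.jl's actual structure:

```julia
# Hypothetical sketch only: define the Enzyme-related method in a package
# extension instead of the core package. The extension name and layout
# here are assumptions for illustration, declared via [extensions] in
# Project.toml (e.g. CUDAEnzymeCoreExt = "EnzymeCore").
module CUDAEnzymeCoreExt

using CUDA: CUDABackend
import KernelAbstractions as KA
import EnzymeCore  # extension trigger: loads only when EnzymeCore is present

KA.supports_enzyme(::CUDABackend) = true

end # module
```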

```diff
+
+function KA.__fake_compiler_job(::CUDABackend)
+    mi = CUDA.methodinstance(typeof(()->return), Tuple{})
+    return CUDA.CompilerJob(mi, CUDA.compiler_config(CUDA.device()))
+end
 end
```
Comment on lines +247 to +250
Member commented:

This is very sketchy...

Member Author replied:

The core challenge we have is that we need to allocate memory of a certain type for the tape. This allocation needs to happen on the outside, and the resulting array is then passed into the kernel.

What we came up with is a reflection function, tape_type, which, given a compilation job, returns the element type of the array to allocate.

The crux is that we can't use the host job for the reflection, since in the end this is a deferred compilation, which requires taking the CUDA method table into account.

So for this reflection we need the parent job. It doesn't have to be the real one, as long as the method table matches; in fact, the parent kernel will take the allocated array as an argument, so we can't even construct the real job yet.
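
To make the intent concrete, here is a rough sketch of how the pieces fit together. The allocate_tape helper and the exact tape_type signature are assumptions based on this discussion, not the shipped API:

```julia
# Illustrative sketch only: using the fake parent job to size the tape
# before launching a reverse-mode kernel. `allocate_tape` is a made-up
# helper and the `tape_type` signature shown here is an assumption.
using CUDA, Enzyme
import KernelAbstractions as KA

function allocate_tape(backend::CUDABackend, kernel_job, nelems)
    # The fake job only needs to agree with the real parent job on the
    # method table (CUDA's), which is all the reflection depends on.
    parent_job = KA.__fake_compiler_job(backend)

    # Reflection: ask Enzyme for the element type of the tape array.
    TapeT = Enzyme.Compiler.tape_type(parent_job, kernel_job)  # assumed signature

    # Host-side allocation; the array is then passed into the kernel
    # as an ordinary argument.
    return CuArray{TapeT}(undef, nelems)
end
```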

Member Author (@vchuravy) commented on Mar 1, 2024:

Part of the challenge here is that only the backend packages know how to construct an appropriate job.

We also need to do something similar to support reverse mode for CUDA.jl directly.
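
As a hypothetical illustration of what reverse mode for CUDA.jl directly would enable for users; this is a target sketch, not working code in this PR:

```julia
# Hypothetical target usage: reverse-mode Enzyme through a host function
# that launches a CUDA kernel. Differentiating `loss` needs the deferred
# compilation and tape allocation machinery discussed above.
using CUDA, Enzyme

function square!(y, x)
    i = threadIdx().x
    @inbounds y[i] = x[i]^2
    return nothing
end

function loss(x)
    y = similar(x)
    @cuda threads=length(x) square!(y, x)
    return sum(y)
end

x  = CUDA.rand(Float32, 32)
dx = CUDA.zeros(Float32, 32)

# Standard Enzyme call: the scalar return is Active, and the GPU array is
# Duplicated so its gradient accumulates into dx.
Enzyme.autodiff(Reverse, loss, Active, Duplicated(x, dx))
```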

