Add GPU reverse mode to EnzymeExt #454
Conversation
ext/EnzymeExt.jl
Outdated
@show TapeType
subtape = Array{TapeType}(undef, size(blocks(iterspace)))
This shouldn't be `Array`; it should be the array type specific to the device.
Is there a way to get that? @vchuravy @michel2323
KernelAbstractions.allocate
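The suggestion above can be sketched as follows. This is a minimal illustration, not the PR's code: `backend`, `TapeType`, and `nblocks` are stand-ins for the values the extension actually has in scope (the kernel's backend, Enzyme's computed tape type, and `size(blocks(iterspace))`).

```julia
# Sketch of the suggested fix: use KernelAbstractions.allocate so the tape
# lives in backend-specific memory (e.g. a CuArray under CUDABackend)
# instead of always being a host Array.
using KernelAbstractions

backend  = CPU()       # stand-in; the extension would use the kernel's backend
TapeType = Float64     # stand-in for the Enzyme-computed tape type
nblocks  = (4,)        # stand-in for size(blocks(iterspace))

# Before: subtape = Array{TapeType}(undef, nblocks)   # host-only allocation
subtape = KernelAbstractions.allocate(backend, TapeType, nblocks)
```

With `CUDABackend()` in place of `CPU()`, the same call returns a `CuArray`, which is what the reverse pass needs on device.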
Project.toml
Outdated
@@ -6,6 +6,9 @@ version = "0.9.14"
[deps]
Adapt = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
Atomix = "a9b6321e-bd34-4604-b9c9-b65b8de01458"
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
Force-pushed from d6c5a72 to 07d52b7
@wsmoses Forward mode doesn't work anymore, which used to work when I started on this. I'm on latest:

(KernelAbstractions) pkg> st
Project KernelAbstractions v0.9.15
Status `~/.julia/dev/KernelAbstractions/Project.toml`
[79e6a3ab] Adapt v4.0.1
[a9b6321e] Atomix v0.1.0
[052768ef] CUDA v5.2.0
[7da242da] Enzyme v0.11.13 `~/.julia/dev/Enzyme`
[1914dd2f] MacroTools v0.5.13
[aea7be01] PrecompileTools v1.2.0
[ae029012] Requires v1.3.0
[90137ffa] StaticArrays v1.9.1
[013be700] UnsafeAtomics v0.2.1
[d80eeb9a] UnsafeAtomicsLLVM v0.1.3
[7cc45869] Enzyme_jll v0.0.98+0 `../Enzyme_jll`
[b77e0a4c] InteractiveUtils
[37e2e46d] LinearAlgebra
[2f01184e] SparseArrays v1.10.0
[cf7118a7] UUIDs
Force-pushed from 8221520 to e584ed6
ext/CUDAEnzymeExt.jl
Outdated
@show TapeType
subtape = Array{TapeType}(undef, size(blocks(iterspace)))
@michel2323 can you have this use the device-specific allocator like @vchuravy mentioned above?
test/reverse_gpu.jl
Outdated
function mul_caller(A, B, backend)
    kernel = mul!(backend)
    kernel(A, B, ndrange=size(A))
    KernelAbstractions.synchronize(backend)
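For context, a self-contained sketch of what the `mul!` kernel that `mul_caller` launches might look like; this is hypothetical, and the actual definition in test/reverse_gpu.jl may differ.

```julia
# Hypothetical sketch of the mul! kernel invoked by mul_caller above:
# an element-wise in-place multiply over the ndrange.
using KernelAbstractions

@kernel function mul!(A, @Const(B))
    I = @index(Global)
    A[I] *= B[I]   # each work-item scales one element of A by B
end
```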
@vchuravy is there a KA way to have the kernel run immediately rather than get sync'd? That way this can be merged separately from adding support for KA async.
I am not sure what you mean by that? CUDA etc. are all async, so the synchronize is needed in any case. You can leave it out and rely on stream ordering?
Ok, in which case we need to add KA sync rules for forward and reverse. Forward is easy: just sync the regular and shadow kernels. Reverse needs to launch the reverse kernel, and the reverse-kernel execute needs to sync it.
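The forward half of the plan above can be sketched as a custom rule; this is a rough hypothetical sketch, not the merged implementation, and it assumes Enzyme 0.11-era `EnzymeRules.forward` signatures (which have since changed).

```julia
# Hypothetical sketch of a forward-mode custom rule for
# KernelAbstractions.synchronize, following the "sync the regular and
# shadow kernels" idea: both were launched on the same backend, so one
# synchronize covers primal and shadow work.
# Assumes Enzyme 0.11-era EnzymeRules signatures.
using KernelAbstractions
import EnzymeCore
import EnzymeCore.EnzymeRules

function EnzymeRules.forward(
        func::EnzymeCore.Const{typeof(KernelAbstractions.synchronize)},
        ::Type{<:EnzymeCore.Const},
        backend::EnzymeCore.Const)
    # Synchronize once; stream ordering guarantees the shadow kernel
    # launched alongside the primal has also completed.
    func.val(backend.val)
    return nothing
end
```

The reverse-mode counterpart is harder, as noted above: the augmented primal must record enough to launch the reverse kernel, and the reverse sweep must synchronize it.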
Force-pushed from 4a20dfd to 3c38fc7
I added the following allocate call: `subtape = allocate(CUDABackend(), TapeType, size(blocks(iterspace)))`. With the latest

you should update Enzyme to latest (0.11.14)
Force-pushed from 1ca1e3b to d216c5c
The reverse kernel uses
Force-pushed from 60112d6 to e92c3b8
@vchuravy Cleaned up. Are we waiting for EnzymeAD/Enzyme.jl#1104 and JuliaGPU/CUDA.jl#2260?
Will need to change KernelAbstractions.jl/Project.toml Line 23 in c5fe83c: `0.7.1`
@michel2323 given that the prerequisites have landed, mind getting this over the finish line?
Force-pushed from ede2f76 to c21f6bb
@wsmoses @vchuravy Cleanup with working tests (if CUDA is working). The last unresolved issue is active arguments to a kernel. The compiler cannot figure out the type for the actives here, so all actives are marked (KernelAbstractions.jl/ext/EnzymeExt.jl, Line 334 in c21f6bb). I tried to fix it, but I'm not sure there's a way. So for now, it gracefully errors (KernelAbstractions.jl/ext/EnzymeExt.jl, Line 259 in c21f6bb).
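A common way around the active-scalar limitation described above (hypothetical here, not part of this PR) is to box the scalar in a one-element array and pass it as `Duplicated`, so the kernel signature carries no `Active` arguments at all:

```julia
# Hypothetical workaround for active scalar kernel arguments: box the
# scalar in a one-element array and differentiate it as Duplicated,
# accumulating its gradient into the shadow array instead of using Active.
using Enzyme, KernelAbstractions

@kernel function scale!(A, c)
    I = @index(Global)
    A[I] *= c[1]          # read the "scalar" out of its one-element box
end

function scale_caller(A, c, backend)
    k = scale!(backend)
    k(A, c, ndrange = size(A))
    KernelAbstractions.synchronize(backend)
end

# Usage sketch (dA, dc are zero-initialized shadows of A and c):
# autodiff(Reverse, scale_caller,
#          Duplicated(A, dA), Duplicated(c, dc), Const(backend))
# Afterwards dc[1] holds the derivative with respect to the boxed scalar.
```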
Will need rebase for #478
Force-pushed from eb9b999 to 29d4580
Fix examples tests with CUDA backend
Add synchronize rule
Looks good to me, but @vchuravy should do the proper approval/merge.
@vchuravy Bump. Is there a blocker here?
The tests are a bit sparse, and they should be enabled for more than the CPU backend?
@michel2323's PR, but opening so we can have a place to discuss.