Initial work on CUDA-compat #25
base: main
Conversation
I think the CUDA extension works properly now; all the current CUDA tests pass. The following code runs properly:

```julia
using CUDA
using LinearAlgebra
using Distributions, Random
using Bijectors
using Flux
using NormalizingFlows

rng = CUDA.default_rng()
T = Float32
q0_g = MvNormal(CUDA.zeros(T, 2), I)  # base distribution with GPU parameters
CUDA.functional()                     # check that a CUDA device is available

# two planar layers composed into a flow; f32/gpu come from Flux
ts = reduce(∘, [f32(Bijectors.PlanarLayer(2)) for _ in 1:2])
ts_g = gpu(ts)
flow_g = transformed(q0_g, ts_g)
x = rand(rng, q0_g)  # good
```

However, there are still issues to fix: sampling multiple draws at once, and sampling from the flow itself.
```julia
xs = rand(rng, q0_g, 10)  # fails with an ambiguity error
```

```
ERROR: MethodError: rand(::CUDA.RNG, ::MvNormal{Float32, PDMats.ScalMat{Float32}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, ::Int64) is ambiguous.

Candidates:
  rand(rng::Random.AbstractRNG, s::Sampleable{Multivariate, Continuous}, n::Int64)
    @ Distributions ~/.julia/packages/Distributions/Ufrz2/src/multivariates.jl:23
  rand(rng::Random.AbstractRNG, s::Sampleable{Multivariate}, n::Int64)
    @ Distributions ~/.julia/packages/Distributions/Ufrz2/src/multivariates.jl:21
  rand(rng::CUDA.RNG, s::Sampleable{<:ArrayLikeVariate, Continuous}, n::Int64)
    @ NormalizingFlowsCUDAExt ~/Research/Turing/NormalizingFlows.jl/ext/NormalizingFlowsCUDAExt.jl:16

Possible fix, define
  rand(::CUDA.RNG, ::Sampleable{Multivariate, Continuous}, ::Int64)

Stacktrace:
 [1] top-level scope
   @ ~/Research/Turing/NormalizingFlows.jl/example/test.jl:42
```
```julia
y = rand(rng, flow_g)  # fails with an ambiguity error
```

```
ERROR: MethodError: rand(::CUDA.RNG, ::MultivariateTransformed{MvNormal{Float32, PDMats.ScalMat{Float32}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, ComposedFunction{PlanarLayer{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, PlanarLayer{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}) is ambiguous.

Candidates:
  rand(rng::Random.AbstractRNG, td::MultivariateTransformed)
    @ Bijectors ~/.julia/packages/Bijectors/cvMxj/src/transformed_distribution.jl:160
  rand(rng::CUDA.RNG, s::Sampleable{<:ArrayLikeVariate, Continuous})
    @ NormalizingFlowsCUDAExt ~/Research/Turing/NormalizingFlows.jl/ext/NormalizingFlowsCUDAExt.jl:7

Possible fix, define
  rand(::CUDA.RNG, ::MultivariateTransformed)

Stacktrace:
 [1] top-level scope
   @ ~/Research/Turing/NormalizingFlows.jl/example/test.jl:40
```

This is partly because we are overloading methods on types that this package does not own.
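For reference, the disambiguating methods suggested by the two error messages would look roughly like the sketch below (the second mirrors Bijectors' own method at transformed_distribution.jl:160). This silences the ambiguities, but it still adds methods to functions and types this package does not own, and whether the underlying `rand!` path is GPU-friendly is a separate question:

```julia
using CUDA, Distributions, Bijectors

# More specific than both Distributions.jl candidates and the extension's
# ArrayLikeVariate method, so the first ambiguity is resolved.
function Distributions.rand(
    rng::CUDA.RNG,
    s::Distributions.Sampleable{Distributions.Multivariate,Distributions.Continuous},
    n::Int,
)
    return @inbounds Distributions.rand!(
        rng, Distributions.sampler(s), CuArray{float(eltype(s))}(undef, length(s), n)
    )
end

# Resolves the second ambiguity: sample from the base distribution, then
# push the draw through the flow's transform.
function Distributions.rand(rng::CUDA.RNG, td::Bijectors.MultivariateTransformed)
    return td.transform(rand(rng, td.dist))
end
```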
I don't have an immediate solution other than the suggested fixes.
Yeah, I agree. As a temporary solution, I'm thinking of adding an additional argument for […].
ext/NormalizingFlowsCUDAExt.jl (Outdated)

```julia
function Distributions._rand!(rng::CUDA.RNG, d::Distributions.MvNormal, x::CuVecOrMat)
    # Replaced usage of scalar indexing.
    CUDA.randn!(rng, x)
```
@zuhengxu do you know why this change of yours was necessary? I thought `Random.randn!(rng, x)` should just dispatch to `CUDA.randn!(rng, x)`?
Ahh, you are right: this is not necessary. I think I just made the change to make sure it was actually calling the CUDA sampling. I can change it back.
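For what it's worth, the dispatch is easy to check directly (assuming a functional CUDA device):

```julia
using CUDA, InteractiveUtils, Random

rng = CUDA.default_rng()
x = CUDA.zeros(Float32, 4)

# CUDA.jl extends the Random API, so the generic call dispatches to the
# GPU implementation for the CUDA.RNG + CuArray combination:
Random.randn!(rng, x)
@which Random.randn!(rng, x)  # should point at a method defined in CUDA.jl
```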
ext/NormalizingFlowsCUDAExt.jl
Outdated
function Distributions.rand( | ||
rng::CUDA.RNG, | ||
s::Distributions.Sampleable{<:Distributions.ArrayLikeVariate,Distributions.Continuous}, | ||
n::Int, | ||
) | ||
return @inbounds Distributions.rand!( | ||
rng, Distributions.sampler(s), CuArray{float(eltype(s))}(undef, length(s), n) | ||
) | ||
end |
Usage of `length` here will cause some issues, e.g. what if `s` is wrapping a matrix distribution? Maybe `(undef, size(s)..., n)` will do? But I don't quite recall what the correct size is here; it should be somewhere in the Distributions.jl docs.
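To illustrate the concern with a matrix-variate example (Wishart is just a stand-in here):

```julia
using Distributions, LinearAlgebra

s = Wishart(5, Matrix{Float64}(I, 3, 3))  # each draw is a 3×3 matrix
size(s)    # (3, 3)
length(s)  # 9

n = 10
# (undef, length(s), n) would allocate a 9×10 matrix, flattening each draw;
# keeping the sample shape and adding the sample count as a trailing
# dimension gives the container shape rand! expects for array-like variates:
x = Array{Float64}(undef, size(s)..., n)  # 3×3×10
```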
example/Project.toml (Outdated)

```diff
@@ -15,3 +15,4 @@ Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
 Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
 Revise = "295af30f-e4ad-537b-8983-00126c2a3abe"
 Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"
+cuDNN = "02a925ec-e4fe-4b08-9a7e-0d78e3d38ccd"
```
Is this needed now (after you removed the test-file you were using)?
`cuDNN = "02a925ec-e4fe-4b08-9a7e-0d78e3d38ccd"`
This dependency is needed if we want some of the Flux.jl chains to run properly on GPU. But you are right, it's not used for the current examples; they are all running on CPU. I'll remove it later.
Honestly, IMO, the best solution right now is just to add our own […]. If we want to properly support all of this, we'll have to go down the path of specializing the methods further, i.e. not do a […]. For now, just make a […]. How does that sound?
Yeah, after thinking about it, I agree that this is probably the best way to go at this point. Working on it now!
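To make that concrete, here is a rough sketch of the "function we own" approach. The name `rand_device` and the method body are illustrative placeholders (the body only aims at the isotropic MvNormal used earlier in this thread; general covariances need more care), not necessarily what the PR ends up doing:

```julia
using CUDA, Distributions, PDMats, Random

# In the main package: an entry point this package owns, whose generic
# method is simply the CPU fallback.
rand_device(rng::Random.AbstractRNG, dist, args...) = rand(rng, dist, args...)

# In the CUDA extension: specialize only the function we own, so there is
# no ambiguity with Distributions.rand and no piracy on external types.
function rand_device(rng::CUDA.RNG, dist::Distributions.MvNormal, n::Int)
    z = CuArray{eltype(dist.μ)}(undef, length(dist), n)
    CUDA.randn!(rng, z)                          # standard normals on the GPU
    return dist.μ .+ PDMats.unwhiten(dist.Σ, z)  # scale/shift to N(μ, Σ)
end
```

Call sites inside the package would then use `rand_device(rng, q0_g, 10)` instead of the ambiguous `rand(rng, q0_g, 10)`, and CPU users would see no change in behavior.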
I have adapted the […]. The following runs properly now:

```julia
using CUDA
using LinearAlgebra
using Distributions, Random
using Bijectors
using Flux
import NormalizingFlows as NF

rng = CUDA.default_rng()
T = Float32
q0_g = MvNormal(CUDA.zeros(T, 2), I)
CUDA.functional()

ts = reduce(∘, [f32(Bijectors.PlanarLayer(2)) for _ in 1:2])
ts_g = gpu(ts)
flow_g = transformed(q0_g, ts_g)
```

@torfjelde @sunxd3 Let me know if this attempt looks good to you. If so, I'll update the docs.
It seems overloading methods of an external package from within an extension doesn't work (which is probably for the better), so at the moment the CUDA tests are failing.
But if we move the overloads into the main package, they run. So we should probably do that from now on.