
Initial work on CUDA-compat #25

Open
wants to merge 20 commits into main
Conversation

torfjelde (Member)

It seems overloading an external package in an extension doesn't work (which is probably for the better), so at the moment the CUDA tests are failing.

But if we move the overloads into the main package, they run. So we should probably do that from now on.

@zuhengxu (Member) commented Aug 15, 2023

I think the CUDA extension now works properly; all the current CUDA tests pass. The following code runs properly:

using CUDA
using LinearAlgebra
using Distributions, Random
using Bijectors
using Flux
using NormalizingFlows

rng = CUDA.default_rng()
T = Float32
q0_g = MvNormal(CUDA.zeros(T, 2), I)

CUDA.functional()  # check that a usable GPU is available

# a small flow: two planar layers composed (same setup as the snippet further down)
ts = reduce(∘, [f32(Bijectors.PlanarLayer(2)) for _ in 1:2])
ts_g = gpu(ts)
flow_g = transformed(q0_g, ts_g)

x = rand(rng, q0_g) # good

However, there are still issues to fix: sampling multiple draws at once, and sampling from Bijectors.TransformedDistribution. Minimal examples are as follows:

  • sample multiple draws in one batch:
xs = rand(rng, q0_g, 10) # ambiguous 

error message:

ERROR: MethodError: rand(::CUDA.RNG, ::MvNormal{Float32, PDMats.ScalMat{Float32}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, ::Int64) is ambiguous.

Candidates:
  rand(rng::Random.AbstractRNG, s::Sampleable{Multivariate, Continuous}, n::Int64)
    @ Distributions ~/.julia/packages/Distributions/Ufrz2/src/multivariates.jl:23
  rand(rng::Random.AbstractRNG, s::Sampleable{Multivariate}, n::Int64)
    @ Distributions ~/.julia/packages/Distributions/Ufrz2/src/multivariates.jl:21
  rand(rng::CUDA.RNG, s::Sampleable{<:ArrayLikeVariate, Continuous}, n::Int64)
    @ NormalizingFlowsCUDAExt ~/Research/Turing/NormalizingFlows.jl/ext/NormalizingFlowsCUDAExt.jl:16

Possible fix, define
  rand(::CUDA.RNG, ::Sampleable{Multivariate, Continuous}, ::Int64)

Stacktrace:
 [1] top-level scope
   @ ~/Research/Turing/NormalizingFlows.jl/example/test.jl:42
  • sample from Bijectors.TransformedDistribution:
y = rand(rng, flow_g) # ambiguous

error message:

ERROR: MethodError: rand(::CUDA.RNG, ::MultivariateTransformed{MvNormal{Float32, PDMats.ScalMat{Float32}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, ComposedFunction{PlanarLayer{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, PlanarLayer{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}) is ambiguous.

Candidates:
  rand(rng::Random.AbstractRNG, td::MultivariateTransformed)
    @ Bijectors ~/.julia/packages/Bijectors/cvMxj/src/transformed_distribution.jl:160
  rand(rng::CUDA.RNG, s::Sampleable{<:ArrayLikeVariate, Continuous})
    @ NormalizingFlowsCUDAExt ~/Research/Turing/NormalizingFlows.jl/ext/NormalizingFlowsCUDAExt.jl:7

Possible fix, define
  rand(::CUDA.RNG, ::MultivariateTransformed)

Stacktrace:
 [1] top-level scope
   @ ~/Research/Turing/NormalizingFlows.jl/example/test.jl:40

This is partly because we are overloading methods on types that are not owned by this package.
Any thoughts about how to address this @torfjelde @sunxd3?
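
For reference, the disambiguating methods the two MethodErrors ask for would look roughly like the sketch below; the bodies are assumptions that simply mirror the extension's GPU-allocating path and Bijectors' generic method, and are not part of the PR:

using CUDA, Distributions, Bijectors

# Sketch of the "possible fix" methods named in the error messages (hypothetical).
function Distributions.rand(
    rng::CUDA.RNG,
    s::Distributions.Sampleable{Distributions.Multivariate,Distributions.Continuous},
    n::Int,
)
    # more specific than both ambiguous candidates: allocate on the GPU, fill in-place
    return Distributions.rand!(
        rng, Distributions.sampler(s), CuArray{float(eltype(s))}(undef, length(s), n)
    )
end

function Distributions.rand(rng::CUDA.RNG, td::Bijectors.MultivariateTransformed)
    # same as Bijectors' generic fallback, restated for CUDA.RNG to break the tie
    return td.transform(rand(rng, td.dist))
end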

@sunxd3 (Member) commented Aug 16, 2023

I don't have an immediate solution other than the suggested fixes.
It is indeed a bit annoying; maybe we shouldn't dispatch on rng?

@zuhengxu (Member) commented Aug 16, 2023

> It is indeed a bit annoying; maybe we shouldn't dispatch on rng?

Yeah, I agree. As a temporary solution, I'm thinking of adding an additional argument to Distributions.rand, something like device, to indicate whether to sample on the CPU or the GPU. But as a long-term fix, I'm now leaning towards your previous attempt, although that will require resolving some compatibility issues with Bijectors.
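
Purely as an illustration of that idea (the helper name, the device keyword, and the whole signature below are hypothetical; nothing like this was implemented in the PR):

using CUDA, Random, Distributions

# Hypothetical device-aware helper; `device` selects where the samples live.
function rand_on(rng::Random.AbstractRNG, d::Distributions.Sampleable, n::Int; device::Symbol = :cpu)
    device === :gpu || return rand(rng, d, n)  # CPU path: plain rand
    # GPU path: allocate a CuArray and fill it in-place
    return Distributions.rand!(
        rng, Distributions.sampler(d), CuArray{float(eltype(d))}(undef, length(d), n)
    )
end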


function Distributions._rand!(rng::CUDA.RNG, d::Distributions.MvNormal, x::CuVecOrMat)
    # Replaced usage of scalar indexing.
    CUDA.randn!(rng, x)
torfjelde (Member Author):

@zuhengxu do you know why this change of yours was necessary? I thought Random.randn!(rng, x) should just dispatch to CUDA.randn!(rng, x)?

zuhengxu (Member):

Ahh, you are right: this is not necessary. I think I just made the change to ensure it's actually calling the CUDA sampling. I can change it back.

Comment on lines 16 to 24
function Distributions.rand(
    rng::CUDA.RNG,
    s::Distributions.Sampleable{<:Distributions.ArrayLikeVariate,Distributions.Continuous},
    n::Int,
)
    return @inbounds Distributions.rand!(
        rng, Distributions.sampler(s), CuArray{float(eltype(s))}(undef, length(s), n)
    )
end
torfjelde (Member Author):

Usage of length here will cause some issues, e.g. what if s is wrapping a matrix distribution?

Maybe (undef, size(s)..., n) will do? But I don't quite recall what the correct size is here; it should be somewhere in the Distributions.jl docs.
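
A sketch of that adjustment (assuming Distributions.rand! accepts a pre-allocated array of shape (size(s)..., n) for an ArrayLikeVariate sampler):

# Hypothetical revision of the extension method above: allocate using the full
# variate shape rather than length(s), so matrix-variate samplers also get a
# correctly shaped array.
function Distributions.rand(
    rng::CUDA.RNG,
    s::Distributions.Sampleable{<:Distributions.ArrayLikeVariate,Distributions.Continuous},
    n::Int,
)
    return Distributions.rand!(
        rng, Distributions.sampler(s), CuArray{float(eltype(s))}(undef, size(s)..., n)
    )
end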

@@ -15,3 +15,4 @@ Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Revise = "295af30f-e4ad-537b-8983-00126c2a3abe"
Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"
cuDNN = "02a925ec-e4fe-4b08-9a7e-0d78e3d38ccd"
torfjelde (Member Author):

Is this needed now (after you removed the test-file you were using)?

Suggested change (delete this line):
cuDNN = "02a925ec-e4fe-4b08-9a7e-0d78e3d38ccd"

zuhengxu (Member):

This is needed if we want some of the Flux.jl chains to run properly on GPU. But you are right, it's not used by the current examples; they all run on the CPU. I'll remove it later.

test/interface.jl (resolved review thread)
@torfjelde (Member Author)

Honestly, IMO, the best solution right now is just to add our own rand to avoid the ambiguity errors.

If we want to properly support all of this, we'll have to go down the path of specializing the methods further, i.e. not use a Union as we do now, which will take time and effort.

For now, just make a NormalizingFlows.rand_device or something that just calls rand by default, but which we can then overload to our liking without running into ambiguity errors.

How does that sound?
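
For concreteness, a minimal sketch of that shim, assuming it lives in NormalizingFlows.jl (the default method and the commented CUDA overload below are illustrative, not a final implementation):

using Random

# Default method: forwards straight to `rand`, so CPU behaviour is unchanged.
rand_device(rng::Random.AbstractRNG, args...) = rand(rng, args...)

# The CUDA extension can then overload a function this package owns, e.g.
#
#     function NormalizingFlows.rand_device(rng::CUDA.RNG, s::Distributions.Sampleable, n::Int)
#         return Distributions.rand!(
#             rng, Distributions.sampler(s), CuArray{float(eltype(s))}(undef, size(s)..., n)
#         )
#     end
#
# which sidesteps the ambiguities with Distributions.rand / Bijectors.rand entirely,
# since no method of a foreign function is being redefined.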

@zuhengxu (Member)

> For now, just make a NormalizingFlows.rand_device or something that just calls rand by default, but which we can then overload to our liking without running into ambiguity errors.

Yeah, after thinking about it, I agree that this is probably the best way to go at this point. Working on it now!

@zuhengxu (Member) commented Aug 22, 2023

I have adopted the NF.rand_device() approach; I think we now have a workaround. The following code runs properly:

using CUDA
using LinearAlgebra
using Distributions, Random
using Bijectors
using Flux
import NormalizingFlows as NF

rng = CUDA.default_rng()
T = Float32
q0_g = MvNormal(CUDA.zeros(T, 2), I)

CUDA.functional()  # check that a usable GPU is available
ts = reduce(∘, [f32(Bijectors.PlanarLayer(2)) for _ in 1:2])
ts_g = gpu(ts)
flow_g = transformed(q0_g, ts_g)
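
Sampling then goes through the wrapper; a sketch of the calls, assuming rand_device mirrors rand's argument order (hypothetical, not confirmed API):

x  = NF.rand_device(rng, q0_g)      # single draw from the GPU base distribution
xs = NF.rand_device(rng, q0_g, 10)  # a batch of 10 draws
y  = NF.rand_device(rng, flow_g)    # draw from the transformed distribution (flow)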

@torfjelde @sunxd3 Let me know if this attempt looks good to you. If so, I'll update the docs.
