Initial work on CUDA-compat #25
base: main
Conversation
I think the CUDA extension works properly now; all the current CUDA tests pass. The following code runs properly:

```julia
using CUDA
using LinearAlgebra
using Distributions, Random
using Bijectors
using Flux
using NormalizingFlows

rng = CUDA.default_rng()
T = Float32
q0_g = MvNormal(CUDA.zeros(T, 2), I)  # base distribution with GPU parameters
CUDA.functional()                     # check that a CUDA device is available

# two planar layers composed into a flow; f32/gpu come from Flux
ts = reduce(∘, [f32(Bijectors.PlanarLayer(2)) for _ in 1:2])
ts_g = gpu(ts)
flow_g = transformed(q0_g, ts_g)
x = rand(rng, q0_g)  # good
```

However, there are still issues to fix: sampling multiple draws at once, and sampling from the flow itself.
```julia
xs = rand(rng, q0_g, 10)  # fails with an ambiguity error
```

```
ERROR: MethodError: rand(::CUDA.RNG, ::MvNormal{Float32, PDMats.ScalMat{Float32}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, ::Int64) is ambiguous.

Candidates:
  rand(rng::Random.AbstractRNG, s::Sampleable{Multivariate, Continuous}, n::Int64)
    @ Distributions ~/.julia/packages/Distributions/Ufrz2/src/multivariates.jl:23
  rand(rng::Random.AbstractRNG, s::Sampleable{Multivariate}, n::Int64)
    @ Distributions ~/.julia/packages/Distributions/Ufrz2/src/multivariates.jl:21
  rand(rng::CUDA.RNG, s::Sampleable{<:ArrayLikeVariate, Continuous}, n::Int64)
    @ NormalizingFlowsCUDAExt ~/Research/Turing/NormalizingFlows.jl/ext/NormalizingFlowsCUDAExt.jl:16

Possible fix, define
  rand(::CUDA.RNG, ::Sampleable{Multivariate, Continuous}, ::Int64)

Stacktrace:
 [1] top-level scope
   @ ~/Research/Turing/NormalizingFlows.jl/example/test.jl:42
```
```julia
y = rand(rng, flow_g)  # fails with an ambiguity error
```

```
ERROR: MethodError: rand(::CUDA.RNG, ::MultivariateTransformed{MvNormal{Float32, PDMats.ScalMat{Float32}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, ComposedFunction{PlanarLayer{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, PlanarLayer{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}) is ambiguous.

Candidates:
  rand(rng::Random.AbstractRNG, td::MultivariateTransformed)
    @ Bijectors ~/.julia/packages/Bijectors/cvMxj/src/transformed_distribution.jl:160
  rand(rng::CUDA.RNG, s::Sampleable{<:ArrayLikeVariate, Continuous})
    @ NormalizingFlowsCUDAExt ~/Research/Turing/NormalizingFlows.jl/ext/NormalizingFlowsCUDAExt.jl:7

Possible fix, define
  rand(::CUDA.RNG, ::MultivariateTransformed)

Stacktrace:
 [1] top-level scope
   @ ~/Research/Turing/NormalizingFlows.jl/example/test.jl:40
```

This is partly because we are overloading methods on types that this package does not own.
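For reference, the disambiguating methods suggested by the two error messages would look roughly like the sketch below (the second mirrors Bijectors' own method at transformed_distribution.jl:160). This silences the ambiguities, but it still adds methods to functions and types this package does not own, and whether the underlying `rand!` path is GPU-friendly is a separate question:

```julia
using CUDA, Distributions, Bijectors

# More specific than both Distributions.jl candidates and the extension's
# ArrayLikeVariate method, so the first ambiguity is resolved.
function Distributions.rand(
    rng::CUDA.RNG,
    s::Distributions.Sampleable{Distributions.Multivariate,Distributions.Continuous},
    n::Int,
)
    return @inbounds Distributions.rand!(
        rng, Distributions.sampler(s), CuArray{float(eltype(s))}(undef, length(s), n)
    )
end

# Resolves the second ambiguity: sample from the base distribution, then
# push the draw through the flow's transform.
function Distributions.rand(rng::CUDA.RNG, td::Bijectors.MultivariateTransformed)
    return td.transform(rand(rng, td.dist))
end
```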
I don't have an immediate solution other than the suggested fixes.
Yeah, I agree. As a temporary solution, I'm thinking of adding an additional argument for […].
ext/NormalizingFlowsCUDAExt.jl (Outdated)

```julia
function Distributions._rand!(rng::CUDA.RNG, d::Distributions.MvNormal, x::CuVecOrMat)
    # Replaced usage of scalar indexing.
    CUDA.randn!(rng, x)
```
@zuhengxu do you know why this change of yours was necessary? I thought `Random.randn!(rng, x)` should just dispatch to `CUDA.randn!(rng, x)`?
Ahh, you are right: this is not necessary. I think I just made the change to make sure it was actually calling the CUDA sampling. I can change it back.
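For what it's worth, the dispatch is easy to check directly (assuming a functional CUDA device):

```julia
using CUDA, InteractiveUtils, Random

rng = CUDA.default_rng()
x = CUDA.zeros(Float32, 4)

# CUDA.jl extends the Random API, so the generic call dispatches to the
# GPU implementation for the CUDA.RNG + CuArray combination:
Random.randn!(rng, x)
@which Random.randn!(rng, x)  # should point at a method defined in CUDA.jl
```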
ext/NormalizingFlowsCUDAExt.jl
Outdated
function Distributions.rand( | ||
rng::CUDA.RNG, | ||
s::Distributions.Sampleable{<:Distributions.ArrayLikeVariate,Distributions.Continuous}, | ||
n::Int, | ||
) | ||
return @inbounds Distributions.rand!( | ||
rng, Distributions.sampler(s), CuArray{float(eltype(s))}(undef, length(s), n) | ||
) | ||
end |
Usage of `length` here will cause some issues, e.g. what if `s` is wrapping a matrix distribution? Maybe `(undef, size(s)..., n)` will do? But I don't quite recall what the correct size is here; it should be somewhere in the Distributions.jl docs.
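To illustrate the concern with a matrix-variate example (Wishart is just a stand-in here):

```julia
using Distributions, LinearAlgebra

s = Wishart(5, Matrix{Float64}(I, 3, 3))  # each draw is a 3×3 matrix
size(s)    # (3, 3)
length(s)  # 9

n = 10
# (undef, length(s), n) would allocate a 9×10 matrix, flattening each draw;
# keeping the sample shape and adding the sample count as a trailing
# dimension gives the container shape rand! expects for array-like variates:
x = Array{Float64}(undef, size(s)..., n)  # 3×3×10
```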
example/Project.toml (Outdated)

```diff
@@ -15,3 +15,4 @@ Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
 Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
 Revise = "295af30f-e4ad-537b-8983-00126c2a3abe"
 Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"
+cuDNN = "02a925ec-e4fe-4b08-9a7e-0d78e3d38ccd"
```
Is this needed now (after you removed the test-file you were using)?
`cuDNN = "02a925ec-e4fe-4b08-9a7e-0d78e3d38ccd"`
This dependency is needed if we want some of the Flux.jl chains to run properly on GPU. But you are right, it's not used for the current examples; they are all running on CPU. I'll remove it later.
Honestly, IMO, the best solution right now is just to add our own […]. If we want to properly support all of this, we'll have to go down the path of specializing the methods further, i.e. not do a […]. For now, just make a […]. How does that sound?
Yeah, after thinking about it, I agree that this is probably the best way to go at this point. Working on it now!
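To make that concrete, here is a rough sketch of the "function we own" approach. The name `rand_device` and the method body are illustrative placeholders (the body only aims at the isotropic MvNormal used earlier in this thread; general covariances need more care), not necessarily what the PR ends up doing:

```julia
using CUDA, Distributions, PDMats, Random

# In the main package: an entry point this package owns, whose generic
# method is simply the CPU fallback.
rand_device(rng::Random.AbstractRNG, dist, args...) = rand(rng, dist, args...)

# In the CUDA extension: specialize only the function we own, so there is
# no ambiguity with Distributions.rand and no piracy on external types.
function rand_device(rng::CUDA.RNG, dist::Distributions.MvNormal, n::Int)
    z = CuArray{eltype(dist.μ)}(undef, length(dist), n)
    CUDA.randn!(rng, z)                          # standard normals on the GPU
    return dist.μ .+ PDMats.unwhiten(dist.Σ, z)  # scale/shift to N(μ, Σ)
end
```

Call sites inside the package would then use `rand_device(rng, q0_g, 10)` instead of the ambiguous `rand(rng, q0_g, 10)`, and CPU users would see no change in behavior.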
I have adapted the […]. The following runs properly now:

```julia
using CUDA
using LinearAlgebra
using Distributions, Random
using Bijectors
using Flux
import NormalizingFlows as NF

rng = CUDA.default_rng()
T = Float32
q0_g = MvNormal(CUDA.zeros(T, 2), I)
CUDA.functional()

ts = reduce(∘, [f32(Bijectors.PlanarLayer(2)) for _ in 1:2])
ts_g = gpu(ts)
flow_g = transformed(q0_g, ts_g)
```

@torfjelde @sunxd3 Let me know if this attempt looks good to you. If so, I'll update the docs.
It seems overloading methods of an external package from within an extension doesn't work (which is probably for the better), so at the moment the CUDA tests are failing.
But if we move the overloads into the main package, they run. So we should probably do that from now on.