Allocations incurred from broadcast expression #1372

Closed
charleskawczynski opened this issue Jul 14, 2023 · 8 comments
Labels: Broadcasting, bug (Something isn't working), Inference

@charleskawczynski (Member)

This line is breaking on the GPU, and is resulting in runtime allocations.

I'll try to make a MWE when I have a chance.

charleskawczynski added the bug (Something isn't working) label and self-assigned this on Jul 14, 2023
@simonbyrne (Member)

The simplest example is when you broadcast a DataType over a scalar, which is then combined with the broadcast over another object, e.g.

FT = Float32
@. x + FT(1)

which is equivalent to

x .+ FT.(1)

Note that this is distinct from JuliaLang/julia#50554 (which is due to the use of flatten). You can hit this with plain CuArrays, see JuliaGPU/CUDA.jl#1761.
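
To see what the macro builds (a quick sketch at a plain Julia REPL, with x as an ordinary Array), note that @. wraps FT(1) in its own nested Broadcasted whose function is the DataType itself, so the type object gets carried into the fused kernel:

FT = Float32
x = rand(FT, 3)
# What `@. x + FT(1)` constructs before materialization:
bc = Base.Broadcast.broadcasted(+, x, Base.Broadcast.broadcasted(FT, 1))
typeof(bc.args[2].f)   # DataType: the type object rides along in the broadcast tree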

Unfortunately this is a common pattern we use to make functions type-generic. Some possible solutions:

The first is simply to avoid broadcasting types over scalars. One option is to switch away from using @., so the above would be

x .+ FT(1)

which forces the Float32(1) to be evaluated before broadcasting. Alternatively, you can use $() to do the same with @.:

@. x + $(FT(1))

but this is kind of ugly. Finally, you can just move the conversion outside the broadcast entirely:

i = FT(1)
@. x + i

If we want to keep using the pattern, one option is type piracy on Base.Broadcast.broadcasted, à la #1365, but I would prefer not to do this. I think the suggestion in JuliaGPU/CUDA.jl#1761 (comment) of using Adapt is probably the best option, but it would have to be done in CUDA.jl or GPUArrays.jl.
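
For concreteness, the type-piracy variant could look roughly like this (a sketch only, not necessarily what #1365 does): eagerly evaluate a type constructor applied to a plain number instead of building a nested Broadcasted around it.

# Type piracy: this redefines behaviour for a function and types we don't own.
Base.Broadcast.broadcasted(::Type{T}, x::Number) where {T<:Number} = T(x)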

Perhaps a more elegant solution is

struct SuffixConverter{FT}
end
Base.:*(x::Number, ::SuffixConverter{FT}) where {FT} = convert(FT, x)
Base.Broadcast.broadcasted(::typeof(*), x::Number, ::SuffixConverter{FT}) where {FT} = convert(FT, x)

Then we can write

_FT = SuffixConverter{FT}()
@. x + 1_FT

which has the nice advantage of avoiding more parentheses, and so could make the code cleaner.
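
For illustration, a small usage sketch of that proposal inside a type-generic function (hypothetical function and variable names; assumes the SuffixConverter definitions above are in scope):

function scale_and_shift!(y, x::AbstractArray{FT}) where {FT}
    FTc = SuffixConverter{FT}()   # one converter per number format
    # Under @., 2FTc parses as 2 * FTc, and the broadcasted overload above
    # collapses it to convert(FT, 2) before the kernel is built.
    @. y = 2FTc * x + 1FTc
    return y
end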

@simonbyrne (Member)

I've posted a patch to CUDA.jl which should address it: JuliaGPU/CUDA.jl#2000

@charleskawczynski (Member, Author) commented Jul 14, 2023

I like the _FT suffix; it reminds me of Fortran syntax, but it works with the locally generic float type. It saves one character (x_FT instead of FT(x)) and it's relatively simple. My only question is: where does this definition live? Is this something that ClimaCore will define? It's kind of low level and doesn't really have anything to do with spatial operators.

@simonbyrne (Member)

Okay, how about https://github.com/simonbyrne/SuffixConversion.jl

@charleskawczynski (Member, Author)

I think that's a great solution, and it's probably better that it lives outside of Clima since it's very widely applicable.

@milankl commented Jul 17, 2023

I often end up writing things like

half = convert(NF,0.5)
dt_NF = convert(NF,dt)
Tmin = convert(NF,scheme.Tmin)
vj = convert(NF,v[j])

(SpeedyWeather.jl uses NF for number format.) I like convert because it makes it explicit that we're just converting a variable and not doing any other computation; someone not used to FT(1) might think something much more sophisticated is going on. But honestly, I often just end up unpacking/converting things in one go, like Tmin = convert(NF, scheme.Tmin), and I still very much prefer that over doing the conversion in the actual kernel. I'd rather be explicit than rely on the compiler to always treat NF(1) as a constant instead of redoing the conversion over and over again.

I'd even do g = convert(NF, gravity) so that I can nicely use g*h or similar inside the kernel, and then the fact that g is gravity is only a few lines away!
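
As a sketch of that pattern (hypothetical function and field names, with NF the target number format), all conversions happen once up front and the loop body stays conversion-free:

function relax!(T::AbstractVector{NF}, scheme, dt) where {NF}
    half  = convert(NF, 0.5)
    dt_NF = convert(NF, dt)
    Tmin  = convert(NF, scheme.Tmin)
    @inbounds for j in eachindex(T)
        # relax T towards Tmin; every operand is already an NF
        T[j] = T[j] - half*dt_NF*(T[j] - Tmin)
    end
    return T
end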

@Sbozzolo (Member)

JuliaGPU/CUDA.jl#2000 got merged, so maybe this is fixed?

@charleskawczynski (Member, Author)

This was closed for 1.11, but it is still an issue on 1.10.
