Allocations incurred from broadcast expression #1372

charleskawczynski · 2023-07-14T19:00:20Z

This line is breaking on the GPU, and is resulting in runtime allocations.

I'll try to make a MWE when I have a chance.

simonbyrne · 2023-07-14T20:38:48Z

The simplest example is when you broadcast a DataType over a scalar, which is then combined with the broadcast over another object, e.g.

FT = Float32
@. x + FT(1)

which is equivalent to

x .+ FT.(1)
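A quick CPU-side check of that equivalence, as a sketch only: it assumes a plain `Array` stands in for the GPU array (the variable `x` and its contents are made up for illustration), so it shows that both spellings dot the constructor but does not reproduce the GPU allocations themselves.

```julia
FT = Float32
x = FT[1, 2, 3]          # stand-in for the real (GPU) array

a = @. x + FT(1)         # @. dots every call, including the constructor FT
b = x .+ FT.(1)          # the same expression with the dots written by hand

# Print the expression that @. actually generates
println(@macroexpand @. x + FT(1))
```

Both `a` and `b` hold the same `Float32` values; the point is only that the type `FT` itself ends up inside the broadcast expression.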

Note that this is distinct from JuliaLang/julia#50554 (which is due to the use of flatten). You can hit this with plain CuArrays, see JuliaGPU/CUDA.jl#1761.

Unfortunately this is a common pattern we use to make functions type-generic. Some possible solutions:

The first is simply to avoid broadcasting types over scalars. One way is to switch away from using @., so the above would be

x .+ FT(1)

which forces the Float32(1) to be evaluated before broadcasting. You can use $() to do the same with @.:

@. x + $(FT(1))

but this is kind of ugly. Finally, you can just move the conversion outside the broadcast:

i = FT(1)
@. x + i
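For reference, the three workarounds side by side, as a minimal sketch on a CPU `Array` (the array `x` is an illustrative stand-in, not taken from the original code). In each case `FT(1)` is evaluated to a plain `Float32` before the broadcast is built, so the type `FT` never enters the broadcast expression.

```julia
FT = Float32
x = FT[1, 2, 3]          # stand-in for the real (GPU) array

# 1. Dot only the operator, so FT(1) is an ordinary scalar call
r1 = x .+ FT(1)

# 2. Keep @. but shield the conversion from dotting with $()
r2 = @. x + $(FT(1))

# 3. Hoist the conversion out of the broadcast expression
i = FT(1)
r3 = @. x + i
```

All three produce the same result; they differ only in how much they change the look of the surrounding code.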

If we want to keep using it, one option is via type-piracy on Base.Broadcast.broadcasted, a la #1365, but I would prefer not to do this. I think the suggestion in JuliaGPU/CUDA.jl#1761 (comment) of using Adapt is probably the best option, but it would have to be done in CUDA.jl or GPUArrays.jl.

Perhaps a more elegant solution is

struct SuffixConverter{FT}
end
Base.:*(x::Number, ::SuffixConverter{FT}) where {FT} = convert(FT, x)
Base.Broadcast.broadcasted(::typeof(*), x::Number, ::SuffixConverter{FT}) where {FT} = convert(FT, x)

Then we can write

_FT = SuffixConverter{FT}()
@. x + 1_FT

which has the nice advantage of avoiding more parentheses, and so could make the code cleaner.
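Putting the definition and the usage together gives the following self-contained sketch, again assuming a CPU `Array` as a stand-in for the real data. It relies on Julia parsing `1_FT` as the juxtaposition `1 * _FT`; the `broadcasted` method is what lets the same literal work inside `@.`, where the multiplication would otherwise be dotted.

```julia
struct SuffixConverter{FT} end

# 1_FT parses as 1 * _FT, so multiplication performs the conversion
Base.:*(x::Number, ::SuffixConverter{FT}) where {FT} = convert(FT, x)

# Inside @. the * is dotted; convert eagerly instead of building a lazy broadcast
Base.Broadcast.broadcasted(::typeof(*), x::Number, ::SuffixConverter{FT}) where {FT} =
    convert(FT, x)

FT = Float32
_FT = SuffixConverter{FT}()

x = FT[1, 2, 3]          # stand-in for the real (GPU) array
one_ft = 1_FT            # scalar use
y = @. x + 1_FT          # use inside a broadcast
z = @. x * 0.5_FT        # also works for floating-point literals
```

Since `SuffixConverter` is a locally defined type, these methods are not type piracy, unlike a general override of `Base.Broadcast.broadcasted` for `DataType` arguments.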

simonbyrne · 2023-07-14T22:50:03Z

I've posted a patch to CUDA.jl which should address it: JuliaGPU/CUDA.jl#2000

charleskawczynski · 2023-07-14T23:15:57Z

I like the _FT suffix, it reminds me of Fortran syntax, but it can work with the locally generic float type. It saves one character (x_FT instead of FT(x)) and it's relatively simple. My only question is: Where does this definition live? Is this something that ClimaCore will define? It's kind of low level and doesn't really have anything to do with spatial operators.

simonbyrne · 2023-07-15T05:23:05Z

Okay, how about https://github.com/simonbyrne/SuffixConversion.jl

charleskawczynski · 2023-07-15T13:59:16Z

I think that’s a great solution, and probably better that it lives outside of clima since it’s very widely applicable.

milankl · 2023-07-17T17:37:18Z

I often end up writing things like

half = convert(NF,0.5)
dt_NF = convert(NF,dt)
Tmin = convert(NF,scheme.Tmin)
vj = convert(NF,v[j])

(SpeedyWeather.jl uses NF for number format.) I like convert as it makes it more explicit that we're just converting a variable and not doing any other computation. Someone not used to FT(1) may think you're doing something much more sophisticated here. But honestly, I often just end up unpacking and converting things in one go, like Tmin = convert(NF,scheme.Tmin), and I still very much prefer that over doing the conversion in the actual kernel. I'd rather be explicit than rely on the compiler to always treat NF(1) as a constant instead of redoing the conversion over and over again.

I'd even do g = convert(NF,gravity) so that I can nicely use g*h or similar inside the kernel, but then it's only a few lines away that g is gravity!
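A minimal sketch of this convert-once style, assuming a hypothetical function `step!` and a made-up update rule (neither is from SpeedyWeather.jl): every constant is converted to `NF` before the loop, so the loop body contains only same-type arithmetic.

```julia
# Hypothetical kernel: the function name and update rule are illustrative only
function step!(v::AbstractVector{NF}, dt::Real) where {NF<:AbstractFloat}
    # Convert all constants once, outside the loop
    half = convert(NF, 0.5)
    dt_NF = convert(NF, dt)

    for j in eachindex(v)
        # Only NF-typed values appear here, so nothing is promoted
        v[j] += half * dt_NF * v[j]
    end
    return v
end

v = Float32[1, 2]
step!(v, 0.1)            # dt is passed as Float64 and converted once inside
```

The element type of `v` stays `Float32` even though `dt` was passed as a `Float64`.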

Sbozzolo · 2023-11-30T19:41:27Z

JuliaGPU/CUDA.jl#2000 got merged, so maybe this is fixed?

charleskawczynski · 2024-03-24T18:23:19Z

This is closed for 1.11, but is still an issue on 1.10.

charleskawczynski added the bug label Jul 14, 2023

charleskawczynski self-assigned this Jul 14, 2023

charleskawczynski mentioned this issue Sep 25, 2023

GPU compilation error in rayleigh_sponge_cache CliMA/ClimaAtmos.jl#2158

Closed

charleskawczynski added the Inference and Broadcasting labels Nov 30, 2023

charleskawczynski mentioned this issue Nov 30, 2023

Add support for online remapping CliMA/ClimaAtmos.jl#2179

Merged

simonbyrne closed this as completed Dec 1, 2023
