(0.93.2) Update Adapt.jl compat and fix `Float32` CATKE on GPU #3876

ali-ramadhan · 2024-10-28T17:00:49Z

Just opening this PR with what is currently a band-aid fix for Float32 CATKE on GPUs. Hoping I can figure out what's actually wrong. But for now it'll be useful to have a working branch.

Resolves #3870 (eventually, hopefully)

...rbulence_closure_implementations/TKEBasedVerticalDiffusivities/catke_vertical_diffusivity.jl

glwagner · 2024-10-28T19:34:25Z

Just some minor comments:

- Use `eltype(grid)` instead of a type parameter, following YASGuide
- Does the annotation `::FT` work?


ali-ramadhan · 2024-10-29T18:07:47Z

Thanks for the review @glwagner. I made the changes and also added a test that fails (well CUDA crashes) without this PR, and passes with this PR.

> Does the annotation `::FT` work?

Unfortunately not. I ended up getting GPU exceptions instead: #3870 (comment)

glwagner · 2024-10-29T20:00:17Z

...rbulence_closure_implementations/TKEBasedVerticalDiffusivities/catke_vertical_diffusivity.jl

@@ -228,7 +227,7 @@ end
    Jᵇᵋ = closure.minimum_convective_buoyancy_flux
    Jᵇᵢⱼ = @inbounds Jᵇ[i, j, 1]
    Jᵇ⁺ = max(Jᵇᵋ, Jᵇᵢⱼ, Jᵇ★) # selects fastest (dominant) time-scale
-    t★ = (ℓᴰ^2 / Jᵇ⁺)^(1/3)
+    t★ = cbrt(ℓᴰ^2 / Jᵇ⁺)


Can you check if this fixes the issue without the need for extra convert?

It will be good to avoid "over converting", because this could cause us to fail to catch spurious promotion, which will hurt performance (e.g. by removing some of the benefit of using Float32).

I checked but it was not enough unfortunately.

Agree that it would be good to not over convert.

The point is just that you may lose much of the advantage of using Float32 in the first place if you throw convert around
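To make the promotion in the diff above concrete, here is a minimal sketch in plain Julia. The values standing in for `ℓᴰ` and `Jᵇ⁺` are made up for illustration. The expression `1/3` evaluates to a `Float64`, so raising a `Float32` to that power promotes the result, whereas `cbrt` returns the type of its argument. As noted in the thread, this change alone was not enough to fix the GPU failure.

```julia
# Hypothetical Float32 stand-ins for ℓᴰ and Jᵇ⁺ (illustrative values only)
ℓᴰ = 10f0
Jᵇ⁺ = 1f-7

# 1/3 is a Float64, so Float32 ^ Float64 promotes the result to Float64
t_promoted = (ℓᴰ^2 / Jᵇ⁺)^(1/3)

# cbrt returns the same floating-point type as its argument
t_preserved = cbrt(ℓᴰ^2 / Jᵇ⁺)

println(typeof(t_promoted))   # Float64
println(typeof(t_preserved))  # Float32
```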

glwagner · 2024-10-29T20:00:46Z

...rbulence_closure_implementations/TKEBasedVerticalDiffusivities/catke_vertical_diffusivity.jl

+    κu★ = min(κu, κu_max)
+    FT = eltype(grid)
+    return convert(FT, κu★)
 end


For this PR I suggest changing this to a type annotation (`κu★::FT`), since this is what we will want in the long run.

Unfortunately that does not work, but maybe I can pinpoint where the conversion is happening; then we won't need any conversion here.

Sorry I am not suggesting this as a solution, but rather as a way to catch a bug in the future.

Ah I don't think I actually know how type annotations work. So if we say

return κu★::FT

and κu★ is not of type FT then an error/exception will be thrown?

I think so, isn't that what you found?

It's a bug checking mechanism which may be more broadly useful as we try to prevent spurious promotion

No idea why I thought it was a different mechanism to convert types lol. But yes that is what I found.
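The difference between the two mechanisms discussed in this thread can be sketched in plain Julia on the CPU. The helper names `asserted` and `converted` are hypothetical and not part of Oceananigans. In expression position, `x::FT` is a type assertion: it checks the type and throws a `TypeError` on mismatch, while `convert(FT, x)` changes the type. Inside a GPU kernel the failed assertion would surface as a kernel exception instead, as reported above.

```julia
# Hypothetical helpers, not part of Oceananigans
asserted(x, FT) = x::FT            # type assertion: checks, never converts
converted(x, FT) = convert(FT, x)  # conversion: changes the type if needed

println(asserted(1f0, Float32))           # value passes through unchanged
println(typeof(converted(1.0, Float32)))  # the Float64 input becomes Float32

# A Float64 value fails the Float32 assertion with a TypeError
threw = try
    asserted(1.0, Float32)
    false
catch err
    err isa TypeError
end
println(threw)
```

Note that this applies to `::` on an expression. A declaration such as `function f(x)::FT` or `local y::FT = ...` converts instead of asserting.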

glwagner · 2024-10-29T20:01:51Z

I think to preserve the work in this PR, we should add a Float32 test which will fail if a spurious promotion undermines performance

ali-ramadhan · 2024-10-29T20:19:41Z

> I think to preserve the work in this PR, we should add a Float32 test which will fail if a spurious promotion undermines performance

Agreed. I'll revisit this PR later to see if I can find where the conversion happens. The test I added only checks to see if we can take a time step. But I should be able to also add a test to ensure no spurious promotion occurred.

glwagner · 2024-10-29T21:06:14Z

> > I think to preserve the work in this PR, we should add a Float32 test which will fail if a spurious promotion undermines performance
>
> Agreed. I'll revisit this PR later to see if I can find where the conversion happens. The test I added only checks to see if we can take a time step. But I should be able to also add a test to ensure no spurious promotion occurred.

Ah, that will work as a test if we remove the convert.

The convert is a good sanity check to find where the problem is, but it's not a solution, since it merely allows the code to run without error; it doesn't actually allow us to realize the benefits of using Float32. Arguably, with this it is actually worse to use Float32, since the numerics are degraded but the performance benefit is not fully realized.
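A promotion check of the kind proposed here could look roughly like the following sketch, which uses only the `Test` standard library. The functions `capped_hardcoded` and `capped_converted` are hypothetical stand-ins loosely modeled on `min(κu, κu_max)`; they are not Oceananigans code, and a real test would inspect the model's diffusivity fields instead.

```julia
using Test

# Hypothetical stand-ins for a capped diffusivity; not Oceananigans code
capped_hardcoded(κ) = min(κ, 1e4)                      # Float64 literal promotes Float32 input
capped_converted(κ) = min(κ, convert(typeof(κ), 1e4))  # cap matches the input type

@testset "no spurious promotion" begin
    for FT in (Float32, Float64)
        @test capped_converted(FT(2)) isa FT
    end
    # The hard-coded Float64 cap silently promotes a Float32 input
    @test capped_hardcoded(2f0) isa Float64
end
```

The design point is that the test asserts on the returned type rather than only on whether a time step completes, so a promotion that still runs without error is caught.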

ali-ramadhan · 2024-10-30T00:06:43Z

Following #3870 (comment) this PR now just changes how grid coordinate ranges are constructed. Curious to see if any tests fail. But locally it fixed CATKE + Float32.

@glwagner I ended up doing this if it looks okay:

    κu★ = min(κu, κu_max)
    FT = eltype(grid)
    return κu★::FT

I assume there's a small cost associated with the type annotation ::FT?

glwagner · 2024-10-30T01:43:18Z

src/Grids/grid_generation.jl

+    F = StepRangeLen{FT, FT, FT, Int}(F)
+    C = StepRangeLen{FT, FT, FT, Int}(C)


I don't think it is correct to describe this change as "making the grid coordinate ranges type-safe".

Perhaps more conservatively, we only require the first type parameter of StepRangeLen to be FT. The next two should be twice precision; e.g. Float64 if the first is Float32, or TwicePrecision{Float64} if FT=Float64.

Now, if we want to support non-standard ranges, we can perhaps consider that. But I suspect there is something going on that isn't completely understood here.

This is discussed on #3870, but just to express the problem here: we expect the ranges to have the type StepRangeLen{Float32, Float64, Float64, Int} for a Float32 grid. Therefore, this PR does some damage by producing an unexpected range type.

It would be unexpected, or a bug, for a range of type StepRangeLen{Float32, Float64, Float64, Int} to produce Float64 values. Therefore the first course of action is to verify whether StepRangeLen{Float32, Float64, Float64, Int} is producing Float64, on either CPU or GPU.

I have a hunch that StepRangeLen is somehow adapted for GPU incorrectly. Or, if it is intentional, then I think we should fix the output of xnode, ynode, znode...

If there's good motivation for reducing the precision of ranges (either in addition or alternatively to the above suggestion) then I think we can entertain it. I'm not sure we want to hardcode this change though, it might be better to provide it as an option. The imprecision of ranges is already a bit annoying.

Since this makes the test pass and the MWE in #3870 not error, maybe the hint is that StepRangeLen just needs to be adapted for the GPU?

But yeah I'm not sure why StepRangeLen{Float32, Float64, Float64, Int} seems to produce Float64 numbers in GPU kernels. Perhaps this is worth opening an issue on CUDA.jl with a MWE?

Yes, though it is not for us to adapt (that would be type piracy) because we own neither Adapt nor StepRangeLen. Plus it seems to be adapted, so what is happening...

https://github.com/JuliaGPU/Adapt.jl/blob/5ef7c5329609df7ffb5b19942d6747b3dcc162c2/src/base.jl#L79-L80
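For reference, the range types under discussion can be inspected on the CPU with a short sketch in plain Julia. It does not involve Adapt.jl or a GPU, so it only shows the expected CPU behavior: a `Float32` range stores its reference and step in higher precision but still yields `Float32` elements.

```julia
# A Float32 range built with Base.range (CPU only; no GPU involved here)
r = range(0f0, 1f0, length=5)

println(typeof(r))     # reference and step are stored in higher precision
println(eltype(r))     # element type is the first type parameter
println(typeof(r[2]))  # indexing returns the element type

# The same query for a Float64 range, which uses TwicePrecision internally
println(typeof(range(0.0, 1.0, length=5)))
```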

glwagner · 2024-10-30T03:20:56Z

This may help: JuliaGPU/Adapt.jl#88

ali-ramadhan · 2024-10-30T14:17:44Z

I just changed the Adapt.jl compat entry to make use of the new version with the StepRangeLen fix. The MWE from #3870 does not error with Adapt.jl v4.1.1 locally.

navidcy · 2024-10-30T19:56:27Z

Project.toml

@@ -43,7 +43,7 @@ OceananigansEnzymeExt = "Enzyme"
 OceananigansMakieExt = ["MakieCore", "Makie"]

 [compat]
-Adapt = "3, 4"
+Adapt = "^4.1.1"


Suggested change

-Adapt = "^4.1.1"
+Adapt = "4.1.1"

(same but cleaner)
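The equivalence of the two compat entries can be checked with a short sketch. It assumes `Pkg.Types.semver_spec`, a utility inside Pkg that is not part of its stable public API, so treat this as an illustration only. Both strings should describe the same range: at least 4.1.1 and below 5.0.0.

```julia
using Pkg

# Assumption: Pkg.Types.semver_spec parses a [compat] entry string
caret = Pkg.Types.semver_spec("^4.1.1")
plain = Pkg.Types.semver_spec("4.1.1")

# Compare membership for a few versions around the range boundaries
for v in (v"4.1.0", v"4.1.1", v"4.9.0", v"5.0.0")
    println(v, " => caret: ", v in caret, ", plain: ", v in plain)
end
```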

navidcy

this is heroic debugging!!

navidcy · 2024-10-30T19:59:37Z

you can forget my minor suggestion to remove ^! let's merge this!

ali-ramadhan · 2024-10-30T20:27:59Z

Haha it did take a while but with a satisfying ending!

And thanks for the suggestion! I didn't realize that 4.1.1 and ^4.1.1 would be the same here. But since it's okay with you, I'll merge to avoid waiting on another round of tests to pass 🙃

navidcy · 2024-10-30T21:55:33Z

> to avoid waiting on another round of tests to pass 🙃

Exactly! Takes forever...!

Band aid for Float32 CATKE

1ad0d92

ali-ramadhan changed the title ~~Band aid for Float32 CATKE~~ Fixing Float32 CATKE on GPU Oct 28, 2024

glwagner reviewed Oct 28, 2024


glwagner reviewed Oct 28, 2024


navidcy added bug 🐞 Even a perfect program still has bugs turbulence closures 🎐 GPU 👾 Where Oceananigans gets its powers from labels Oct 28, 2024

ali-ramadhan and others added 6 commits October 29, 2024 11:47

Update src/TurbulenceClosures/turbulence_closure_implementations/TKEB…

9c35096

…asedVerticalDiffusivities/catke_vertical_diffusivity.jl Co-authored-by: Gregory L. Wagner <[email protected]>

Update src/TurbulenceClosures/turbulence_closure_implementations/TKEB…

9aba47a

…asedVerticalDiffusivities/catke_vertical_diffusivity.jl Co-authored-by: Gregory L. Wagner <[email protected]>

Clean up and return on separate lines

9defbf0

Add test to time step a hydrostatic model with CATKE + Float32

88ad7ed

Use type-safe cbrt instead of ^(1/3)

bcb979d

Merge branch 'main' into ali/fix-catke-f32

e3d83f8

ali-ramadhan marked this pull request as ready for review October 29, 2024 18:07

ali-ramadhan requested a review from glwagner October 29, 2024 18:07

ali-ramadhan added 3 commits October 29, 2024 12:08

cleanup

150d502

More cleanup

0806342

Typos!

6ce177d

glwagner reviewed Oct 29, 2024


ali-ramadhan added 2 commits October 29, 2024 18:02

Use type annotations when returning CATKE diffusivities

9238484

Just force StepRangeLen to use same type for reference and step size

9d9d573

ali-ramadhan changed the title ~~Fixing Float32 CATKE on GPU~~ Make grid coordinate ranges type-safe and fix Float32 CATKE on GPU Oct 30, 2024

glwagner reviewed Oct 30, 2024


ali-ramadhan added 3 commits October 30, 2024 08:11

Merge branch 'main' into ali/fix-catke-f32

6c5b653

Set Adapt.jl compat to ^4.1.1

bdd9c38

Don't force StepRangeLen types

5573112

ali-ramadhan changed the title ~~Make grid coordinate ranges type-safe and fix Float32 CATKE on GPU~~ Update Adapt.jl compat and fix Float32 CATKE on GPU Oct 30, 2024

ali-ramadhan requested a review from glwagner October 30, 2024 14:17

Bump v0.93.2

939737e

ali-ramadhan changed the title ~~Update Adapt.jl compat and fix Float32 CATKE on GPU~~ (0.93.2) Update Adapt.jl compat and fix Float32 CATKE on GPU Oct 30, 2024

navidcy reviewed Oct 30, 2024


navidcy approved these changes Oct 30, 2024


ali-ramadhan merged commit f2a8fb3 into main Oct 30, 2024
46 checks passed

ali-ramadhan deleted the ali/fix-catke-f32 branch October 30, 2024 20:28
