Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gradient on combination of horizontal and 3d fields does not work #1989

Open
haakon-e opened this issue Sep 14, 2024 · 9 comments · May be fixed by #2172
Open

gradient on combination of horizontal and 3d fields does not work #1989

haakon-e opened this issue Sep 14, 2024 · 9 comments · May be fixed by #2172
Labels
bug Something isn't working Design question feature New feature

Comments

@haakon-e
Copy link
Member

haakon-e commented Sep 14, 2024

Describe the bug

The following code produces an error:

ᶜgradᵥ = Operators.GradientF2C()

level_field = Fields.level(Fields.Field(Float64, center_space), 1)
ᶠscalar_field = Fields.Field(Float64, face_space)
# Does not work:
@. ᶜgradᵥ(level_field + ᶠscalar_field)
produces the following error
ERROR: InvalidIRError: compiling MethodInstance for ClimaCoreCUDAExt.copyto_stencil_kernel!(::ClimaCore.Fields.Field{…}, ::ClimaCore.Operators.StencilBroadcasted{…}, ::ClimaCore.Spaces.ExtrudedFiniteDifferenceSpace{…}, ::NTuple{…}, ::ClimaCore.DataLayouts.UniversalSize{…}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to level)
Stacktrace:
 [1] copyto_stencil_kernel!
   @ ~/.julia/packages/ClimaCore/wHC4M/ext/cuda/operators_finite_difference.jl:0
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Y4hSX/src/validation.jl:147
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/Y4hSX/src/driver.jl:458 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/Lw5SP/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/Y4hSX/src/driver.jl:457 [inlined]
  [5] emit_llvm(job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Y4hSX/src/utils.jl:103
  [6] emit_llvm
    @ ~/.julia/packages/GPUCompiler/Y4hSX/src/utils.jl:97 [inlined]
  [7] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Y4hSX/src/driver.jl:136
  [8] codegen
    @ ~/.julia/packages/GPUCompiler/Y4hSX/src/driver.jl:115 [inlined]
  [9] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Y4hSX/src/driver.jl:111
 [10] compile
    @ ~/.julia/packages/GPUCompiler/Y4hSX/src/driver.jl:103 [inlined]
 [11] #1145
    @ ~/.julia/packages/CUDA/Tl08O/src/compiler/compilation.jl:254 [inlined]
 [12] JuliaContext(f::CUDA.var"#1145#1148"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}}; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Y4hSX/src/driver.jl:52
 [13] JuliaContext(f::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Y4hSX/src/driver.jl:42
 [14] compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/Tl08O/src/compiler/compilation.jl:253
 [15] actual_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Y4hSX/src/execution.jl:237
 [16] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Y4hSX/src/execution.jl:151
 [17] macro expansion
    @ ~/.julia/packages/CUDA/Tl08O/src/compiler/execution.jl:369 [inlined]
 [18] macro expansion
    @ ./lock.jl:267 [inlined]
 [19] cufunction(f::typeof(ClimaCoreCUDAExt.copyto_stencil_kernel!), tt::Type{Tuple{ClimaCore.Fields.Field{…}, ClimaCore.Operators.StencilBroadcasted{…}, ClimaCore.Spaces.ExtrudedFiniteDifferenceSpace{…}, NTuple{…}, ClimaCore.DataLayouts.UniversalSize{…}}}; kwargs::@Kwargs{always_inline::Bool})
    @ CUDA ~/.julia/packages/CUDA/Tl08O/src/compiler/execution.jl:364
 [20] cufunction
    @ ~/.julia/packages/CUDA/Tl08O/src/compiler/execution.jl:361 [inlined]
 [21] macro expansion
    @ ~/.julia/packages/CUDA/Tl08O/src/compiler/execution.jl:112 [inlined]
 [22] threads_via_occupancy(f!::typeof(ClimaCoreCUDAExt.copyto_stencil_kernel!), args::Tuple{ClimaCore.Fields.Field{…}, ClimaCore.Operators.StencilBroadcasted{…}, ClimaCore.Spaces.ExtrudedFiniteDifferenceSpace{…}, NTuple{…}, ClimaCore.DataLayouts.UniversalSize{…}})
    @ ClimaCoreCUDAExt ~/.julia/packages/ClimaCore/wHC4M/ext/cuda/cuda_utils.jl:94
 [23] 
copyto!(out::ClimaCore.Fields.Field{ClimaCore.DataLayouts.VIJFH{…}, ClimaCore.Spaces.ExtrudedFiniteDifferenceSpace{…}}, bc::ClimaCore.Operators.StencilBroadcasted{ClimaCoreCUDAExt.CUDAColumnStencilStyle, ClimaCore.Operators.GradientF2C{…}, Tuple{…}, ClimaCore.Spaces.ExtrudedFiniteDifferenceSpace{…}})
    @ ClimaCoreCUDAExt ~/.julia/packages/ClimaCore/wHC4M/ext/cuda/operators_finite_difference.jl:27
 [24] copy
    @ ~/.julia/packages/ClimaCore/wHC4M/src/Operators/common.jl:52 [inlined]
 [25] materialize(opbc::ClimaCore.Operators.StencilBroadcasted{ClimaCore.Operators.StencilStyle, ClimaCore.Operators.GradientF2C{@NamedTuple{}}, Tuple{Base.Broadcast.Broadcasted{ClimaCore.Fields.FieldStyle{}, Nothing, typeof(ClimaCore.RecursiveApply.radd), Tuple{}}}, Nothing})
    @ ClimaCore.Operators ~/.julia/packages/ClimaCore/wHC4M/src/Operators/common.jl:57
 [26] top-level scope
    @ REPL[6]:1
Some type information was truncated. Use `show(err)` to see complete types.

To Reproduce

See reproducer:
import Pkg
Pkg.activate("benchmarks/3d")  # run from `ClimaCore.jl` root
import ClimaCore: Domains, Meshes, Geometry, Grids, Spaces, Topologies, Hypsography, Fields, Operators
import ClimaComms

# Request a GPU device on hpc:
# srun -G1 -t 01:00:00 --partition gpu --pty bash -l
ENV["CLIMACOMMS_DEVICE"] = "CUDA"
using CUDA
comms_ctx = ClimaComms.SingletonCommsContext()

h_domain = Domains.RectangleDomain(
    Domains.IntervalDomain(Geometry.XPoint(0.0), Geometry.XPoint(1.0); periodic = true), 
    Domains.IntervalDomain(Geometry.YPoint(0.0), Geometry.YPoint(1.0); periodic = true)
)
h_mesh = Meshes.RectilinearMesh(h_domain, 10, 10)
h_grid = Spaces.grid(Spaces.SpectralElementSpace2D(
    Topologies.DistributedTopology2D(comms_ctx, h_mesh, Topologies.spacefillingcurve(h_mesh)),
    Spaces.Quadratures.GLL{4}(),
))
z_domain = Domains.IntervalDomain(Geometry.ZPoint(0.0), Geometry.ZPoint(1.0); boundary_names = (:bottom, :top))
z_grid = Grids.FiniteDifferenceGrid(Topologies.IntervalTopology(
    comms_ctx, Meshes.IntervalMesh(z_domain, Meshes.Uniform(); nelems = 10)
))
grid = Grids.ExtrudedFiniteDifferenceGrid(h_grid, z_grid, Hypsography.Flat(); deep = false)
center_space = Spaces.CenterExtrudedFiniteDifferenceSpace(grid)
face_space = Spaces.FaceExtrudedFiniteDifferenceSpace(grid)

# Create fields and show that it fails
ᶜgradᵥ = Operators.GradientF2C()

level_field = Fields.level(Fields.Field(Float64, center_space), 1)
ᶠscalar_field = Fields.Field(Float64, face_space)
# Does not work:
@. ᶜgradᵥ(level_field + ᶠscalar_field)

PS: I'm using the benchmarks/3d environment for convenience since it has both ClimaCore and CUDA available.

System details

Any relevant system information:

versioninfo()
# using climacommon/2024_05_27  
julia> versioninfo()
Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 28 × Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, broadwell)
Threads: 1 default, 0 interactive, 1 GC (on 28 virtual cores)
Environment:
  JULIA_MPI_HAS_CUDA = true
  JULIA_CPU_TARGET = generic;broadwell;skylake-avx512;icelake-server;cascadelake;znver3
  JULIA_CUDA_MEMORY_POOL = none
  JULIA_LOAD_PATH = :/groups/esm/modules/julia-preferences/_2024_02_20
  JULIA_CUDA_USE_BINARYBUILDER = false
  JULIA_NUM_THREADS = auto
@haakon-e haakon-e added the bug Something isn't working label Sep 14, 2024
@haakon-e
Copy link
Member Author

cc: @dennisYatunin

haakon-e added a commit to CliMA/ClimaAtmos.jl that referenced this issue Sep 16, 2024
Need to compute flux before `ᶜdivᵥ` until we
resolve:
CliMA/ClimaCore.jl#1989
@charleskawczynski
Copy link
Member

@haakon-e, could you please update the reproducer? I think that the existing one has a bug. In particular:

level_field = Fields.level(Fields.Field(Float64, center_space), 1)
ᶠscalar_field = Fields.Field(Float64, face_space)
# Does not work:
@. ᶜgradᵥ(level_field + ᶠscalar_field)

level_field + ᶠscalar_field is attempting to add fields on two different spaces, which is (purposefully) not allowed. I assume that this is meant to be (and we still need import ClimaCore: Utilities above):

level_field = Fields.level(Fields.Field(Float64, face_space), Utilities.half)
ᶠscalar_field = Fields.Field(Float64, face_space)
# Does not work:
@. ᶜgradᵥ(level_field + ᶠscalar_field)

@charleskawczynski
Copy link
Member

charleskawczynski commented Sep 19, 2024

I think the correct reproducer (as I expect) is:

import ClimaCore: Domains, Meshes, Geometry, Grids, Spaces, Topologies, Hypsography, Fields, Operators, Utilities
import ClimaComms

ENV["CLIMACOMMS_DEVICE"] = "CUDA"
using CUDA
# ENV["CLIMACOMMS_DEVICE"] = "CPU"
comms_ctx = ClimaComms.SingletonCommsContext()

h_domain = Domains.RectangleDomain(
    Domains.IntervalDomain(Geometry.XPoint(0.0), Geometry.XPoint(1.0); periodic = true), 
    Domains.IntervalDomain(Geometry.YPoint(0.0), Geometry.YPoint(1.0); periodic = true)
)
h_mesh = Meshes.RectilinearMesh(h_domain, 10, 10)
h_grid = Spaces.grid(Spaces.SpectralElementSpace2D(
    Topologies.DistributedTopology2D(comms_ctx, h_mesh, Topologies.spacefillingcurve(h_mesh)),
    Spaces.Quadratures.GLL{4}(),
))
z_domain = Domains.IntervalDomain(Geometry.ZPoint(0.0), Geometry.ZPoint(1.0); boundary_names = (:bottom, :top))
z_grid = Grids.FiniteDifferenceGrid(Topologies.IntervalTopology(
    comms_ctx, Meshes.IntervalMesh(z_domain, Meshes.Uniform(); nelems = 10)
))
grid = Grids.ExtrudedFiniteDifferenceGrid(h_grid, z_grid, Hypsography.Flat(); deep = false)
center_space = Spaces.CenterExtrudedFiniteDifferenceSpace(grid)
face_space = Spaces.FaceExtrudedFiniteDifferenceSpace(grid)

# Create fields and show that it fails
ᶜgradᵥ = Operators.GradientF2C()

level_field = Fields.level(Fields.Field(Float64, face_space), Utilities.half)
ᶠscalar_field = Fields.Field(Float64, face_space)
# Does not work:
@. ᶜgradᵥ(level_field + ᶠscalar_field)

import LazyBroadcast as LB
bc = LB.@lazy @. ᶜgradᵥ(level_field + ᶠscalar_field);

Alarmingly, neither of these break on the CPU!

@dennisYatunin
Copy link
Member

Some progress from earlier today:

  • The following method overwrite fixes the issue:
julia> @inline Operators.reconstruct_placeholder_space(::Operators.LevelPlaceholderSpace, parent_space) =
    Spaces.SpectralElementSpace2D(Spaces.level(Spaces.grid(parent_space), Operators.PlusHalf(0)))

julia> @. ᶜgradᵥ(level_field + ᶠscalar_field)
ClimaCore.Geometry.Covariant3Vector{Float64}-valued Field:
  components: 
    data: 
      1: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0    0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
  • The following method overwrite does not fix the issue:
julia> @inline Operators.reconstruct_placeholder_space(::Operators.LevelPlaceholderSpace, parent_space) =
    Spaces.level(parent_space, Operators.PlusHalf(0))

julia> @. ᶜgradᵥ(level_field + ᶠscalar_field)
ERROR: InvalidIRError: compiling MethodInstance for ClimaCoreCUDAExt.copyto_stencil_kernel!(::ClimaCore.Fields.Field{…}, ::ClimaCore.Operators.StencilBroadcasted{…}, ::ClimaCore.Spaces.ExtrudedFiniteDifferenceSpace{…}, ::NTuple{…}, ::ClimaCore.DataLayouts.UniversalSize{…}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to julia.new_gc_frame)
Reason: unsupported call to an unknown function (call to julia.push_gc_frame)
Reason: unsupported call to an unknown function (call to julia.get_gc_frame_slot)
Reason: unsupported dynamic function invocation (call to level)
Stacktrace:
 [1] copyto_stencil_kernel!
   @ /central/home/dyatunin/ClimaCore.jl/ext/cuda/operators_finite_difference.jl:0
Reason: unsupported call to an unknown function (call to julia.pop_gc_frame)
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl

We can conclude that the inference problem on GPUs is occurring during dispatch of

level(space::FaceExtrudedFiniteDifferenceSpace3D, v::PlusHalf) =
    SpectralElementSpace2D(level(grid(space), v))

This might be happening because the FaceExtrudedFiniteDifferenceSpace3D type is too complex for the GPU compiler. It should be easy to refactor this so that it no longer dispatches over such complicated types; e.g., we could dispatch over Spaces.staggering(space) instead of the space itself.

@charleskawczynski
Copy link
Member

Note for future self (and @dennisYatunin). One thing I noticed is that we don't always call Base.Broadcast.instantiate for our datalayout kernels (we sometimes assume that it's been called by the field layer). Doing

import LazyBroadcast as LB
bc = LB.@lazy @. ᶜgradᵥ(level_field + ᶠscalar_field);
materialize(bc)

reproduces the error above (w.r.t. inference failure in level), and

import LazyBroadcast as LB
bc = LB.@lazy @. ᶜgradᵥ(level_field + ᶠscalar_field);
bc = Base.Broadcast.instantiate(bc);
materialize(bc)

yields a different error (unmatched spaces). It may be that this doesn't fix the issue, but I think that we may want to fix this first.

@charleskawczynski
Copy link
Member

Actually, that last comment was an issue with LazyBroadcast, but is now fixed in the latest version.

@charleskawczynski
Copy link
Member

Here is an updated reproducer:

import ClimaCore: Spaces, Fields, Operators, Utilities
import ClimaComms
using ClimaCore.CommonSpaces
cspace = ExtrudedCubedSphereSpace(;
    z_elem = 10,
    z_min = 0,
    z_max = 1,
    radius = 10,
    h_elem = 10,
    n_quad_points = 4,
    staggering = CellCenter()
)
fspace = Spaces.face_space(cspace);

# Create fields
ᶜgradᵥ = Operators.GradientF2C()
ᶠscalar_field = Fields.Field(Float64, fspace)
ᶜlevel_field = Fields.level(Fields.Field(Float64, cspace), 1)
ᶠlevel_field = Fields.level(Fields.Field(Float64, fspace), Utilities.half)
level_values = Fields.field_values(ᶜlevel_field)

# Should error, but doesn't on the cpu (we cannot add center and face spaces):
@. ᶜlevel_field + ᶠscalar_field
@. ᶜgradᵥ(ᶜlevel_field + ᶠscalar_field)

# Should work, and does, on the cpu
@. ᶠlevel_field + ᶠscalar_field
@. ᶜgradᵥ(ᶠlevel_field + ᶠscalar_field)

# Should work, but doesn't
@. level_values + ᶠscalar_field
@. ᶜgradᵥ(level_values + ᶠscalar_field)

@charleskawczynski charleskawczynski added Design question bug Something isn't working and removed bug Something isn't working labels Jan 27, 2025
@charleskawczynski
Copy link
Member

Of the 3 cases in the reproducer:

    1. @. ᶜlevel_field + ᶠscalar_field should certainly fail (bug)
    1. @. ᶠlevel_field + ᶠscalar_field perhaps should work? I think that this is actually a design question, because we could require that users extract the field values
    1. @. level_values + ᶠscalar_field should probably work (support needed)

@charleskawczynski charleskawczynski added the feature New feature label Jan 27, 2025
@haakon-e
Copy link
Member Author

For i. & i.., if either lead to errors, it would be useful with a short hint for what users should do. E.g. apply a transformation, extract values etc., with reference to a suggested method.

@dennisYatunin dennisYatunin linked a pull request Feb 4, 2025 that will close this issue
4 tasks
@dennisYatunin dennisYatunin linked a pull request Feb 4, 2025 that will close this issue
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Design question feature New feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants