
Restructure ThermodynamicsParameters to reduce metadata size #233

sriharshakandala opened this issue Oct 29, 2024 · 9 comments

@sriharshakandala commented Oct 29, 2024

The Climate Modeling Alliance

Software Design Issue 📜

Purpose

Restructure ThermodynamicsParameters to reduce metadata size.

Cost/Benefits/Risks

This will help reduce the metadata size for GPU kernels, whose parameter space is limited (roughly 4 KB per kernel launch on CUDA).

Components

Restructure ThermodynamicsParameters to:

struct ThermodynamicsParameters{D} <: AbstractThermodynamicsParameters
    params::D
end
Adapt.@adapt_structure ThermodynamicsParameters

and access parameters through functions. E.g.:

T_0(tp::ThermodynamicsParameters) = tp.params[1]
MSLP(tp::ThermodynamicsParameters) = tp.params[2]
...

or

get_T_0(tp::ThermodynamicsParameters) = tp.params[1]
get_MSLP(tp::ThermodynamicsParameters) = tp.params[2]
...
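
A minimal end-to-end sketch of this pattern (hypothetical parameter values; the point is that after adapting to the GPU, the kernel argument carries only a small device-array handle instead of one inlined scalar per parameter):

using Adapt

abstract type AbstractThermodynamicsParameters end

struct ThermodynamicsParameters{D} <: AbstractThermodynamicsParameters
    params::D
end
Adapt.@adapt_structure ThermodynamicsParameters

# Index-based accessors, as proposed above
get_T_0(tp::ThermodynamicsParameters) = tp.params[1]
get_MSLP(tp::ThermodynamicsParameters) = tp.params[2]

# On the CPU, params can live in an ordinary Vector:
tp = ThermodynamicsParameters([273.16, 101325.0])
@assert get_T_0(tp) == 273.16

# On the GPU (hypothetical usage), adapt(CuArray, tp) would swap params for a
# device array, so kernels receive a fixed-size handle rather than one
# Float64 field per parameter.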

CC

@tapios @sriharshakandala @charleskawczynski @cmbengue

@charleskawczynski commented

I think that this is fine for large, flat, and "uniform" structs. Thermodynamics does have 26 properties.

Unfortunately, this isn't really a composable solution (as mentioned here). So, if I understand correctly, applying this to ThermodynamicsParameters and ClimaAtmosParameters could break, or require using GC.@preserve, which really doesn't seem like a good option, since these kernels are launched in ClimaCore (where we don't have access to the ThermodynamicsParameters type).

I think that we will need to test a few things out in ClimaAtmos before merging anything.

@sriharshakandala commented Oct 29, 2024

The solution will work for nested structs:

struct S1{D}
    params::D
end

get_p1(s::S1) = s.params[1]
get_p2(s::S1) = s.params[2]
get_p3(s::S1) = s.params[3]

struct S2{D}
    params::D
end

get_p1(s::S2) = s.params[1]
get_p2(s::S2) = s.params[2]
get_p3(s::S2) = s.params[3]

struct S{A, B}
    s1::A
    s2::B
end

s1 = S1([1, 2, 3, 4, 5])
s2 = S2([6, 7, 8, 9, 10])

s = S(s1, s2)

@show get_p2(s1)    # get_p2(s1) = 2
@show get_p2(s.s1)  # get_p2(s.s1) = 2
@show get_p2(s2)    # get_p2(s2) = 7
@show get_p2(s.s2)  # get_p2(s.s2) = 7
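
For GPU use, presumably each level of the hierarchy would also get its own Adapt rule, so that converting the outer struct converts every nested payload in one pass:

using Adapt

Adapt.@adapt_structure S1
Adapt.@adapt_structure S2
Adapt.@adapt_structure S

# adapt(CuArray, s) would then rebuild s with both params arrays on the device
# (hypothetical usage; requires CUDA.jl).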

@trontrytel commented

Is that something we will have to do in CloudMicrophysics.jl too? We have a lot of parameters and spent significant time introducing structure to them. We now dispatch on parameter types in many places. Are we switching back to an all-flat layout?

@glwagner commented

How much parameter space does the current struct use?

We run into parameter space issues regularly with Oceananigans and ClimaOcean. But we find that the issue usually comes from highly nested structures of things like OffsetArrays, each of which requires parameters for the offsets (or worse, Oceananigans.Field when that must be passed to the GPU). On the other hand, for thermodynamics parameters (of which ClimaOcean has an implementation using AbstractThermodynamicsParameters), the parameter space is quite small --- consuming only 56 bytes of the available ~4 kB in our case. (Note that we have a nested, hierarchical implementation of the thermodynamics parameters.) See this issue:

CliMA/ClimaOcean.jl#116

@sriharshakandala commented

The above design provides a bounded metadata size of about 32 bytes, regardless of the number of simple fields in the struct. This pattern is prevalent across our parameter structs in various packages!
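
A quick way to check that bound (a sketch, assuming the ThermodynamicsParameters and Adapt rule from the proposal above; cudaconvert computes the kernel-side form of an argument):

using CUDA, Adapt

tp = ThermodynamicsParameters(CuArray(rand(26)))

# Kernels see a CuDeviceVector (pointer plus size information), not 26 scalars:
@show sizeof(cudaconvert(tp))  # ~32 bytes, independent of length(tp.params)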

@glwagner commented

> The above design provides a bounded metadata size of about 32 bytes, regardless of the number of simple fields in the struct. This pattern is prevalent across our parameter structs in various packages!

That's great. What is the parameter size without this change? (Is the parameter size variable? Why would that be?)

@charleskawczynski commented Oct 29, 2024

I agree with @glwagner's points about this; we should probably identify the biggest culprits w.r.t. parameter memory usage before widely spreading a pattern that reduces our parameter memory by only (for example) 2%.

> That's great. What is the parameter size without this change? (Is the parameter size variable? Why would that be?)

It's 8 bytes per property, so 26 × 8 = 208 bytes for Thermodynamics.
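
For reference, that figure is just what sizeof reports for a flat isbits struct (a generic sketch, not the actual Thermodynamics type):

# Each Float64 field adds 8 bytes to the kernel argument:
struct Flat{FT}
    a::FT
    b::FT
    c::FT  # ... imagine 26 such fields
end

@show sizeof(Flat{Float64})  # 24 here; 26 fields would give 208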

@glwagner commented Oct 29, 2024

> • Reduce number size for certain objects (e.g. use Float32 for parameters, or smaller sized integers for numbers with known limited ranges)

The concern is mostly about metadata size!

@charleskawczynski commented

I added support for @rprint_parameter_memory, which recursively prints the parameter memory (cc @glwagner, who may be interested in using this). The output looks like:

fmb
  | size: 72, fmb.pairs::Tuple{…}
  | size: 16, fmb.pairs.1::Pair{…}
  | size: 64, fmb.pairs.1.first::CUDA.CuArray{…}
  | size: 16, fmb.pairs.1.first.data::GPUArrays.DataRef{…}
  | size: 24, fmb.pairs.1.first.data.rc::GPUArrays.RefCounted{…}
  | size: 64, fmb.pairs.1.first.data.rc.obj::CUDA.Managed{…}
  | size: 48, fmb.pairs.1.first.data.rc.obj.mem::CUDA.DeviceMemory
  | size: 16, fmb.pairs.1.first.data.rc.obj.mem.ctx::CUDA.CuContext
  | size: 40, fmb.pairs.1.first.data.rc.obj.stream::CUDA.CuStream
  | size: 16, fmb.pairs.1.first.data.rc.obj.stream.ctx::CUDA.CuContext
  | size: 40, fmb.pairs.1.first.dims::NTuple{…}
  | size: 64, fmb.pairs.1.second::Base.Broadcast.Broadcasted{…}
  | size: 64, fmb.pairs.1.second.args::Tuple{…}
  | size: 64, fmb.pairs.1.second.args.1::CUDA.CuArray{…}
  | size: 16, fmb.pairs.1.second.args.1.data::GPUArrays.DataRef{…}
  | size: 24, fmb.pairs.1.second.args.1.data.rc::GPUArrays.RefCounted{…}
  | size: 64, fmb.pairs.1.second.args.1.data.rc.obj::CUDA.Managed{…}
  | size: 48, fmb.pairs.1.second.args.1.data.rc.obj.mem::CUDA.DeviceMemory
  | size: 16, fmb.pairs.1.second.args.1.data.rc.obj.mem.ctx::CUDA.CuContext
  | size: 40, fmb.pairs.1.second.args.1.data.rc.obj.stream::CUDA.CuStream
  | size: 16, fmb.pairs.1.second.args.1.data.rc.obj.stream.ctx::CUDA.CuContext
  | size: 40, fmb.pairs.1.second.args.1.dims::NTuple{…}
  | size: 24, fmb.pairs.2::Pair{…}
  | size: 64, fmb.pairs.2.first::CUDA.CuArray{…}
  | size: 16, fmb.pairs.2.first.data::GPUArrays.DataRef{…}
  | size: 24, fmb.pairs.2.first.data.rc::GPUArrays.RefCounted{…}
  | size: 64, fmb.pairs.2.first.data.rc.obj::CUDA.Managed{…}
  | size: 48, fmb.pairs.2.first.data.rc.obj.mem::CUDA.DeviceMemory
  | size: 16, fmb.pairs.2.first.data.rc.obj.mem.ctx::CUDA.CuContext
  | size: 40, fmb.pairs.2.first.data.rc.obj.stream::CUDA.CuStream
  | size: 16, fmb.pairs.2.first.data.rc.obj.stream.ctx::CUDA.CuContext
  | size: 40, fmb.pairs.2.first.dims::NTuple{…}
  | size: 128, fmb.pairs.2.second::Base.Broadcast.Broadcasted{…}
  | size: 128, fmb.pairs.2.second.args::Tuple{…}
  | size: 64, fmb.pairs.2.second.args.1::CUDA.CuArray{…}
  | size: 16, fmb.pairs.2.second.args.1.data::GPUArrays.DataRef{…}
  | size: 24, fmb.pairs.2.second.args.1.data.rc::GPUArrays.RefCounted{…}
  | size: 64, fmb.pairs.2.second.args.1.data.rc.obj::CUDA.Managed{…}
  | size: 48, fmb.pairs.2.second.args.1.data.rc.obj.mem::CUDA.DeviceMemory
  | size: 16, fmb.pairs.2.second.args.1.data.rc.obj.mem.ctx::CUDA.CuContext
  | size: 40, fmb.pairs.2.second.args.1.data.rc.obj.stream::CUDA.CuStream
  | size: 16, fmb.pairs.2.second.args.1.data.rc.obj.stream.ctx::CUDA.CuContext
  | size: 40, fmb.pairs.2.second.args.1.dims::NTuple{…}
  | size: 64, fmb.pairs.2.second.args.2::CUDA.CuArray{…}
  | size: 16, fmb.pairs.2.second.args.2.data::GPUArrays.DataRef{…}
  | size: 24, fmb.pairs.2.second.args.2.data.rc::GPUArrays.RefCounted{…}
  | size: 64, fmb.pairs.2.second.args.2.data.rc.obj::CUDA.Managed{…}
  | size: 48, fmb.pairs.2.second.args.2.data.rc.obj.mem::CUDA.DeviceMemory
  | size: 16, fmb.pairs.2.second.args.2.data.rc.obj.mem.ctx::CUDA.CuContext
  | size: 40, fmb.pairs.2.second.args.2.data.rc.obj.stream::CUDA.CuStream
  | size: 16, fmb.pairs.2.second.args.2.data.rc.obj.stream.ctx::CUDA.CuContext
  | size: 40, fmb.pairs.2.second.args.2.dims::NTuple{…}
  | size: 32, fmb.pairs.3::Pair{…}
  | size: 64, fmb.pairs.3.first::CUDA.CuArray{…}
  | size: 16, fmb.pairs.3.first.data::GPUArrays.DataRef{…}
  | size: 24, fmb.pairs.3.first.data.rc::GPUArrays.RefCounted{…}
  | size: 64, fmb.pairs.3.first.data.rc.obj::CUDA.Managed{…}
  | size: 48, fmb.pairs.3.first.data.rc.obj.mem::CUDA.DeviceMemory
  | size: 16, fmb.pairs.3.first.data.rc.obj.mem.ctx::CUDA.CuContext
  | size: 40, fmb.pairs.3.first.data.rc.obj.stream::CUDA.CuStream
  | size: 16, fmb.pairs.3.first.data.rc.obj.stream.ctx::CUDA.CuContext
  | size: 40, fmb.pairs.3.first.dims::NTuple{…}
  | size: 192, fmb.pairs.3.second::Base.Broadcast.Broadcasted{…}
  | size: 192, fmb.pairs.3.second.args::Tuple{…}
  | size: 64, fmb.pairs.3.second.args.1::CUDA.CuArray{…}
  | size: 16, fmb.pairs.3.second.args.1.data::GPUArrays.DataRef{…}
  | size: 24, fmb.pairs.3.second.args.1.data.rc::GPUArrays.RefCounted{…}
  | size: 64, fmb.pairs.3.second.args.1.data.rc.obj::CUDA.Managed{…}
  | size: 48, fmb.pairs.3.second.args.1.data.rc.obj.mem::CUDA.DeviceMemory
  | size: 16, fmb.pairs.3.second.args.1.data.rc.obj.mem.ctx::CUDA.CuContext
  | size: 40, fmb.pairs.3.second.args.1.data.rc.obj.stream::CUDA.CuStream
  | size: 16, fmb.pairs.3.second.args.1.data.rc.obj.stream.ctx::CUDA.CuContext
  | size: 40, fmb.pairs.3.second.args.1.dims::NTuple{…}
  | size: 64, fmb.pairs.3.second.args.2::CUDA.CuArray{…}
  | size: 16, fmb.pairs.3.second.args.2.data::GPUArrays.DataRef{…}
  | size: 24, fmb.pairs.3.second.args.2.data.rc::GPUArrays.RefCounted{…}
  | size: 64, fmb.pairs.3.second.args.2.data.rc.obj::CUDA.Managed{…}
  | size: 48, fmb.pairs.3.second.args.2.data.rc.obj.mem::CUDA.DeviceMemory
  | size: 16, fmb.pairs.3.second.args.2.data.rc.obj.mem.ctx::CUDA.CuContext
  | size: 40, fmb.pairs.3.second.args.2.data.rc.obj.stream::CUDA.CuStream
  | size: 16, fmb.pairs.3.second.args.2.data.rc.obj.stream.ctx::CUDA.CuContext
  | size: 40, fmb.pairs.3.second.args.2.dims::NTuple{…}
  | size: 64, fmb.pairs.3.second.args.3::CUDA.CuArray{…}
  | size: 16, fmb.pairs.3.second.args.3.data::GPUArrays.DataRef{…}
  | size: 24, fmb.pairs.3.second.args.3.data.rc::GPUArrays.RefCounted{…}
  | size: 64, fmb.pairs.3.second.args.3.data.rc.obj::CUDA.Managed{…}
  | size: 48, fmb.pairs.3.second.args.3.data.rc.obj.mem::CUDA.DeviceMemory
  | size: 16, fmb.pairs.3.second.args.3.data.rc.obj.mem.ctx::CUDA.CuContext
  | size: 40, fmb.pairs.3.second.args.3.data.rc.obj.stream::CUDA.CuStream
  | size: 16, fmb.pairs.3.second.args.3.data.rc.obj.stream.ctx::CUDA.CuContext
  | size: 40, fmb.pairs.3.second.args.3.dims::NTuple{…}

I plan to open a PR to CUDA.jl with this, but I think it'd be helpful to reduce the amount of information (e.g., by sorting and printing only the most expensive pieces).
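
The core of such a tool might look like this (a minimal sketch of the idea, not the actual @rprint_parameter_memory implementation):

# Recursively walk an object's fields, printing sizeof at each node.
function rprint_memory(x, name = "obj"; io = stdout)
    println(io, "  | size: ", sizeof(x), ", ", name, "::", nameof(typeof(x)))
    isstructtype(typeof(x)) || return
    for f in fieldnames(typeof(x))
        isdefined(x, f) && rprint_memory(getfield(x, f), string(name, ".", f); io)
    end
end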
