Compiling functions that take tuples of Fields with Spaces inside is unreasonably expensive #1467

Sbozzolo · 2023-09-21T15:47:19Z

The simple script described below demonstrates a significant compile-time cost introduced by using (named) tuples of fields that carry space information.

The script sets up a space and a constant field Y defined on this space. Then, it defines a trivial function mytest(p) = p. For p, I test four different choices:

p_named_tuple, which is a NamedTuple with 44 references to Y (p = (; a0 = Y, a1 = Y, ....))
p_tuple, a tuple with 44 references to Y
p_array, an array with 44 references to Y
p_struct, an instance of a struct with 44 fields, each of which is a reference to Y
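
For readers who want to see the shape of such a test, here is a minimal sketch of the four containers and the timing calls. It is not the original script: ToySpace, ToyField, and ToyCache are hypothetical stand-ins written in plain Julia instead of the ClimaCore types, so it only illustrates the measurement pattern and is not expected to reproduce the timings reported here.

```julia
# Hypothetical stand-ins for a space and a field (NOT the ClimaCore types).
struct ToySpace
    weights::NTuple{4, Float64}
end

struct ToyField{S}
    values::Vector{Float64}
    space::S
end

const N = 44

# A single "field"; every container below holds N references to it.
Y = ToyField(zeros(8), ToySpace((1.0, 2.0, 3.0, 4.0)))

# Generate a struct with N fields a1, ..., aN, playing the role of p_struct.
field_exprs = [Expr(:(::), Symbol(:a, i), :T) for i in 1:N]
@eval struct ToyCache{T}
    $(field_exprs...)
end

# The trivial function under test.
mytest(p) = p

p_tuple = ntuple(_ -> Y, N)
p_named_tuple = NamedTuple{ntuple(i -> Symbol(:a, i), N)}(p_tuple)
p_array = fill(Y, N)
p_struct = ToyCache(p_tuple...)

# The first call with each argument type includes compilation of mytest.
print("Named tuple: "); @time mytest(p_named_tuple)
print("Tuple: ");       @time mytest(p_tuple)
print("Array: ");       @time mytest(p_array)
print("Struct: ");      @time mytest(p_struct)
```

Each call is timed only once on purpose, since a second call with the same argument type would reuse the compiled method and hide the latency being measured.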

The script shows that compile time is significantly worse when using tuples:

Named tuple:   1.202479 seconds (343 allocations: 24.969 KiB, 100.00% compilation time)
Tuple:   1.190572 seconds (339 allocations: 24.438 KiB, 100.00% compilation time)
Array:   0.002351 seconds (339 allocations: 24.438 KiB, 99.33% compilation time)
Struct:   0.003234 seconds (339 allocations: 24.438 KiB, 99.45% compilation time)

Following a suggestion from @simonbyrne, I checked what happens without space information by creating a new Y_array that carries no space information. In this case:

Tuple with Y array:   0.003157 seconds (339 allocations: 24.438 KiB, 99.46% compilation time)

The increase in compile time is nonlinear. Doubling the number of entries from 44 to 88 roughly triples the compile time for tuples:

Named tuple:   3.747541 seconds (346 allocations: 25.016 KiB, 100.00% compilation time)
Tuple:   3.768719 seconds (342 allocations: 24.484 KiB, 100.00% compilation time)
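
As a rough check on the scaling, one can estimate an exponent from the two measurements. This assumes the compile time follows a power law t ∝ n^k, which is an assumption made only for illustration; two data points cannot confirm it. Under that assumption, the timings quoted above give k between about 1.6 and 1.7, so the growth is superlinear.

```julia
# Compile times (seconds) quoted above for 44 and 88 entries.
t44_named, t88_named = 1.202479, 3.747541
t44_tuple, t88_tuple = 1.190572, 3.768719

# If t ∝ n^k (assumed), doubling n multiplies t by 2^k, so k = log2(t88 / t44).
k_named = log2(t88_named / t44_named)
k_tuple = log2(t88_tuple / t44_tuple)

println("Estimated exponent (named tuple): ", round(k_named; digits = 2))
println("Estimated exponent (tuple): ", round(k_tuple; digits = 2))
```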

This problem has profound implications for latency in ClimaAtmos, since the cache is a massive named tuple. Every function that takes the cache as an argument, no matter how trivial, can introduce several seconds of unnecessary latency.

I have a rough test implementation where I make the cache a struct instead of a named tuple.

For a simple moist case without radiation, the following code compiles 60% faster on my branch:

import ClimaAtmos as CA
config = CA.AtmosConfig(parsed_args = Dict("config_file" => XXXX))
CA.get_integrator(config)

Because the cache is a complex and rich object, recreating its structure in a struct is not an easy task, and my branch currently fails when solving the equations.

I think that there's potential to reduce latency in ClimaAtmos by a factor of 1.5-2 by addressing this issue.


Sbozzolo · 2023-09-21T15:50:01Z

Note: this is not fixed by using newer versions of Julia.

With Julia 1.11-dev

Named tuple:   1.180128 seconds (352 allocations: 25.867 KiB, 100.00% compilation time)
Tuple:   1.181362 seconds (348 allocations: 25.367 KiB, 100.00% compilation time)
Array:   0.002660 seconds (348 allocations: 25.367 KiB, 99.31% compilation time)
Struct:   0.002986 seconds (348 allocations: 25.367 KiB, 99.38% compilation time)
Tuple with Y array:   0.002945 seconds (348 allocations: 25.367 KiB, 99.44% compilation time)

Timing information shows that, for tuples, 74% of the time is spent in the JIT_compile zone.

charleskawczynski · 2023-09-21T16:06:58Z

Yeah, compile times are large due to our cache. IIUC, we should be able to reduce compile times (latency only) by replacing this with, for example, a Dict.

I tried replacing the NamedTuple with a Dict-backed struct here: CliMA/ClimaAtmos.jl@cf2865d, but it didn't really improve anything, and runtime performance suffered pretty badly since Dicts don't track their members' types. Most notably, the compile times did not improve. Considering our cache is quite flat, maybe the issue is actually due to FieldVectors or Fields themselves?

simonbyrne · 2023-09-21T16:33:35Z

I suspect this is due to the large tuple handling. Do you know if there is a particular size at which this kicks in?

simonbyrne · 2023-09-21T16:34:47Z

It would also be worth seeing if making Spaces mutable would help (since that might induce less recursion).

Sbozzolo · 2023-09-21T18:24:13Z

This is what I find:

Sbozzolo · 2023-09-21T18:28:13Z

Compile time for an array is constant up to 100 elements.

Also, for a single element, compiling with a tuple is three times more expensive than compiling with an array.

Sbozzolo · 2023-09-21T18:52:34Z

> It would also be worth seeing if making Spaces mutable would help (since that might induce less recursion)

Excellent intuition, making the Field mutable pretty much solves the problem! 🎉🎉🎉

Now, with 100 elements:

Tuple:
  0.003100 seconds (343 allocations: 24.969 KiB, 99.38% compilation time)
Array:
  0.002260 seconds (339 allocations: 24.438 KiB, 99.34% compilation time)
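
The thread does not establish why mutability helps. One difference that is certain at the language level is layout: a tuple stores immutable isbits structs inline, while it stores mutable structs as references. The sketch below shows this with two hypothetical toy types (ImmutableToy and MutableToy are illustrative names, not ClimaCore types). Whether this is the mechanism behind the improvement seen here is an assumption.

```julia
# Two hypothetical types with identical contents; only mutability differs.
struct ImmutableToy
    data::NTuple{16, Float64}
end

mutable struct MutableToy
    data::NTuple{16, Float64}
end

# Immutable structs of plain data are isbits; mutable structs never are.
println("isbits immutable: ", isbitstype(ImmutableToy))
println("isbits mutable: ", isbitstype(MutableToy))

# A tuple of immutables embeds every element inline, so its size grows with
# the element size. A tuple of mutables only holds one reference per element.
println("tuple of immutables: ", sizeof(Tuple{ImmutableToy, ImmutableToy}), " bytes")
println("tuple of mutables: ", sizeof(Tuple{MutableToy, MutableToy}), " bytes")
```

With references, the type of the container no longer exposes the full inline layout of every element to code generation, which is at least consistent with the suggestion that mutability "might induce less recursion".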

I am checking how much it improves latency in ClimaAtmos, but I can already see that the reduction is going to be more than 100 seconds for the sphere_aquaplanet_rhoe_equilmoist_allsky_gw_raw_zonallyasymmetric case.

Sbozzolo · 2023-09-21T20:05:08Z

Old time to run the above-mentioned case: 1043 seconds.
New time: 653 seconds.

This one change reduced latency by 390 seconds.

Making the Spaces mutable also works (I don't have a ClimaAtmos benchmark, but it will probably be similar).

Sbozzolo added the bug and performance labels Sep 21, 2023

Sbozzolo changed the title ~~Compiling function that take tuples of Fields with Spaces inside is unreasonably expensive~~ Compiling functions that take tuples of Fields with Spaces inside is unreasonably expensive Sep 21, 2023

charleskawczynski added the Latency label Sep 21, 2023

simonbyrne added this to the O1.1.1: 8 SYPD for dry Held Suarez milestone Sep 21, 2023

Sbozzolo removed the performance label Sep 21, 2023

charleskawczynski mentioned this issue Sep 21, 2023

Remove unused fieldvector CliMA/ClimaAtmos.jl#2131

Merged

Sbozzolo mentioned this issue Sep 21, 2023

The cache significantly increases compile time CliMA/ClimaAtmos.jl#2138

Closed

Sbozzolo mentioned this issue Sep 21, 2023

Make Spaces mutable #1470

Closed

simonbyrne mentioned this issue Sep 21, 2023

simplify imex ark CliMA/ClimaTimeSteppers.jl#212

Closed

1 task

simonbyrne mentioned this issue Oct 4, 2023

Add Grids, make them mutable objects #1487

Merged

4 tasks

simonbyrne modified the milestones: O1.1.1: 8 SYPD for dry Held Suarez, O1.1.2: 6 SYPD for moist Held Suarez Oct 12, 2023

simonbyrne self-assigned this Oct 12, 2023

simonbyrne linked a pull request Oct 30, 2023 that will close this issue

Add Grids, make them mutable objects #1487

Merged

4 tasks

simonbyrne closed this as completed in #1487 Nov 16, 2023

charleskawczynski mentioned this issue Jun 28, 2024

Add convenience constructors for grids #1848

Merged
