Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compiling functions that take tuples of Fields with Spaces inside is unreasonably expensive #1467

Closed
Sbozzolo opened this issue Sep 21, 2023 · 8 comments · Fixed by #1487
Closed
Assignees
Labels
bug Something isn't working Latency

Comments

@Sbozzolo
Copy link
Member

Sbozzolo commented Sep 21, 2023

Below is shown a simple script that proves that there is a significant performance issue introduced by using (named) tuples of fields that have space information.

The script sets up a space, and a constant field Y defined on this space. Then, it defines a trivial function mytest(p) = p. For p, I test four difference choices:

  • p_named_tuple, which is a NamedTuple with 44 references to Y (p = (; a0 = Y, a1 = Y, ....))
  • p_tuple, a tuple with 44 references to Y
  • p_array, an array with 44 references to Y
  • p_struct, an instance of a struct that contains 44 fields. In this case, all references to Y

The script shows that compile time is significantly worse when using tuples:

Named tuple:   1.202479 seconds (343 allocations: 24.969 KiB, 100.00% compilation time)
Tuple:   1.190572 seconds (339 allocations: 24.438 KiB, 100.00% compilation time)
Array:   0.002351 seconds (339 allocations: 24.438 KiB, 99.33% compilation time)
Struct:   0.003234 seconds (339 allocations: 24.438 KiB, 99.45% compilation time)

Following intuition from @simonbyrne, I checked what would happen if we didn't have space information. So, I created a new Y_array, where we don't have space information. In this case:

Tuple with Y array:   0.003157 seconds (339 allocations: 24.438 KiB, 99.46% compilation time)

The increase in compile time is non-linear. If we double the number of entries, the compile time for tuples grows to

Named tuple:   3.747541 seconds (346 allocations: 25.016 KiB, 100.00% compilation time)
Tuple:   3.768719 seconds (342 allocations: 24.484 KiB, 100.00% compilation time)

This problem has profound implications for latency in ClimaAtmos since the cache is a massive named tuple. Every function that calls the cache, no matter how trivial it is, can introduce several seconds of unnecessary latency.

I have a rough test implementation where I make the cache a struct instead of a named tuple.

For a simple moist case without radiation, that following code compiles 60% faster on my branch:

import ClimaAtmos as CA
config = CA.AtmosConfig(parsed_args = Dict("config_file" => XXXX)
CA.get_integrator(config)

Given that the cache is a complex and rich object, and recreating its structure in a struct is not an easy task, my branch currently fails when solving the equations.

I think that there's potential to reduce latency in ClimaAtmos by a factor of 1.5-2 by addressing this issue.

@Sbozzolo Sbozzolo added bug Something isn't working performance labels Sep 21, 2023
@Sbozzolo Sbozzolo changed the title Compiling function that take tuples of Fields with Spaces inside is unreasonably expensive Compiling functions that take tuples of Fields with Spaces inside is unreasonably expensive Sep 21, 2023
@Sbozzolo
Copy link
Member Author

Sbozzolo commented Sep 21, 2023

Note: this is not fixed by using newer versions of Julia.

With Julia 1.11-dev

Named tuple:   1.180128 seconds (352 allocations: 25.867 KiB, 100.00% compilation time)
Tuple:   1.181362 seconds (348 allocations: 25.367 KiB, 100.00% compilation time)
Array:   0.002660 seconds (348 allocations: 25.367 KiB, 99.31% compilation time)
Struct:   0.002986 seconds (348 allocations: 25.367 KiB, 99.38% compilation time)
Tuple with Y array:   0.002945 seconds (348 allocations: 25.367 KiB, 99.44% compilation time)

Timing information shows that for tuples, 74 % of the time is spent in the JIT_compile zone.

@charleskawczynski
Copy link
Member

Yeah, compile times are large due to our cache. IIUC, we should be able to reduce compile times (latency only), by replacing this with, for example, a Dict.

I tried replacing the NamedTuple with a Dict-backed struct here: CliMA/ClimaAtmos.jl@cf2865d, but it didn't really improve anything, and runtime performance suffered pretty badly since Dicts don't track its members types. Most notably, the compile times did not improve. Considering our cache is quite flat, maybe the issue is actually due to FieldVectors or Fields themselves?

@simonbyrne
Copy link
Member

I suspect this is due to the large tuple handling. Do you know if there is a particular size at which this kicks in?

@simonbyrne
Copy link
Member

It would also be worth seeing if making Spaces mutable would help (since that might induce less recursion)

@Sbozzolo
Copy link
Member Author

This is what I find:
tup

@Sbozzolo
Copy link
Member Author

Compile time for an array is constant up to 100 elements.

Also, for 1 single element, compiling with a tuple is three times more expensive than compiling with an array.

@Sbozzolo
Copy link
Member Author

It would also be worth seeing if making Spaces mutable would help (since that might induce less recursion)

Excellent intuition, making the Field mutable pretty much solves the problem! 🎉🎉🎉

Now, with 100 elements:

Tuple:
  0.003100 seconds (343 allocations: 24.969 KiB, 99.38% compilation time)
Array:
  0.002260 seconds (339 allocations: 24.438 KiB, 99.34% compilation time)

I am checking how much it improves latency in ClimaAtmos, but I can already see that it is going to be >100 seconds for the sphere_aquaplanet_rhoe_equilmoist_allsky_gw_raw_zonallyasymmetric case.

@Sbozzolo
Copy link
Member Author

Sbozzolo commented Sep 21, 2023

Old time to run the above mentioned case: 1043 seconds.
New time: 653 seconds.

This one change reduced latency by 390 seconds.

It also works by making the Spaces mutable (I don't have a ClimaAtmos benchmark, but it will probably be similar).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Latency
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants