Compiling functions that take tuples of Fields with Spaces inside is unreasonably expensive #1467
Note: this is not fixed by using newer versions of Julia; the problem persists with Julia 1.11-dev.
Timing information shows that, for tuples, 74% of the time is spent in the …
Yeah, compile times are large due to our cache. IIUC, we should be able to reduce compile times (latency only) by replacing it with, for example, a Dict. I tried replacing the NamedTuple with a Dict-backed struct here: CliMA/ClimaAtmos.jl@cf2865d, but compile times did not improve, and runtime performance suffered badly, since a Dict does not track the types of its members. Considering that our cache is quite flat, maybe the issue is actually due to FieldVectors or Fields themselves?
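To illustrate the runtime cost of a Dict-backed cache, here is a minimal sketch in plain Julia (no ClimaCore; `cache_nt`, `cache_dict`, and `get_a` are hypothetical names) that uses `Base.return_types` to compare what the compiler infers when reading an entry from a NamedTuple versus a `Dict{Symbol, Any}`:

```julia
# The same two entries stored as a NamedTuple and as a Dict{Symbol, Any}.
cache_nt = (; a = 1.0, b = [1, 2, 3])
cache_dict = Dict{Symbol, Any}(:a => 1.0, :b => [1, 2, 3])

# Hypothetical accessor with one method per container kind.
get_a(c::NamedTuple) = c.a
get_a(c::AbstractDict) = c[:a]

# Base.return_types returns the inferred return type of each matching method;
# exactly one method matches each argument type here.
nt_type = only(Base.return_types(get_a, (typeof(cache_nt),)))
dict_type = only(Base.return_types(get_a, (typeof(cache_dict),)))

println(nt_type)    # Float64: the NamedTuple type records each entry's type
println(dict_type)  # Any: the Dict only knows its values are of type Any
```

Because the Dict lookup infers as `Any`, every downstream use of the value has to be resolved at runtime, which is consistent with the runtime slowdown described above.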
I suspect this is due to how large tuples are handled. Do you know if there is a particular size at which this kicks in?
It would also be worth checking whether making Spaces mutable would help (since that might require less recursion).
Compile time for an array is constant up to 100 elements. Also, for a single element, compiling with a tuple is three times more expensive than compiling with an array.
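One way to see why array compile time can stay constant as the number of elements grows is that a `Vector`'s type does not depend on its length, while a tuple's type does, so each tuple length is a new type for the compiler. A minimal sketch, using a hypothetical `FieldStandIn` type in place of a real ClimaCore Field:

```julia
# Hypothetical stand-in for a Field; any non-trivial element type works here.
struct FieldStandIn{V}
    values::V
end
Y = FieldStandIn(zeros(3))

lengths = (1, 10, 100)

# A tuple's type encodes its length: Tuple{T}, NTuple{10, T}, NTuple{100, T}.
tuple_types = [typeof(ntuple(_ -> Y, n)) for n in lengths]

# An array's type does not: every one is Vector{FieldStandIn{Vector{Float64}}}.
array_types = [typeof(fill(Y, n)) for n in lengths]

println(allunique(tuple_types))        # true: three distinct tuple types
println(length(unique(array_types)))   # 1: a single array type
```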
Excellent intuition, making the … Now, with 100 elements: …
I am checking how much it improves latency in …
The old time to run the above-mentioned case was 1043 seconds; this one change reduced latency by 390 seconds. It also works by making the …
Below is a simple script that demonstrates a significant compile-time issue introduced by using (named) tuples of fields that carry space information.
The script sets up a space and a constant field `Y` defined on this space. Then, it defines a trivial function `mytest(p) = p`. For `p`, I test four different choices:

- `p_named_tuple`, a `NamedTuple` with 44 references to `Y` (`p = (; a0 = Y, a1 = Y, ....)`)
- `p_tuple`, a tuple with 44 references to `Y`
- `p_array`, an array with 44 references to `Y`
- `p_struct`, an instance of a struct that contains 44 fields. In this case, all references to `Y` …
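A rough, self-contained approximation of this experiment might look like the sketch below. It assumes hypothetical `MockSpace` and `MockField` types in place of ClimaCore Spaces and Fields, and covers only the NamedTuple, tuple, and array variants. Timings will vary by machine and Julia version; `@elapsed` on the first call for a new argument type measures compilation plus a negligible amount of runtime.

```julia
# Hypothetical stand-ins: a "space" with nested data, and a "field" whose
# type parameters therefore carry the full type of that space.
struct MockSpace{G, M}
    grid::G
    metric::M
end
struct MockField{V, S}
    values::V
    space::S
end

grid = (; x = collect(0.0:0.1:1.0), z = (; faces = zeros(5), centers = zeros(4)))
metric = (; J = ones(4), WJ = ones(4), dxdxi = (; a = 1.0, b = 2.0))
Y = MockField(zeros(4), MockSpace(grid, metric))

const N = 44
entry_names = Tuple(Symbol("a", i) for i in 0:(N - 1))  # (:a0, :a1, ..., :a43)

p_named_tuple = NamedTuple{entry_names}(ntuple(_ -> Y, N))
p_tuple = ntuple(_ -> Y, N)
p_array = fill(Y, N)

# Trivial function; Julia compiles a separate specialization of it for each
# new concrete argument type, so each first call below includes compilation.
mytest(p) = p

t_nt = @elapsed mytest(p_named_tuple)
t_tuple = @elapsed mytest(p_tuple)
t_array = @elapsed mytest(p_array)

println("NamedTuple: $t_nt s, Tuple: $t_tuple s, Vector: $t_array s")
```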
The script shows that compile time is significantly worse when using tuples.
Following intuition from @simonbyrne, I checked what would happen without space information. So, I created a new `Y_array`, which carries no space information. In this case: …

The increase in compile time is non-linear: if we double the number of entries, the compile time for tuples grows to …
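The space matters because a field's type embeds the full type of its space, while a plain array's type does not. A minimal sketch with hypothetical stand-in types (`SpaceStandIn`, `FieldWithSpace`; not the ClimaCore definitions) compares the two:

```julia
# Hypothetical stand-ins: a "space" wrapping nested grid data, and a "field"
# that stores both its values and its space.
struct SpaceStandIn{G}
    grid::G
end
struct FieldWithSpace{V, S}
    values::V
    space::S
end

grid = (; faces = zeros(5), centers = zeros(4), metric = (; J = ones(4), WJ = ones(4)))
Y = FieldWithSpace(zeros(4), SpaceStandIn(grid))
Y_array = zeros(4)  # same kind of values, no space information

# The field's type spells out the grid's entire nested type;
# the array's type is just Vector{Float64}.
println(typeof(Y))
println(typeof(Y_array))
```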
This problem has profound implications for latency in ClimaAtmos, since the cache is a massive named tuple. Every function that takes the cache as an argument, no matter how trivial, can introduce several seconds of unnecessary latency.

I have a rough test implementation where I make the cache a struct instead of a named tuple.
For a simple moist case without radiation, the following code compiles 60% faster on my branch:
Because the cache is a complex, rich object and recreating its structure as a struct is not easy, my branch currently fails when solving the equations.
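Switching the cache from a NamedTuple to a struct need not change call sites, because both support `p.name` property access. A minimal sketch with a hypothetical two-entry cache (`CacheStruct`, `temperature`, and `pressure` are illustrative names, not the actual ClimaAtmos cache):

```julia
# The same hypothetical cache, once as a NamedTuple and once as a struct.
cache_nt = (; temperature = zeros(4), pressure = ones(4))

struct CacheStruct
    temperature::Vector{Float64}
    pressure::Vector{Float64}
end

# Positional conversion from the NamedTuple; this relies on the NamedTuple
# entries being in the same order as the struct fields.
cache_struct = CacheStruct(values(cache_nt)...)

# Code that reads the cache through property access works with either form.
total_pressure(p) = sum(p.pressure)

println(total_pressure(cache_nt))      # 4.0
println(total_pressure(cache_struct))  # 4.0
```

Whether a struct actually reduces compile time for the real cache is what the branch above is testing; this sketch only shows that the access syntax carries over.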
I think that addressing this issue could reduce latency in ClimaAtmos by a factor of 1.5 to 2.