Improvements to model performance by reducing allocations #871

Status: Open · wants to merge 11 commits into base: main
Conversation

@ConnectedSystems (Collaborator) commented Oct 4, 2024:

A swathe of changes to the simulations to improve runtime performance, principally by reducing the volume of allocations.

Includes a purpose-specific weighted sum function for use when determining DHW tolerance thresholds.
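As a rough illustration of the kind of purpose-specific helper described (a hedged sketch, not ADRIA's actual function), a hand-rolled loop avoids the temporary array that the naive `sum(x .* w)` form would allocate:

```julia
# Sketch of an allocation-free weighted sum (illustrative, not ADRIA's code).
# `sum(x .* w)` allocates a temporary for `x .* w`; the explicit loop does not.
function weighted_sum(x::AbstractVector{Float64}, w::AbstractVector{Float64})
    s = 0.0
    @inbounds for i in eachindex(x, w)
        s += x[i] * w[i]
    end
    return s
end

weighted_sum([1.0, 2.0, 3.0], [0.5, 0.25, 0.25])  # 1.75
```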

Before: [benchmark screenshot]

After: [benchmark screenshot]

Error between approaches: [screenshot of maximum absolute difference]

using Revise, Infiltrator

using ADRIA

using Statistics
using Serialization


RME_DOM = "C:/Users/tiwanaga/development/RME/rme_ml_2024_01_08"
gbr_dom = ADRIA.load_domain(RMEDomain, RME_DOM, "45")

global debug_c = 1
scen = ADRIA.param_table(gbr_dom)
rs = ADRIA.run_model(gbr_dom, scen[1, :]);

cover = dropdims(sum(rs.raw, dims=2), dims=2)
# serialize("cover_optimized.dat", cover)
# serialize("cover_unoptimized.dat", cover)

Error calculation was determined by serializing raw results to disk and then comparing the absolute maximum difference:

# Comparison

cover_opt = deserialize("cover_optimized.dat")
cover_unopt = deserialize("cover_unoptimized.dat")

maximum(abs.(cover_opt .- cover_unopt))

There are further performance optimizations potentially possible within CoralBlox and elsewhere in ADRIA, but these seem to mostly revolve around finding strategies to avoid triggering the GC.

We could even see if it is possible to make key functions completely allocation-free and turn off the GC for those...

Update:

After some more tweaks:

[benchmark screenshot]

Use `view()` directly to shave off a few more allocations, particularly when only one variable is being indexed

Preallocate variables where possible.
Weighted sum now implemented directly in parent function
@ConnectedSystems ConnectedSystems added the enhancement New feature or request label Oct 4, 2024
@ConnectedSystems ConnectedSystems marked this pull request as draft October 4, 2024 01:34
No real difference in performance and the older code was more readable anyway.
@ConnectedSystems ConnectedSystems marked this pull request as ready for review October 4, 2024 02:11
@Zapiano (Collaborator) left a comment:
Just two questions

for i in length(growth_rate):-1:2
    # Skip size class if nothing is moving up
-   sum(@view(cover[(i - 1):i])) == 0.0 ? continue : false
+   sum(view(cover, (i - 1):i)) == 0.0 ? continue : false
@Zapiano commented:
Is there any advantage of using view instead of the macro @view here? (I'm asking because the macro seems easier to read to me)

@ConnectedSystems (Author) replied Oct 4, 2024:
Now I'm not sure which is better between `view()` and `@view()` (from what I understand the difference should be minimal?), but they both beat `@views`, at least when slicing a single matrix (size: 79x3806).

[benchmark screenshots comparing `view()`, `@view()`, and `@views`]

https://www.juliabloggers.com/the-view-and-views-macros-are-you-sure-you-know-how-they-work/

https://discourse.julialang.org/t/difference-between-view-and-view/40798
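For reference, the relationship between the two forms can be checked directly: `@view A[...]` is a macro that rewrites the indexing expression into a `view()` call, so both return the same `SubArray`, whereas plain indexing allocates a copy. A minimal self-contained demonstration (array size chosen arbitrarily):

```julia
# view() vs @view: both produce the same SubArray; plain indexing copies.
A = rand(79, 3806)

v1 = view(A, :, 1)   # function form
v2 = @view A[:, 1]   # macro form; lowers to the same view() call
@assert typeof(v1) == typeof(v2)
@assert v1 == v2

# Plain indexing allocates a copy, so mutating it leaves A untouched
c = A[:, 1]
c[1] = -1.0
@assert A[1, 1] != -1.0

# Mutating a view writes through to the parent array
v1[1] = -1.0
@assert A[1, 1] == -1.0
```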

# Calculate contribution to cover to determine weights for each functional group
w::Matrix{Float64} = sink_settlers .* view(tp.data, source_locs, sink_loc)
w_per_group = w ./ sum(w; dims=1)
replace!(w_per_group, NaN => 0.0)
@Zapiano commented:
Would it make a relevant difference to cache this?

@ConnectedSystems (Author) replied:

The tricky thing is the number of sources can change.

But maybe there is a way of preallocating and then resizing. Maybe it will avoid excessive triggering of the GC... let me have a think.

@ConnectedSystems (Author) replied Oct 5, 2024:

Couldn't think of a way to cache the weights matrix because, as I said, the number of source locations changes, and making the cache matrix dynamically resizable involved a few more intermediate steps, which would increase overall allocations (at least the way I was thinking of it).

I did play with the idea of caching the result outside the function, given that the number of source locations is constant, but this only holds in our current case where we use mean connectivity. If/when we move to variable/dynamic connectivity data, we'd need this implementation anyway.
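One possible shape for the preallocate-then-view idea discussed above (all names here, `w_buf`, `weights!`, `n_max`, are illustrative, not ADRIA's actual API): size a buffer for the maximum number of source locations once, then take a view of the leading columns each step, so the fused broadcast writes in place without allocating a fresh matrix.

```julia
# Hypothetical sketch: reuse one buffer across timesteps with varying
# numbers of source locations, instead of allocating a new weights matrix.
n_groups = 3
n_max = 10                       # assumed upper bound on source locations
w_buf = zeros(n_groups, n_max)   # allocated once, reused every call

function weights!(w_buf, sink_settlers, conn_slice)
    n = size(conn_slice, 2)
    w = view(w_buf, :, 1:n)          # no copy; just the columns in use
    w .= sink_settlers .* conn_slice # fused broadcast, writes in place
    w ./= sum(w; dims=1)             # normalize each column
    replace!(w, NaN => 0.0)          # guard against zero-sum columns
    return w
end

sink_settlers = rand(3, 4)   # stand-in data
conn = rand(4)               # connectivity for 4 source locations
w = weights!(w_buf, sink_settlers, conn')
```

The small `sum(w; dims=1)` result still allocates, but the dominant `n_groups × n` matrix does not.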

I did think of some other tweaks which helped a tiny bit though:

[benchmark screenshot]

The performance difference is effectively null compared to `view()`, so switching for readability.
Pre-allocate arrays outside function definition.
I'm not sure if this actually reduces allocations on the whole, but defining caches as part of the function signature seems to trip up the profiler.
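The pattern the commit messages describe can be sketched as follows (function and variable names are illustrative, not ADRIA's actual API): allocate working arrays once at the call site and pass them into the hot function, so each iteration reuses the same memory rather than allocating.

```julia
# Hypothetical sketch of preallocating caches outside the hot function.
# `out` and `cache` are created once; step! only mutates them in place.
function step!(out, cache, cover, growth_rate)
    cache .= cover .* growth_rate   # reuse cache instead of allocating
    out .= cover .+ cache
    return out
end

cover = rand(5)
growth_rate = fill(0.1, 5)
out = similar(cover)
cache = similar(cover)

for _ in 1:100                      # hot loop: no per-iteration allocations
    step!(out, cache, cover, growth_rate)
end
```

Passing caches as plain arguments (rather than building them in the signature's default values) also keeps the allocation site visible to profiling tools.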
Labels: enhancement (New feature or request)

2 participants