Much of our overhead comes from doing computational work in other libraries (pandas, arrow, ...) that could be cached. We do cache a lot of this today, but we store the caches on the object itself. When we then recreate objects (for example during optimization), we lose those caches.
One solution here is to cache the objects themselves, so that `Op(...) is Op(...)`. This technique is a bit magical, but it is used in other projects like SymPy, where it has had good performance impacts (although they use it because they create many more, very small objects). Maybe this isn't relevant for us. Ideally we wouldn't recreate objects often in optimization (this is why we return the original object when the arguments match), but maybe it's hard to be careful everywhere. If so, this might provide a bit of a sledgehammer approach.
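A minimal sketch of what this could look like, assuming constructor arguments are hashable (the `Op` class and its attributes here are hypothetical, not our real API):

```python
class Op:
    """Hypothetical expression node whose instances are cached, so that
    constructing it twice with the same arguments returns the same object."""

    _cache = {}  # (cls, args) -> existing instance

    def __new__(cls, *args):
        key = (cls, args)  # assumes every argument is hashable
        try:
            return cls._cache[key]
        except KeyError:
            self = super().__new__(cls)
            self.args = args
            cls._cache[key] = self
            return self


# Recreating the "same" object now preserves identity, so any caches
# stored on the instance survive recreation during optimization:
assert Op(1, 2) is Op(1, 2)
assert Op(1, 2) is not Op(1, 3)
```

In practice we would probably want a `weakref.WeakValueDictionary` (as SymPy's cache effectively does) so that interned objects can still be garbage collected, but a plain dict keeps the sketch simple.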
This isn't done yet; in particular, there are open questions about non-hashable inputs like pandas DataFrames. Hopefully it is a useful proof of concept.
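One possible (untested) way to sidestep the non-hashable-input question is to simply skip caching whenever the key can't be hashed, so DataFrame-holding objects behave exactly as they do today. A sketch, again with hypothetical names:

```python
class CachedOp:
    """Hypothetical cached node that falls back to normal (uncached)
    construction when any argument is non-hashable, e.g. a DataFrame."""

    _cache = {}

    def __new__(cls, *args):
        key = (cls, args)
        try:
            hash(key)  # raises TypeError if any argument is non-hashable
        except TypeError:
            # Non-hashable input: build a fresh, uncached instance.
            self = super().__new__(cls)
            self.args = args
            return self
        try:
            return cls._cache[key]
        except KeyError:
            self = super().__new__(cls)
            self.args = args
            cls._cache[key] = self
            return self


# Hashable arguments are interned; non-hashable ones (a list stands in
# for a DataFrame here) quietly opt out of the cache:
assert CachedOp(1) is CachedOp(1)
assert CachedOp([1, 2]) is not CachedOp([1, 2])
```

Another direction would be deriving a hashable token from the data (the way deterministic hashing of inputs is done elsewhere), but that trades this problem for the cost of hashing large data.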