Add caching to recursive `simplify_once` calls #797
dask_expr/_core.py (outdated)

```python
# Check if we've already simplified for these dependents
key = _tokenize_deterministic(sorted(dependents.keys()))
if key in self._simplified:
    return self._simplified[key]
```
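The check above memoizes the result of a simplification pass, keyed by a deterministic token of the `dependents` mapping. A minimal sketch of the same pattern, using a hypothetical `tokenize_deterministic` stand-in (not dask's actual `_tokenize_deterministic`) and a toy `Node` class:

```python
import hashlib
import json


def tokenize_deterministic(obj) -> str:
    # Hypothetical stand-in for dask's _tokenize_deterministic:
    # a stable hash of any JSON-serializable object.
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


class Node:
    """Toy expression node with a per-instance simplification cache."""

    def __init__(self):
        self._simplified = {}  # token -> previously computed rewrite

    def simplify_once(self, dependents):
        key = tokenize_deterministic(sorted(dependents.keys()))
        if key in self._simplified:
            # Same dependents context seen before: skip the recursive pass.
            return self._simplified[key]
        result = self._rewrite(dependents)
        self._simplified[key] = result
        return result

    def _rewrite(self, dependents):
        # Placeholder for the real recursive rewrite of operands.
        return self
```

Because the token depends only on the sorted `dependents` keys, two calls with the same context hit the cache and skip the recursive pass entirely.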
Does this change anything for you? I tried the same thing and it actually made things even slower.
Interesting. Yes, it certainly speeds things up a lot for me, but I only tried Q1.
yeah, sorry. I had a bug in my cache :)

After profiling this, I am confused about why this kind of cache would help us at all. Everything that lights up the profile is essentially a computation. How is this cache helping with this? I do indeed see fewer intermediates, but I don't understand how this changes anything about the performance. I ran a test where I keep the cache but never access the element:

```diff
diff --git a/dask_expr/_core.py b/dask_expr/_core.py
index 3388694..86ae404 100644
--- a/dask_expr/_core.py
+++ b/dask_expr/_core.py
@@ -282,8 +282,8 @@ class Expr:
         """
         # Check if we've already simplified for these dependents
         key = _tokenize_deterministic(sorted(dependents.keys()))
-        if key in self._simplified:
-            return self._simplified[key]
+        # if key in self._simplified:
+        #     return self._simplified[key]
 
         expr = self
@@ -315,6 +315,8 @@ class Expr:
             if isinstance(operand, Expr):
                 new = operand.simplify_once(dependents=dependents)
                 if new._name != operand._name:
+                    if key in self._simplified:
+                        print("Already simplified but changed")
                     changed = True
             else:
                 new = operand
```

and indeed... I get a lot of "Already simplified but changed" prints, so even though the expression already passed once through this, there is apparently still more work to do. This cache prohibits that work, which is why it's so much faster. However, will this yield the same results?
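The hazard described above can be reduced to a toy example (assumed semantics, not dask-expr code): if a single pass is memoized before the expression reaches a fixed point, a later cache hit returns the frozen intermediate instead of continuing the work.

```python
class Counter:
    """Stand-in for an expression: each pass moves value one step toward 0."""

    def __init__(self, value):
        self.value = value
        self._cache = {}

    def step(self, key):
        if key in self._cache:
            # Cache hit: return whatever intermediate we stored earlier,
            # even if another pass would still change it.
            return self._cache[key]
        result = Counter(self.value - 1) if self.value > 0 else self
        self._cache[key] = result
        return result


c = Counter(2)
first = c.step("deps")   # one pass: value goes 2 -> 1, result is cached
second = c.step("deps")  # cache hit: the value==1 intermediate is returned
```

Here `second.value` is still 1, even though a fresh pass would have moved it toward 0 — which mirrors the "Already simplified but changed" prints above.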
@fjetter - Thanks for looking into this.
Right, my initial assumption here was that the new "dependents-informed" optimization loop was simply re-generating expressions that it didn't actually need to regenerate, and was therefore regenerating them repeatedly. I'll admit that I didn't make a solid attempt to poke holes in my quick/simple "fix" yet. However, my assumption is that if our
I think this cache must only be populated if the expr is indeed identical in the end, so

```diff
diff --git a/dask_expr/_core.py b/dask_expr/_core.py
index 3388694..b3f70a7 100644
--- a/dask_expr/_core.py
+++ b/dask_expr/_core.py
@@ -324,8 +324,8 @@ class Expr:
                 expr = type(expr)(*new_operands)
                 break
-
-        self._simplified[key] = expr  # Cache the result
+        if expr is self:
+            self._simplified[key] = expr  # Cache the result
 
         return expr
 
     def simplify(self) -> Expr:
```

with this I still get almost 3k cache hits.

You could also check the name, but you'd have to check that the expression is not modified. I think an equivalent would be

```python
if expr._name == key:
    self._simplified[key] = expr
```

but we can't populate the cache immediately.
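The `if expr is self` guard amounts to memoizing only fixed points, so a cache hit always means "already fully simplified for this context" rather than "here is an intermediate". A sketch of that idea with hypothetical helper names (not dask-expr's actual API):

```python
class Lit:
    """Toy expression: a single literal value."""

    def __init__(self, v):
        self.v = v


def rewrite_once(expr, dependents):
    # Toy rule: negative literals are rewritten to 0; everything else is
    # already a fixed point and is returned unchanged (identity preserved).
    return Lit(0) if expr.v < 0 else expr


def simplify_once_cached(expr, dependents, cache, rewrite=rewrite_once):
    key = tuple(sorted(dependents))
    if key in cache.get(id(expr), ()):
        return expr  # known fixed point for this dependents context
    out = rewrite(expr, dependents)
    if out is expr:
        # Only memoize when the pass changed nothing, mirroring the
        # `if expr is self` guard in the diff above.
        cache.setdefault(id(expr), set()).add(key)
    return out
```

A changed result is never recorded, so the cache can never freeze an expression in a partially simplified state; it only short-circuits passes that would be no-ops anyway.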
I tried the caching approach in #798 as well, and all changes combined make everything pleasantly fast. Over there, I am using an attribute that's set on the expression that remembers whether an expression is fully simplified or not. This feels less invasive than a mapping on every instance.
🎉
I do like the feel of that better, but tests are failing. I think you really do need to keep track of whether the expression was simplified for the specific `dependents`.
I know this PR wasn't the best solution in its original form. However, I think it should be a reasonable solution for #796 in its revised form. The optimization time becomes negligible (<0.5s) for Q1 on my machine (a ~10x improvement).
Yeah, I intended to do the same; this makes sense performance-wise.
My other PR (#842) is still worthwhile, btw; I'd appreciate it if you could take a look.

thx
The `simplify` logic currently collects a `dependents` dictionary, and then recursively calls `simplify_once` on all `Expr` objects in the expression graph. This PR adds a simple caching mechanism to avoid unnecessary repetition of the same logic (which is particularly problematic for column-name reassignment).

Note that an earlier version of this PR stored the cache in the `Expr` instance itself. The current approach is a bit lighter weight, but still provides a 10x performance boost for TPC-H Q1 optimization.

Closes #796
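The `simplify`/`simplify_once` split described above amounts to a fixed-point loop: apply one-step rewrites until nothing changes. A generic sketch (hypothetical names, not dask-expr's actual signatures):

```python
def simplify(expr, rewrite_once, dependents):
    """Apply one-step rewrites until the expression stops changing."""
    while True:
        new = rewrite_once(expr, dependents)
        if new is expr:  # fixed point reached: the pass returned expr itself
            return expr
        expr = new


# Toy rewrite: decrement positive ints; non-positive values are returned
# as the same object (identity), which terminates the loop.
def decrement(expr, dependents):
    return expr - 1 if expr > 0 else expr
```

Relying on object identity (`is`) for the termination check is what makes the "only cache unchanged results" guard discussed in this thread meaningful: a rewrite pass signals "done" by returning its input unchanged.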
Possible mitigation for #796. There may be better ways to implement/achieve the caching we need long term, but this seems to work reasonably well.

With this PR, the optimization stage of TPC-H Q1 is still slightly slower than it was before #395, but the overall runtime is back to something reasonable (down from roughly 26s to 10s on my machine).