
Add caching to recursive simplify_once calls #797

Merged (9 commits) on Feb 6, 2024

Conversation

@rjzamora (Member) commented Jan 22, 2024

The simplify logic currently collects a dependents dictionary, and then recursively calls simplify_once on all Expr objects in the expression graph. This PR adds a simple caching mechanism to avoid unnecessary repetition of the same logic (which is particularly problematic for column-name reassignment).

Note that an earlier version of this PR stored the cache in the Expr instance itself. The current approach is a bit lighter weight, but still provides a 10x performance boost for TPCh Q1 optimization.

Closes #796

Possible mitigation for #796: there may be better ways to implement the caching we need long term, but this approach seems to work reasonably well.

With this PR, the optimization stage of TPCh Q1 is still slightly slower than it was before #395, but the overall runtime is back to something reasonable (down from roughly 26s to 10s on my machine).
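
For orientation, here is a minimal, self-contained sketch of the caching pattern this PR describes. It is illustrative only, not the dask_expr code: Node, _rewrite, and tokenize_key stand in for Expr, the per-class simplification rules, and _tokenize_deterministic.

# Illustrative sketch only; not the dask_expr implementation.
def tokenize_key(dependents: dict) -> str:
    # Stand-in for _tokenize_deterministic(sorted(dependents.keys()))
    return "|".join(sorted(dependents))


class Node:
    def __init__(self, *operands):
        self.operands = list(operands)
        self._simplified = {}  # dependents-key -> previously simplified result

    def _rewrite(self, dependents: dict) -> "Node":
        # Per-class simplification rule; identity by default.
        return self

    def simplify_once(self, dependents: dict) -> "Node":
        # Skip the pass entirely if we already simplified for these dependents.
        key = tokenize_key(dependents)
        if key in self._simplified:
            return self._simplified[key]

        expr = self._rewrite(dependents)
        if expr is self:
            # No local rewrite: recurse into Node operands exactly once.
            new_operands = [
                op.simplify_once(dependents) if isinstance(op, Node) else op
                for op in self.operands
            ]
            if any(new is not old for new, old in zip(new_operands, self.operands)):
                expr = type(self)(*new_operands)

        self._simplified[key] = expr  # cache the result for these dependents
        return expr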

Review comment on dask_expr/_core.py, lines 283 to 286:
# Check if we've already simplified for these dependents
key = _tokenize_deterministic(sorted(dependents.keys()))
if key in self._simplified:
return self._simplified[key]
Member

Does this change anything for you? I tried the same thing and it actually made things even slower.

Member Author

Interesting. Yes, it certainly speeds things up a lot for me, but I only tried Q1.

Member

Yeah, sorry. I had a bug in my cache :)

@fjetter (Member) commented Jan 23, 2024

After profiling this, I am confused about why this kind of cache would help us at all. Everything that lights up the profile is essentially a computation of _meta. Some of them can be optimized (see also #796 (comment)), but the fundamental problem is that we're generating a lot of intermediate expressions and we have to compute _meta for almost all of them, which is unfortunately not always trivial.

How is this cache helping with that? I do indeed see fewer intermediates, but I don't understand how this changes anything about the performance. I ran a test where I keep the cache but never access the cached element:

diff --git a/dask_expr/_core.py b/dask_expr/_core.py
index 3388694..86ae404 100644
--- a/dask_expr/_core.py
+++ b/dask_expr/_core.py
@@ -282,8 +282,8 @@ class Expr:
         """
         # Check if we've already simplified for these dependents
         key = _tokenize_deterministic(sorted(dependents.keys()))
-        if key in self._simplified:
-            return self._simplified[key]
+        # if key in self._simplified:
+        #     return self._simplified[key]

         expr = self

@@ -315,6 +315,8 @@ class Expr:
                 if isinstance(operand, Expr):
                     new = operand.simplify_once(dependents=dependents)
                     if new._name != operand._name:
+                        if key in self._simplified:
+                            print("Already simplified but changed")
                         changed = True
                 else:
                     new = operand

And indeed... I get a lot of "Already simplified but changed" prints, so even though the expression already passed through this once, there is apparently still more work to do. This cache prevents that work, which is why it's so much faster. However, will this yield the same results?

@rjzamora (Member Author) commented Jan 23, 2024

@fjetter - Thanks for looking into this.

> And indeed... I get a lot of "Already simplified but changed" prints, so even though the expression already passed through this once, there is apparently still more work to do. This cache prevents that work, which is why it's so much faster. However, will this yield the same results?

Right, my initial assumption here was that the new "dependents-informed" optimization loop was simply re-generating expressions that it didn't actually need to regenerate, and was therefore regenerating _meta that it had already generated before.

I'll admit that I haven't made a solid attempt to poke holes in my quick/simple "fix" yet. However, my assumption is that if our dependents graph has the same nodes, and our expression is the same, then we have already performed that "simplify" logic before. If something about the dependencies had changed, then the node names would have changed.
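
A toy illustration of that argument (all names here are hypothetical; tokenize_key again stands in for _tokenize_deterministic): expression names are content-derived, so if a dependency changes, the dependent's name changes, and so does the cache key built from the sorted dependent names.

from hashlib import sha1


def name(op: str, *children: str) -> str:
    # Content-derived node name, loosely analogous to Expr._name
    return op + "-" + sha1("|".join((op,) + children).encode()).hexdigest()[:8]


def tokenize_key(dependents: dict) -> str:
    # Stand-in for _tokenize_deterministic(sorted(dependents.keys()))
    return sha1("|".join(sorted(dependents)).encode()).hexdigest()[:8]


reader = name("read_parquet")
assign_v1 = name("assign", reader)
assign_v2 = name("assign", reader, "extra_column")  # a dependency changed

key_v1 = tokenize_key({reader: [], assign_v1: [reader]})
key_v2 = tokenize_key({reader: [], assign_v2: [reader]})
assert key_v1 != key_v2  # the result cached under key_v1 is never reused here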

@fjetter (Member) commented Jan 23, 2024

I think this cache must only be populated if the expr is indeed identical in the end, so:

diff --git a/dask_expr/_core.py b/dask_expr/_core.py
index 3388694..b3f70a7 100644
--- a/dask_expr/_core.py
+++ b/dask_expr/_core.py
@@ -324,8 +324,8 @@ class Expr:
                 expr = type(expr)(*new_operands)

             break
-
-        self._simplified[key] = expr  # Cache the result
+        if expr is self:
+            self._simplified[key] = expr  # Cache the result
         return expr

     def simplify(self) -> Expr:

With this I still get almost 3k cache hits.

You could also check the name, but you'd have to check that the expression was not modified. I think an equivalent would be:

if expr._name == key:
    self._simplified[key] = expr

but we can't populate the cache immediately.

@fjetter mentioned this pull request on Jan 23, 2024
@fjetter (Member) commented Jan 23, 2024

I tried the caching approach in #798 as well, and all the changes combined make everything pleasantly fast.
Over there, I am using an attribute set on the expression that remembers whether the expression is fully simplified or not. This feels less invasive than a mapping on every instance.
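
Roughly, the shape of that flag-based idea (a sketch only, not the code in #798; _fully_simplified and _rewrite are hypothetical names):

class Node:
    def __init__(self, *operands):
        self.operands = list(operands)
        self._fully_simplified = False

    def _rewrite(self, dependents: dict) -> "Node":
        # Per-class simplification rule; identity by default.
        return self

    def simplify_once(self, dependents: dict) -> "Node":
        if self._fully_simplified:
            return self  # already known to be stable; skip the pass
        expr = self._rewrite(dependents)
        if expr is self:
            # Only mark as done when a full pass produced no change,
            # so the flag never hides a pending rewrite.
            self._fully_simplified = True
        return expr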

@mrocklin (Member) commented Jan 23, 2024 via email

@rjzamora (Member Author) commented:

> Over there, I am using an attribute set on the expression that remembers whether the expression is fully simplified or not. This feels less invasive than a mapping on every instance.

I do like the feel of that better, but tests are failing. I think you really do need to keep track of whether the expression was simplified for the specific dependents in question (not just that the expression was simplified before and it didn't change).

@rjzamora changed the title from "Cache simplified expressions within Expr" to "Add caching to recursive simplify_once calls" on Feb 5, 2024
@rjzamora marked this pull request as ready for review on February 5, 2024 18:00
@rjzamora (Member Author) commented Feb 5, 2024

I know this PR wasn't the best solution in its original form. However, I think it should be a reasonable solution for #796 in its revised form. The optimization time becomes negligible (<0.5s) for Q1 on my machine (a ~10x improvement).

Resolved (outdated) review threads on dask_expr/_core.py and dask_expr/io/_delayed.py
@phofl (Collaborator) left a comment

Yeah, I intended to do the same; this makes sense performance-wise.

My other PR is still worthwhile btw; I would appreciate it if you could take a look at #842.

@rjzamora (Member Author) commented Feb 6, 2024

Thanks for the review, @phofl!

> My other PR is still worthwhile btw; I would appreciate it if you could take a look at #842.

Yes, I agree that's worthwhile as well. I'll definitely take a look.

@phofl merged commit b5d17ad into dask:main on Feb 6, 2024
7 checks passed
@phofl (Collaborator) commented Feb 6, 2024

thx

@rjzamora deleted the improve-simplify-caching branch on February 6, 2024 14:07
Successfully merging this pull request may close these issues:

[Bug] Optimization is now much slower in TPCh benchmarks