Make `simplify` and `lower` optional within `Expr.optimize` #326

rjzamora · 2023-10-10T18:19:27Z

I have found that it can sometimes be useful to avoid lowering within an Expr.optimize call, because it becomes a bit easier to inspect the behavior of simplify on an expression graph.
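As a rough illustration of the idea, the sketch below uses a toy model rather than dask-expr's actual code. `ToyExpr` and its `stages` field are hypothetical names introduced here, and the pass bodies are placeholders that only record that they ran.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyExpr:
    """Hypothetical stand-in for an expression; not dask-expr's Expr."""

    name: str
    stages: tuple = ()  # passes applied so far, in order

    def _apply(self, stage):
        return ToyExpr(self.name, self.stages + (stage,))

    def simplify(self):
        # Abstract rewrites (e.g. column projection) would happen here.
        return self._apply("simplify")

    def lower(self):
        # Translation to partition-level operations would happen here.
        return self._apply("lower")

    def optimize(self, simplify=True, lower=True):
        # Each stage is skipped when its flag is False.
        out = self
        if simplify:
            out = out.simplify()
        if lower:
            out = out.lower()
        return out


expr = ToyExpr("read_parquet")
print(expr.optimize().stages)             # ('simplify', 'lower')
print(expr.optimize(lower=False).stages)  # ('simplify',)
```

With `lower=False`, the result is still the simplified but un-lowered expression, which is the state that is easiest to inspect.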

phofl · 2023-10-10T18:23:10Z

dask_expr/_collection.py

        return DaskMethodsMixin.persist(out, **kwargs)

-    def compute(self, fuse=True, combine_similar=True, **kwargs):
-        out = self.optimize(combine_similar=combine_similar, fuse=fuse)
+    def compute(self, simplify=True, fuse=True, combine_similar=True, **kwargs):


I don’t want to add unnecessary keywords here, and this will work differently after #294, so I wouldn’t add control over simplify here.

Yeah, the motivation isn't really to control options to compute, so I'm happy to roll that back. Thanks for pointing out 294.

phofl · 2023-10-10T18:25:33Z

Is there an advantage when running computations as well? This makes the logic noticeably more complex for a relatively small benefit.

We don't need everything configurable, I think we agree that there is really no point in not running simplify?

rjzamora · 2023-10-10T18:51:09Z

> Is there an advantage when running computations as well? This makes the logic noticeably more complex for a relatively small benefit.

I included that change because I was using it for other experiments and realized we would probably be interested in making it possible to measure the effects of column-projections etc in a more direct way. However, compute is definitely not my primary interest here, so I'll roll that part back.

> We don't need everything configurable, I think we agree that there is really no point in not running simplify?

Well, we must run lower to guarantee that a low-level graph can be generated. We don't really need to simplify, but it's fine with me if we keep that component simple.

phofl · 2023-10-10T19:26:25Z

I am a bit uncomfortable with making def optimize more complicated.

Couldn't we use .simplify().combine_similar() instead of adjusting optimize?
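A minimal sketch of that alternative, using a toy class rather than the real dask-expr `Expr`: `ToyExpr` and its `passes` attribute are hypothetical, and the order of passes inside `optimize` is an assumption made for illustration.

```python
class ToyExpr:
    """Hypothetical expression that records applied passes; not dask-expr's Expr."""

    def __init__(self, passes=()):
        self.passes = tuple(passes)

    def simplify(self):
        return ToyExpr(self.passes + ("simplify",))

    def combine_similar(self):
        return ToyExpr(self.passes + ("combine_similar",))

    def lower(self):
        return ToyExpr(self.passes + ("lower",))

    def optimize(self):
        # Fixed pipeline with no per-stage keywords (pass order assumed).
        return self.simplify().combine_similar().lower()


# Inspect the un-lowered graph by chaining the individual passes directly.
inspected = ToyExpr().simplify().combine_similar()
print(inspected.passes)             # ('simplify', 'combine_similar')
print(ToyExpr().optimize().passes)  # ('simplify', 'combine_similar', 'lower')
```

Chaining keeps `optimize` free of extra keywords while still exposing the intermediate expression to anyone who wants to look at it.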

rjzamora · 2023-10-10T20:25:47Z

> I am a bit uncomfortable with making def optimize more complicated.
>
> Couldn't we use .simplify().combine_similar() instead of adjusting optimize?

Sure, that seems fair to me. This PR is a much lower priority than #321, so I have no problem closing it.

I'll admit that I was not being completely transparent about my full motivations in my PR description. I was hoping to slowly nudge the library in the direction of establishing a clearer distinction between "abstract" and "dask-specific" expressions.

I was thinking that this would be simpler if lower was optional, but the real blocker is probably the fact that Expr is currently required to support dask-specific attributes like divisions and npartitions. I am starting to feel that this should not be the case at the level of the base Expr class. However, this is obviously a big change (and is somewhat orthogonal to the original goals of this work), so I'll just plan to revisit these thoughts later :)
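One way to picture that distinction is the hedged sketch below. `AbstractExpr`, `PartitionedExpr`, and this `lower` helper are hypothetical names introduced for illustration and do not exist in dask-expr; the only fact borrowed from dask is that `npartitions` is one less than the number of division values.

```python
class AbstractExpr:
    """Hypothetical backend-agnostic node: operands and a schema only."""

    def __init__(self, *operands, columns=()):
        self.operands = operands
        self.columns = tuple(columns)


class PartitionedExpr(AbstractExpr):
    """Hypothetical dask-specific node that adds partitioning metadata."""

    def __init__(self, *operands, columns=(), divisions=(None, None)):
        super().__init__(*operands, columns=columns)
        self.divisions = tuple(divisions)

    @property
    def npartitions(self):
        # n partitions are bounded by n + 1 division values.
        return len(self.divisions) - 1


def lower(expr, divisions):
    """Attach partitioning information only at lowering time."""
    return PartitionedExpr(*expr.operands, columns=expr.columns, divisions=divisions)


abstract = AbstractExpr(columns=["a", "b"])
lowered = lower(abstract, divisions=(0, 10, 20))
print(hasattr(abstract, "npartitions"))  # False
print(lowered.npartitions)               # 2
```

In this arrangement the abstract layer never needs to answer questions about `divisions` or `npartitions`; those only become meaningful after lowering.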

rjzamora added 4 commits October 4, 2023 07:23

make it optional to simplify and lower in optimize

de98a8d

Merge remote-tracking branch 'upstream/main' into optional-simplify-and-lower

eae72c7

add test

9f494a2

add explicit lower=True

c5b6967

phofl reviewed Oct 10, 2023

roll back compute/persist change

46701fc

rjzamora closed this Oct 10, 2023

rjzamora mentioned this pull request Oct 11, 2023

Add IO fusion if we can reduce number of partitions #327

Merged
