Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make simplify and lower optional within Expr.optimize #326

Closed
wants to merge 5 commits into from

Conversation

rjzamora
Copy link
Member

I have found that it can sometimes be useful to avoid lowering within an Expr.optimize call, because it becomes a bit easier to inspect the behavior of simplify on an expression graph.

return DaskMethodsMixin.persist(out, **kwargs)

def compute(self, fuse=True, combine_similar=True, **kwargs):
out = self.optimize(combine_similar=combine_similar, fuse=fuse)
def compute(self, simplify=True, fuse=True, combine_similar=True, **kwargs):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t want to add unnecessary keywords here and this will work differently after #294, so I wouldn’t add control over simplify here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the motivation isn't really to control options to compute, so I'm happy to roll that back. Thanks for pointing out 294.

@phofl
Copy link
Collaborator

phofl commented Oct 10, 2023

Is there an advantage when running computations as well? This makes the logic noticeable more complex for a relatively small benefit.

We don't need everything configurable, I think we agree that there is really no point in not running simplify?

@rjzamora
Copy link
Member Author

rjzamora commented Oct 10, 2023

Is there an advantage when running computations as well? This makes the logic noticeable more complex for a relatively small benefit.

I included that change because I was using it for other experiments and realized we would probably be interested in making it possible to measure the effects of column-projections etc in a more direct way. However, compute is definitely not my primary interest here, so I'll roll that part back.

We don't need everything configurable, I think we agree that there is really no point in not running simplify?

Well, we must run simplify to guarantee that a low-level graph can be generated. We don't really need to simplify, but it's fine with me if we keep that component simple.

@phofl
Copy link
Collaborator

phofl commented Oct 10, 2023

I am a bit uncomfortable with making def optimise more complicated.

Couldn't we use .simplify().combine_similar() instead of adjusting optimise?

@rjzamora
Copy link
Member Author

I am a bit uncomfortable with making def optimise more complicated.

Couldn't we use .simplify().combine_similar() instead of adjusting optimise?

Sure, that seems fair to me. This PR is a much lower priority than #321, so I have no problem closing it.

I'll admit that I was not being completely transparent about my full motivations in my PR description. I was hoping to slowly nudge the library in the direction of establishing a clearer distinction between "abstract" and "dask-specific" expressions.

I was thinking that this would be simpler if lower was optional, but the real blocker is probably the fact that Expr is currently required to support dask-specific attributes like divisions and npartitions. I am starting to feel that this should not be the case at the level of the base Expr class. However, this is obviously a big change (and is somewhat orthogonal to the original goals of this work), so I'll just plan to revisit these thoughts later :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants