-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid copying (so much) for LogicalPlan::map_children #9946
Conversation
LogicalPlan::map_children
dcdbe88
to
755342f
Compare
LogicalPlan::map_children
LogicalPlan::map_children
755342f
to
b4a9ffd
Compare
Ok, I think once #9913 from @peter-toth is merged this PR will be ready to review |
// specific language governing permissions and limitations | ||
// under the License. | ||
|
||
//! Methods for rewriting logical plans |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peter-toth if you have time I would love to hear your thoughts on this change / API (no changes to TreeNode)
Sorry @alamb , I'm still working on my #9913. I realized that there are a few more issues I need to fix and test. Will try to finish it tomorrow or during the weekend and ping you. |
10448fb
to
e570e89
Compare
…nt-logicalplan-subquery-handling # Conflicts: # datafusion/optimizer/src/analyzer/mod.rs
…n-subquery-handling
e570e89
to
12d4a8c
Compare
/// `LogicalPlan`s, for example such as are in [`Expr::Exists`]. | ||
/// | ||
/// [`Expr::Exists`]: crate::expr::Expr::Exists | ||
pub(crate) fn rewrite_children<F>(&mut self, mut f: F) -> Result<Transformed<()>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peter-toth if you have a moment, I would love to hear any thoughts you have on this API (it is an in-place update to LogicalPlan
but no change to TreeNode
Questions:
- Do you think we need a similar one for
rewrite_children_with_subqueries
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I don't think we need anything else. Your change to map_children()
will affect all LogicalPlan::..._with_subqueries()
APIs too. The trick in rewrite_arc()
looks very good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, just one thing came into my mind: as far as I see in rewrite_arc()
you need an owned Arc<LogicalPlan>
to call Arc::try_unwrap
. But, you only have &mut Arc<LogicalPlan>
and that's why you need std::mem::swap
2 times with that PLACEHOLDER
.
If your rewrite_children()
/map_children()
worked with owned Arc<LogicalPlan>
and not with &mut Arc<LogicalPlan>
the swap wouldn't be needed.
But in that case the implementation would be a bit complex like:
fn map_children<F: FnMut(Self) -> Result<Transformed<Self>>>(
self,
f: F,
) -> Result<Transformed<Self>> {
Ok(match self {
LogicalPlan::Projection(Projection { expr, input, schema }) => {
rewrite_arc(input, f)?.update_data(|input| LogicalPlan::Projection(Projection { expr, input, schema }))
}
LogicalPlan::Filter(Filter { predicate, input }) => {
rewrite_arc(input, f)?.update_data(|input| LogicalPlan::Filter(Filter { predicate, input }))
}
...
})
}
Also discard_data()
won't be required. BTW, this is how Expr::map_children()
is implemented but there are Box
es so the transform_box()
implementation is simpler than this rewrite_arc()
is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, I'm not sure about the cost of those 2 swaps, so it might not give any noticable improvement...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an excellent idea. I will do it
LogicalPlan::map_children
LogicalPlan::map_children
LogicalPlan::map_children
LogicalPlan::clone()
in LogicalPlan::map_children
LogicalPlan::clone()
in LogicalPlan::map_children
LogicalPlan::clone()
in LogicalPlan::map_children
when possible
This PR is getting quite messy with history. I will make a new one Update: #9999 |
LogicalPlan::clone()
in LogicalPlan::map_children
when possible
Waiting on sorting out subquery handling in #9913
Which issue does this PR close?
Part of #9637 (based on ideas from #9708 and #9768). 🙏 @jayzhan211
Rationale for this change
I am trying to make planning faster by not copying as much (I also want to reduce the number of allocations our planning does as we think it may be related to a concurrency bottleneck we are seeing downstream in IOx)
What changes are included in this PR?
Change
LogicalPlan::map_children
to rewrite the children in place without copying them. This uses a trick(hack?) to rewriteArc<LogicalPlan>
in place when possibleThis Implement suggestion by @peter-toth #9780 (comment) on #9780 and make the existing tree node API faster
Are these changes tested?
Functionally, this is covered by existing CI.
Performance tests: (slightly) faster than from main (but sets the stage for #9948 which goes much faster)
Details
Are there any user-facing changes?
TLDR is No.
This change alone doesn't change performance (largely because the TreeNodeRewriter isn't used in the optimizer passes yet). However, when combined with #9948 it makes planning 10% faster (and sets the stage for even more improvements)