Avoid copying (so much) for LogicalPlan::map_children #9946

Closed
wants to merge 11 commits into from

@alamb alamb commented Apr 4, 2024

Waiting on sorting out subquery handling in #9913

Which issue does this PR close?

Part of #9637 (based on ideas from #9708 and #9768). 🙏 @jayzhan211

Rationale for this change

I am trying to make planning faster by copying less. I also want to reduce the number of allocations planning performs, as we think they may be related to a concurrency bottleneck we are seeing downstream in IOx.

What changes are included in this PR?

Change LogicalPlan::map_children to rewrite the children in place without copying them. This uses a trick (hack?) to rewrite Arc<LogicalPlan> in place when possible.

This implements the suggestion by @peter-toth (#9780 (comment)) on #9780 and makes the existing tree node API faster.
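As a rough illustration of the idea (not the code in this PR), the sketch below rewrites the value inside an `Arc` and deep-copies it only when the `Arc` is shared. `Node` is a hypothetical stand-in for `LogicalPlan`, and this `rewrite_arc` is an assumed, simplified signature that only reuses the helper name mentioned in this thread.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a plan node; not DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    name: String,
}

/// Apply `f` to the value inside `arc`, cloning the value only if the
/// `Arc` has other owners. (Assumed, simplified signature.)
fn rewrite_arc<F>(arc: Arc<Node>, f: F) -> Arc<Node>
where
    F: FnOnce(Node) -> Node,
{
    // `try_unwrap` succeeds only when this is the sole strong reference;
    // otherwise it hands the `Arc` back and we fall back to a clone.
    let node = Arc::try_unwrap(arc).unwrap_or_else(|shared| (*shared).clone());
    Arc::new(f(node))
}

fn main() {
    // Uniquely owned: the inner value is moved out, not cloned.
    let unique = Arc::new(Node { name: "scan".into() });
    let rewritten = rewrite_arc(unique, |mut n| {
        n.name.push_str("_v2");
        n
    });
    assert_eq!(rewritten.name, "scan_v2");

    // Shared: the inner value is cloned, and the other owner is untouched.
    let shared = Arc::new(Node { name: "filter".into() });
    let other = Arc::clone(&shared);
    let rewritten = rewrite_arc(shared, |mut n| {
        n.name.push_str("_v2");
        n
    });
    assert_eq!(rewritten.name, "filter_v2");
    assert_eq!(other.name, "filter");
}
```

Note that this sketch still allocates a fresh `Arc` for the result; what it avoids is cloning the node itself when nothing else references it. On recent Rust versions, `Arc::unwrap_or_clone` expresses the same unwrap-or-clone step in one call.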

Are these changes tested?

Functionally, this is covered by existing CI.

Performance tests: (slightly) faster than main (but this sets the stage for #9948, which goes much faster)

Details

group                                         main                                   optimizer_tree_node2
-----                                         ----                                   --------------------
logical_aggregate_with_join                   1.00  1220.2±14.00µs        ? ?/sec    1.00  1224.6±19.28µs        ? ?/sec
logical_plan_tpcds_all                        1.00    160.2±1.27ms        ? ?/sec    1.01    161.1±1.49ms        ? ?/sec
logical_plan_tpch_all                         1.03     17.5±0.16ms        ? ?/sec    1.00     17.0±0.25ms        ? ?/sec
logical_select_all_from_1000                  1.02     19.7±0.11ms        ? ?/sec    1.00     19.3±0.17ms        ? ?/sec
logical_select_one_from_700                   1.00   798.7±35.30µs        ? ?/sec    1.00   800.0±12.76µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   742.0±25.24µs        ? ?/sec    1.01   747.3±16.51µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00   728.0±11.02µs        ? ?/sec    1.00   730.2±10.20µs        ? ?/sec
physical_plan_tpcds_all                       1.01  1899.1±14.88ms        ? ?/sec    1.00  1878.6±13.10ms        ? ?/sec
physical_plan_tpch_all                        1.04    124.3±1.76ms        ? ?/sec    1.00    118.9±1.24ms        ? ?/sec
physical_plan_tpch_q1                         1.03      7.5±0.07ms        ? ?/sec    1.00      7.3±0.05ms        ? ?/sec
physical_plan_tpch_q10                        1.00      5.7±0.05ms        ? ?/sec    1.00      5.6±0.07ms        ? ?/sec
physical_plan_tpch_q11                        1.01      5.0±0.08ms        ? ?/sec    1.00      4.9±0.06ms        ? ?/sec
physical_plan_tpch_q12                        1.01      4.0±0.03ms        ? ?/sec    1.00      3.9±0.07ms        ? ?/sec
physical_plan_tpch_q13                        1.02      2.7±0.03ms        ? ?/sec    1.00      2.6±0.02ms        ? ?/sec
physical_plan_tpch_q14                        1.02      3.4±0.04ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec
physical_plan_tpch_q16                        1.01      5.0±0.05ms        ? ?/sec    1.00      4.9±0.04ms        ? ?/sec
physical_plan_tpch_q17                        1.01      4.7±0.05ms        ? ?/sec    1.00      4.7±0.04ms        ? ?/sec
physical_plan_tpch_q18                        1.01      5.1±0.08ms        ? ?/sec    1.00      5.1±0.10ms        ? ?/sec
physical_plan_tpch_q19                        1.01      9.7±0.07ms        ? ?/sec    1.00      9.5±0.08ms        ? ?/sec
physical_plan_tpch_q2                         1.01     10.7±0.09ms        ? ?/sec    1.00     10.6±0.10ms        ? ?/sec
physical_plan_tpch_q20                        1.03      6.2±0.04ms        ? ?/sec    1.00      6.1±0.06ms        ? ?/sec
physical_plan_tpch_q21                        1.02      8.5±0.11ms        ? ?/sec    1.00      8.4±0.11ms        ? ?/sec
physical_plan_tpch_q22                        1.03      4.6±0.07ms        ? ?/sec    1.00      4.4±0.04ms        ? ?/sec
physical_plan_tpch_q3                         1.00      4.0±0.04ms        ? ?/sec    1.00      3.9±0.08ms        ? ?/sec
physical_plan_tpch_q4                         1.03      3.0±0.04ms        ? ?/sec    1.00      2.9±0.03ms        ? ?/sec
physical_plan_tpch_q5                         1.01      5.8±0.04ms        ? ?/sec    1.00      5.7±0.06ms        ? ?/sec
physical_plan_tpch_q6                         1.03      2.1±0.02ms        ? ?/sec    1.00      2.0±0.03ms        ? ?/sec
physical_plan_tpch_q7                         1.00      7.7±0.05ms        ? ?/sec    1.02      7.8±0.10ms        ? ?/sec
physical_plan_tpch_q8                         1.00      9.8±0.13ms        ? ?/sec    1.01      9.9±0.15ms        ? ?/sec
physical_plan_tpch_q9                         1.00      7.4±0.07ms        ? ?/sec    1.00      7.4±0.07ms        ? ?/sec
physical_select_all_from_1000                 1.01    129.6±0.45ms        ? ?/sec    1.00    127.7±0.44ms        ? ?/sec
physical_select_one_from_700                  1.00      4.1±0.03ms        ? ?/sec    1.01      4.1±0.04ms        ? ?/sec

Are there any user-facing changes?

TLDR is No.

This change alone doesn't change performance (largely because the TreeNodeRewriter isn't used in the optimizer passes yet). However, when combined with #9948 it makes planning 10% faster (and sets the stage for even more improvements)

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Apr 4, 2024
@alamb alamb changed the title Optimzation: avoid copying (so much) for LogicalPlan::map_children Optimzation: avoid copying (so much) for LogicalPlan::map_children Apr 4, 2024
@alamb alamb force-pushed the alamb/map_in_place branch from dcdbe88 to 755342f Compare April 4, 2024 13:18
@alamb alamb changed the title Optimzation: avoid copying (so much) for LogicalPlan::map_children Avoid copying (so much) for LogicalPlan::map_children Apr 4, 2024
@alamb alamb force-pushed the alamb/map_in_place branch from 755342f to b4a9ffd Compare April 4, 2024 19:22

alamb commented Apr 4, 2024

Ok, I think once #9913 from @peter-toth is merged this PR will be ready to review

// specific language governing permissions and limitations
// under the License.

//! Methods for rewriting logical plans
Contributor Author

@peter-toth if you have time I would love to hear your thoughts on this change / API (no changes to TreeNode)

@peter-toth

> Ok, I think once #9913 from @peter-toth is merged this PR will be ready to review

Sorry @alamb, I'm still working on my #9913. I realized that there are a few more issues I need to fix and test. Will try to finish it tomorrow or during the weekend and ping you.


alamb commented Apr 4, 2024

> Sorry @alamb, I'm still working on my #9913. I realized that there are a few more issues I need to fix and test. Will try to finish it tomorrow or during the weekend and ping you.

No worries! I am not blocked and have plenty of other things to entertain me at the moment

@alamb alamb force-pushed the alamb/map_in_place branch from 10448fb to e570e89 Compare April 6, 2024 10:37
@alamb alamb force-pushed the alamb/map_in_place branch from e570e89 to 12d4a8c Compare April 7, 2024 17:49
@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate labels Apr 7, 2024
/// `LogicalPlan`s, for example such as are in [`Expr::Exists`].
///
/// [`Expr::Exists`]: crate::expr::Expr::Exists
pub(crate) fn rewrite_children<F>(&mut self, mut f: F) -> Result<Transformed<()>>
Contributor Author

@peter-toth if you have a moment, I would love to hear any thoughts you have on this API (it is an in-place update to LogicalPlan but no change to TreeNode).

Questions:

  1. Do you think we need a similar one for rewrite_children_with_subqueries? 🤔

Contributor

No, I don't think we need anything else. Your change to map_children() will affect all LogicalPlan::..._with_subqueries() APIs too. The trick in rewrite_arc() looks very good to me.

@peter-toth peter-toth Apr 7, 2024

Actually, just one thing came to mind: as far as I can see, in rewrite_arc() you need an owned Arc<LogicalPlan> to call Arc::try_unwrap. But you only have &mut Arc<LogicalPlan>, and that's why you need std::mem::swap twice with that PLACEHOLDER.
If your rewrite_children()/map_children() worked with an owned Arc<LogicalPlan> instead of &mut Arc<LogicalPlan>, the swaps wouldn't be needed.
But in that case the implementation would be a bit more complex, like:

fn map_children<F: FnMut(Self) -> Result<Transformed<Self>>>(
    self,
    f: F,
) -> Result<Transformed<Self>> {
    Ok(match self {
        LogicalPlan::Projection(Projection { expr, input, schema }) => {
            rewrite_arc(input, f)?.update_data(|input| LogicalPlan::Projection(Projection { expr, input, schema }))
        }
        LogicalPlan::Filter(Filter { predicate, input }) => {
            rewrite_arc(input, f)?.update_data(|input| LogicalPlan::Filter(Filter { predicate, input }))
        }
        ...
    })
}

Also, discard_data() wouldn't be required. BTW, this is how Expr::map_children() is implemented, but there the children are Boxes, so the transform_box() implementation is simpler than this rewrite_arc().
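To make the owned-`self` shape above concrete, here is a self-contained toy version. Everything in it is a simplified assumption for illustration: `Plan` is a two-variant stand-in for `LogicalPlan`, `Transformed` is a minimal imitation of the wrapper used in the snippet, the error type is a plain `String`, and `rewrite_arc` takes `&mut F` so that it could be called once per child.

```rust
use std::sync::Arc;

type Result<T> = std::result::Result<T, String>;

/// Minimal imitation of a `Transformed<T>` wrapper (hypothetical, simplified).
struct Transformed<T> {
    data: T,
    transformed: bool,
}

impl<T> Transformed<T> {
    fn yes(data: T) -> Self {
        Self { data, transformed: true }
    }
    fn no(data: T) -> Self {
        Self { data, transformed: false }
    }
    /// Map the wrapped value while keeping the `transformed` flag.
    fn update_data<U>(self, f: impl FnOnce(T) -> U) -> Transformed<U> {
        Transformed { data: f(self.data), transformed: self.transformed }
    }
}

/// Toy plan with one leaf and one single-child node (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: String },
    Filter { predicate: String, input: Arc<Plan> },
}

/// Unwrap the child (cloning only if shared), rewrite it, and re-wrap it.
fn rewrite_arc<F>(plan: Arc<Plan>, f: &mut F) -> Result<Transformed<Arc<Plan>>>
where
    F: FnMut(Plan) -> Result<Transformed<Plan>>,
{
    let owned = Arc::try_unwrap(plan).unwrap_or_else(|shared| (*shared).clone());
    Ok(f(owned)?.update_data(Arc::new))
}

impl Plan {
    /// Consume `self`, rewrite each child, and rebuild the node from the
    /// fields that were moved out of it.
    fn map_children<F>(self, mut f: F) -> Result<Transformed<Plan>>
    where
        F: FnMut(Plan) -> Result<Transformed<Plan>>,
    {
        Ok(match self {
            Plan::Filter { predicate, input } => rewrite_arc(input, &mut f)?
                .update_data(|input| Plan::Filter { predicate, input }),
            // Leaves have no children, so they are returned unchanged.
            leaf @ Plan::Scan { .. } => Transformed::no(leaf),
        })
    }
}

fn main() {
    let plan = Plan::Filter {
        predicate: "a > 1".into(),
        input: Arc::new(Plan::Scan { table: "t".into() }),
    };
    // Upper-case the table name of any scan child.
    let out = plan
        .map_children(|child| {
            Ok(match child {
                Plan::Scan { table } => {
                    Transformed::yes(Plan::Scan { table: table.to_uppercase() })
                }
                other => Transformed::no(other),
            })
        })
        .unwrap();
    assert!(out.transformed);
    let expected = Plan::Filter {
        predicate: "a > 1".into(),
        input: Arc::new(Plan::Scan { table: "T".into() }),
    };
    assert_eq!(out.data, expected);
}
```

Because `self` is taken by value, each match arm can move the non-child fields (`predicate` here) straight into the rebuilt node, which is why no placeholder or swap appears anywhere in this version.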

Contributor

But I'm not sure about the cost of those 2 swaps, so it might not give any noticeable improvement...
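For reference, the two-swap pattern discussed here can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `Node` is a hypothetical type, and the placeholder is freshly allocated on each call, whereas the `PLACEHOLDER` referred to above is not shown in this thread and may be built differently. A `Box` version is included for contrast.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a plan node.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    name: String,
}

/// Rewrite the value behind `&mut Arc<Node>`. `Arc::try_unwrap` needs an
/// owned `Arc`, so the real value is first swapped out for a placeholder.
fn rewrite_arc_in_place<F>(slot: &mut Arc<Node>, f: F)
where
    F: FnOnce(Node) -> Node,
{
    // Swap 1: `taken` now owns the original Arc, `slot` holds the placeholder.
    // (Assumption: a throwaway placeholder allocated per call.)
    let mut taken = Arc::new(Node { name: String::new() });
    std::mem::swap(slot, &mut taken);

    // Move the node out if uniquely owned, otherwise clone it.
    let owned = Arc::try_unwrap(taken).unwrap_or_else(|shared| (*shared).clone());

    // Swap 2: put the rewritten node back; the placeholder is dropped.
    // If `f` panics, `slot` is left holding the placeholder.
    let mut rewritten = Arc::new(f(owned));
    std::mem::swap(slot, &mut rewritten);
}

/// With `Box` there is no sharing to check: dereference, rewrite, re-box.
fn rewrite_box<F>(b: Box<Node>, f: F) -> Box<Node>
where
    F: FnOnce(Node) -> Node,
{
    Box::new(f(*b))
}

fn main() {
    let mut slot = Arc::new(Node { name: "scan".into() });
    rewrite_arc_in_place(&mut slot, |mut n| {
        n.name.push_str("_v2");
        n
    });
    assert_eq!(slot.name, "scan_v2");

    let boxed = rewrite_box(Box::new(Node { name: "expr".into() }), |mut n| {
        n.name.push_str("_v2");
        n
    });
    assert_eq!(boxed.name, "expr_v2");
}
```

Each swap only exchanges two pointers, so any cost would more likely come from the placeholder itself than from the swaps; measuring it would take a benchmark like the ones in the PR description.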

Contributor Author

This is an excellent idea. I will do it

@alamb alamb changed the title Avoid copying (so much) for LogicalPlan::map_children Avoid copying (as much) for LogicalPlan::map_children Apr 8, 2024
@alamb alamb changed the title Avoid copying (as much) for LogicalPlan::map_children Refactor: Avoid LogicalPlan::clone() in LogicalPlan::map_children Apr 8, 2024
@alamb alamb changed the title Refactor: Avoid LogicalPlan::clone() in LogicalPlan::map_children Refactor: Avoid LogicalPlan::clone() in LogicalPlan::map_children when possible Apr 8, 2024

alamb commented Apr 8, 2024

This PR is getting quite messy with history. I will make a new one

Update: #9999

@alamb alamb closed this Apr 8, 2024
@alamb alamb changed the title Refactor: Avoid LogicalPlan::clone() in LogicalPlan::map_children when possible Avoid copying (so much) for LogicalPlan::map_children Apr 8, 2024