Refactor Optimizer to use owned plans and TreeNode API (10% faster planning) #9948

Merged: alamb merged 4 commits into apache:main from alamb/optimizer_tree_node2 on Apr 10, 2024

@alamb alamb commented Apr 4, 2024

Note: this looks like a large PR, but many of the changes are lines that simply remove a `&`

Which issue does this PR close?

Part of #9637 (stop copying LogicalPlans in the Optimizer) and #8913 (unified TreeNode rewrite)

Rationale for this change

The current structure of the Optimizer copies LogicalPlans a large number of times. This is slow and also requires a large number of allocations.

After #9999, the TreeNode API can handle rewriting LogicalPlan efficiently without clone.

Thus it makes sense to use the TreeNode API in the optimizer, both because I think the resulting code is simpler and because it takes advantage of the performance improvements in the TreeNode API.
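To illustrate the idea of rewriting a plan without cloning it, here is a minimal sketch using toy types. `Plan`, `Transformed`, `rewrite_owned`, and `remove_trivial_filter` are hypothetical stand-ins written for this example; they are much simpler than DataFusion's actual `LogicalPlan` and TreeNode definitions.

```rust
/// A toy plan node; a stand-in for `LogicalPlan`, not DataFusion's real type.
#[derive(Debug, PartialEq)]
enum Plan {
    Scan(String),
    Filter { predicate: String, input: Box<Plan> },
}

/// The (possibly rewritten) value plus a flag saying whether anything changed.
#[derive(Debug, PartialEq)]
struct Transformed<T> {
    data: T,
    transformed: bool,
}

/// Rewrite a plan bottom-up by consuming it. The child is moved out of its
/// parent, rewritten, and moved back in, so no node is ever cloned.
fn rewrite_owned(plan: Plan, f: &impl Fn(Plan) -> Transformed<Plan>) -> Transformed<Plan> {
    match plan {
        Plan::Filter { predicate, input } => {
            let child = rewrite_owned(*input, f);
            let rebuilt = Plan::Filter {
                predicate,
                input: Box::new(child.data),
            };
            let result = f(rebuilt);
            Transformed {
                data: result.data,
                transformed: result.transformed || child.transformed,
            }
        }
        leaf => f(leaf),
    }
}

/// Hypothetical rule: drop filters whose predicate is the literal `true`.
fn remove_trivial_filter(plan: Plan) -> Transformed<Plan> {
    match plan {
        Plan::Filter { predicate, input } if predicate == "true" => Transformed {
            data: *input,
            transformed: true,
        },
        other => Transformed {
            data: other,
            transformed: false,
        },
    }
}
```

Because every function takes `Plan` by value, the only allocation in the rewrite is the `Box` for a rebuilt child; unchanged subtrees are moved, not copied.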

What changes are included in this PR?

  1. Refactor Optimizer to use TreeNode API
  2. Change Optimizer::optimize to take an owned LogicalPlan rather than force a copy

Are these changes tested?

By existing CI

Performance benchmarks: Planning is 10% faster for TPCH, 13% faster for TPCDS


++ critcmp main optimizer_tree_node2
group                                         main                                   optimizer_tree_node2
-----                                         ----                                   --------------------
logical_aggregate_with_join                   1.00  1177.8±21.23µs        ? ?/sec    1.00  1176.3±12.15µs        ? ?/sec
logical_plan_tpcds_all                        1.01    154.2±0.85ms        ? ?/sec    1.00    153.1±0.77ms        ? ?/sec
logical_plan_tpch_all                         1.01     16.5±0.14ms        ? ?/sec    1.00     16.4±0.16ms        ? ?/sec
logical_select_all_from_1000                  1.06     19.3±0.15ms        ? ?/sec    1.00     18.2±0.18ms        ? ?/sec
logical_select_one_from_700                   1.00   779.6±21.54µs        ? ?/sec    1.01   785.8±15.37µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   726.7±12.27µs        ? ?/sec    1.00   728.4±12.68µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00    710.9±6.84µs        ? ?/sec    1.00    714.1±6.52µs        ? ?/sec
physical_plan_tpcds_all                       1.13   1834.3±2.48ms        ? ?/sec    1.00   1624.6±2.37ms        ? ?/sec
physical_plan_tpch_all                        1.10    119.2±0.60ms        ? ?/sec    1.00    107.9±0.56ms        ? ?/sec
physical_plan_tpch_q1                         1.19      7.3±0.06ms        ? ?/sec    1.00      6.1±0.04ms        ? ?/sec
physical_plan_tpch_q10                        1.11      5.5±0.05ms        ? ?/sec    1.00      5.0±0.03ms        ? ?/sec
physical_plan_tpch_q11                        1.10      4.8±0.02ms        ? ?/sec    1.00      4.4±0.02ms        ? ?/sec
physical_plan_tpch_q12                        1.09      3.9±0.02ms        ? ?/sec    1.00      3.5±0.02ms        ? ?/sec
physical_plan_tpch_q13                        1.09      2.6±0.02ms        ? ?/sec    1.00      2.4±0.01ms        ? ?/sec
physical_plan_tpch_q14                        1.09      3.3±0.03ms        ? ?/sec    1.00      3.1±0.02ms        ? ?/sec
physical_plan_tpch_q16                        1.11      4.9±0.05ms        ? ?/sec    1.00      4.4±0.02ms        ? ?/sec
physical_plan_tpch_q17                        1.11      4.6±0.03ms        ? ?/sec    1.00      4.2±0.03ms        ? ?/sec
physical_plan_tpch_q18                        1.10      5.0±0.03ms        ? ?/sec    1.00      4.5±0.02ms        ? ?/sec
physical_plan_tpch_q19                        1.06      9.4±0.07ms        ? ?/sec    1.00      8.8±0.04ms        ? ?/sec
physical_plan_tpch_q2                         1.11     10.5±0.04ms        ? ?/sec    1.00      9.4±0.03ms        ? ?/sec
physical_plan_tpch_q20                        1.12      6.1±0.05ms        ? ?/sec    1.00      5.4±0.03ms        ? ?/sec
physical_plan_tpch_q21                        1.12      8.3±0.07ms        ? ?/sec    1.00      7.4±0.03ms        ? ?/sec
physical_plan_tpch_q22                        1.12      4.4±0.02ms        ? ?/sec    1.00      3.9±0.06ms        ? ?/sec
physical_plan_tpch_q3                         1.09      3.9±0.02ms        ? ?/sec    1.00      3.5±0.02ms        ? ?/sec
physical_plan_tpch_q4                         1.12      2.9±0.01ms        ? ?/sec    1.00      2.6±0.01ms        ? ?/sec
physical_plan_tpch_q5                         1.08      5.6±0.02ms        ? ?/sec    1.00      5.2±0.05ms        ? ?/sec
physical_plan_tpch_q6                         1.07  1981.4±12.85µs        ? ?/sec    1.00  1858.7±85.62µs        ? ?/sec
physical_plan_tpch_q7                         1.10      7.5±0.05ms        ? ?/sec    1.00      6.8±0.05ms        ? ?/sec
physical_plan_tpch_q8                         1.10      9.5±0.08ms        ? ?/sec    1.00      8.7±0.04ms        ? ?/sec
physical_plan_tpch_q9                         1.10      7.2±0.04ms        ? ?/sec    1.00      6.5±0.04ms        ? ?/sec
physical_select_all_from_1000                 1.19    128.1±0.29ms        ? ?/sec    1.00    107.4±0.41ms        ? ?/sec
physical_select_one_from_700                  1.02      4.0±0.03ms        ? ?/sec    1.00      3.9±0.05ms        ? ?/sec
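
The relative columns above appear to be each time divided by the faster of the two runs, so the headline numbers can be rechecked from the raw timings. A small sketch of that arithmetic (the helper name `relative_time` is made up for this example):

```rust
/// Time of the baseline run relative to the candidate run, rounded to two
/// decimals. A value of 1.10 means the baseline took about 10% longer.
fn relative_time(baseline_ms: f64, candidate_ms: f64) -> f64 {
    (baseline_ms / candidate_ms * 100.0).round() / 100.0
}
```

With the timings from the table, `relative_time(119.2, 107.9)` gives 1.10 for physical_plan_tpch_all and `relative_time(1834.3, 1624.6)` gives 1.13 for physical_plan_tpcds_all.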

Are there any user-facing changes?

There is a small API change: Optimizer::optimize now takes an owned LogicalPlan rather than a reference (which forced a copy)
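
As a rough sketch of why the owned signature avoids a copy, consider the following toy model. `Plan`, `optimize_borrowed`, and `optimize_owned` are hypothetical names for this example and do not reproduce the real `Optimizer::optimize` signature, which also takes a config and an observer.

```rust
/// Toy plan: a list of step names. Not DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
struct Plan {
    steps: Vec<String>,
}

/// Old shape: the plan is borrowed, so producing a new plan forces a clone.
fn optimize_borrowed(plan: &Plan) -> Plan {
    let mut copy = plan.clone();
    copy.steps.retain(|s| s.as_str() != "noop");
    copy
}

/// New shape: the plan is owned, so it can be edited in place and returned.
fn optimize_owned(mut plan: Plan) -> Plan {
    plan.steps.retain(|s| s.as_str() != "noop");
    plan
}
```

Under this model, a caller that still needs the original plan after optimizing clones it explicitly before the call, while a caller that does not need it pays for no copy at all.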

Planned follow on task

  • Add special cases / rewrite other optimizer passes to reduce copies

@alamb alamb added the api change Changes the API exposed to users of the crate label Apr 4, 2024
@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Apr 4, 2024
@alamb alamb force-pushed the alamb/optimizer_tree_node2 branch from 613fdae to b950a9e Compare April 4, 2024 13:09
@alamb alamb changed the title Rewrite Optimizer to use TreeNode API Refactor Optimizer to use TreeNode API Apr 4, 2024
@alamb alamb force-pushed the alamb/optimizer_tree_node2 branch from b950a9e to ef189fb Compare April 4, 2024 13:34
@@ -59,7 +59,7 @@ pub fn main() -> Result<()> {

// then run the optimizer with our custom rule
let optimizer = Optimizer::with_rules(vec![Arc::new(MyOptimizerRule {})]);
-let optimized_plan = optimizer.optimize(&analyzed_plan, &config, observe)?;
+let optimized_plan = optimizer.optimize(analyzed_plan, &config, observe)?;
Contributor Author

This illustrates the API change -- the optimizer now takes an owned plan rather than a reference

Member

Great progress!

@@ -110,7 +110,7 @@ fn test_sql(sql: &str) -> Result<LogicalPlan> {
let optimizer = Optimizer::new();
// analyze and optimize the logical plan
let plan = analyzer.execute_and_check(&plan, config.options(), |_, _| {})?;
-optimizer.optimize(&plan, &config, |_, _| {})
+optimizer.optimize(plan, &config, |_, _| {})
Contributor Author

A large part of this PR is changes to tests so that they pass in an owned plan


let formatted_plan = format!("{optimized_plan:?}");
assert_eq!(formatted_plan, expected);
assert_eq!(plan.schema(), optimized_plan.schema());
Contributor Author

@alamb alamb Apr 4, 2024

I changed the tests to call Optimizer::optimize directly, which already checks that the schema doesn't change, so this assertion is redundant

This applies to several other changes in this PR
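
A minimal sketch of that kind of invariant check, assuming a toy `Plan` whose schema is just a list of column names. `apply_checked` is a hypothetical helper for this example; the real check inside the optimizer is more involved.

```rust
/// Toy plan with a schema (column names) and some steps.
#[derive(Debug, Clone, PartialEq)]
struct Plan {
    schema: Vec<String>,
    steps: Vec<String>,
}

/// Apply `rule` and fail if the output schema differs from the input schema.
/// The error names the original schema first and the new schema second.
fn apply_checked(plan: Plan, rule: impl Fn(Plan) -> Plan) -> Result<Plan, String> {
    let original = plan.schema.clone();
    let rewritten = rule(plan);
    if rewritten.schema != original {
        return Err(format!(
            "rule changed the schema: original {:?}, new {:?}",
            original, rewritten.schema
        ));
    }
    Ok(rewritten)
}
```

With a check like this inside the entry point, a test that goes through it does not need its own schema assertion.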

///
/// Notice: **sometime** result after optimize still can be optimized, we need apply again.
Contributor Author

I do not think this comment is applicable anymore -- the optimizer handles the recursion internally as well as applying multiple optimizer passes

@@ -356,97 +423,22 @@ impl Optimizer {
debug!("Optimizer took {} ms", start_time.elapsed().as_millis());
Ok(new_plan)
}

fn optimize_node(
Contributor Author

@alamb alamb Apr 4, 2024

This code implemented plan recursion within the optimizer and is (now) redundant with the TreeNode API

Field { name: \"c\", data_type: UInt32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, \
field_qualifiers: [Some(Bare { table: \"test\" }), Some(Bare { table: \"test\" }), Some(Bare { table: \"test\" })], \
functional_dependencies: FunctionalDependencies { deps: [] } }, \
"Optimizer rule 'get table_scan rule' failed\n\
Contributor Author

The original error message was actually incorrect: it reported the schemas reversed (the "new schema" was actually the original schema)

@alamb alamb force-pushed the alamb/optimizer_tree_node2 branch from ef189fb to d951289 Compare April 4, 2024 19:25
@github-actions github-actions bot added the logical-expr Logical plan and expressions label Apr 4, 2024
@alamb alamb force-pushed the alamb/optimizer_tree_node2 branch from d951289 to a755338 Compare April 4, 2024 20:19
@alamb alamb changed the title Refactor Optimizer to use TreeNode API Refactor Optimizer to use TreeNode API (10% faster planning) Apr 5, 2024
@alamb alamb force-pushed the alamb/optimizer_tree_node2 branch 2 times, most recently from 52f3a54 to 2d5e154 Compare April 7, 2024 17:50
@alamb alamb force-pushed the alamb/optimizer_tree_node2 branch from 2d5e154 to a282d6a Compare April 9, 2024 10:00
@alamb alamb force-pushed the alamb/optimizer_tree_node2 branch from a282d6a to a4cb731 Compare April 9, 2024 12:31
@github-actions github-actions bot removed the logical-expr Logical plan and expressions label Apr 9, 2024
@alamb alamb changed the title Refactor Optimizer to use TreeNode API (10% faster planning) Refactor Optimizer to use owned plans and TreeNode API (10% faster planning) Apr 9, 2024
}

/// Recursively rewrites LogicalPlans
struct Rewriter<'a> {
Contributor Author

datafusion/optimizer/src/optimizer.rs has all the important changes (to use the TreeNode API and stop copying)

&OptimizerContext::new(),
)?
.unwrap_or_else(|| plan.clone());
let optimized_plan =
Contributor Author

Since the optimizer correctly applies rules recursively now, there is no need to explicitly call optimize recursively

)?
.unwrap_or_else(|| plan.clone());
// Apply the rule once
let opt_context = OptimizerContext::new().with_max_passes(1);
Contributor Author

Without this some of the tests don't pass. By default the optimizer runs a few times until no changes are detected. Limiting to 1 pass mimics the previous test behavior

Contributor

I'm not sure about this. It seems that the previous code did not limit the number of passes to 1. Why do we need the limit now to get the same behavior as before? 😕

Contributor Author

My explanation is this: the loop that applies the rules more than once calls optimize_recursively on each iteration

https://github.com/apache/arrow-datafusion/blob/75c399ce7d4d5360140c64089dd7b05ffd7c49ef/datafusion/optimizer/src/optimizer.rs#L298-L303

This test only called optimize_recursively once (directly), and thus the OptimizerRule was only applied once

Now that the test uses Optimizer::optimize, the loop kicks in, so the OptimizerRule is run several times unless we set with_max_passes

This same reasoning applies to the other tests, but apparently they get the same answer when applied more than once
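
The effect of the pass limit can be sketched with a toy loop. `run_passes` and `peel_one_layer` are hypothetical names for this example, and the "plan" is just a nesting depth; the real optimizer loop decides whether to stop in its own way.

```rust
/// Apply `rule` until it reports no change or `max_passes` is reached.
/// Returns the final plan and the number of passes that ran.
fn run_passes<P>(mut plan: P, rule: impl Fn(P) -> (P, bool), max_passes: usize) -> (P, usize) {
    let mut passes = 0;
    while passes < max_passes {
        let (next, changed) = rule(plan);
        plan = next;
        passes += 1;
        if !changed {
            break;
        }
    }
    (plan, passes)
}

/// Hypothetical rule that removes one layer of nesting per application,
/// so applying it once and applying it to a fixed point give different results.
fn peel_one_layer(depth: u32) -> (u32, bool) {
    if depth > 0 {
        (depth - 1, true)
    } else {
        (depth, false)
    }
}
```

A rule like `peel_one_layer` gives a different answer when run once than when run until nothing changes, which is the situation the `with_max_passes(1)` setting guards against in the test; rules that are already at a fixed point after one application are unaffected.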

Contributor

I see!

@alamb alamb marked this pull request as ready for review April 9, 2024 12:45
@alamb alamb requested review from jackwener and mustafasrepo April 9, 2024 14:11
Contributor Author

alamb commented Apr 9, 2024

@jackwener since you implemented some of the original optimizer recursion I wonder if you would have some time to review this PR

Contributor

@jayzhan211 jayzhan211 left a comment

It looks pretty nice now!! 🚀


let result = match rule.apply_order() {
// optimizer handles recursion
Some(apply_order) => new_plan.rewrite(&mut Rewriter::new(
Contributor

Would renaming it to rewrite_recursively be more straightforward?

Member

It's a generic API (so using rewrite makes sense to me), and it's not introduced by this PR

Contributor

I have a question here.
In the previous code, optimize_inputs got the children via plan.inputs(). How does rewrite get at plan.inputs() here? How do the child nodes visited by rewrite correspond to plan.inputs()?

Contributor

Oh, I see the map_children
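
For readers following this thread, here is a simplified sketch of how a generic rewrite can reach a node's inputs through a `map_children`-style method. `Node`, `ApplyOrder`, and `rewrite` below are toy definitions for this example that only mirror the names used in the discussion; they are not DataFusion's implementation.

```rust
/// Toy tree node whose children play the role of `plan.inputs()`.
#[derive(Debug, PartialEq)]
struct Node {
    name: String,
    inputs: Vec<Node>,
}

/// Whether the rule runs on a node before or after its children.
#[derive(Clone, Copy)]
enum ApplyOrder {
    TopDown,
    BottomUp,
}

impl Node {
    /// Replace every child with `f(child)`, moving children instead of cloning.
    fn map_children(mut self, f: &mut impl FnMut(Node) -> Node) -> Node {
        let inputs = std::mem::take(&mut self.inputs);
        self.inputs = inputs.into_iter().map(|child| f(child)).collect();
        self
    }
}

/// Apply `rule` to every node in the tree, recursing through `map_children`.
fn rewrite(node: Node, order: ApplyOrder, rule: &mut impl FnMut(Node) -> Node) -> Node {
    match order {
        ApplyOrder::TopDown => {
            let node = rule(node);
            node.map_children(&mut |child| rewrite(child, order, &mut *rule))
        }
        ApplyOrder::BottomUp => {
            let node = node.map_children(&mut |child| rewrite(child, order, &mut *rule));
            rule(node)
        }
    }
}
```

In this model the rule itself never asks for the inputs; the recursion hands each child to the rule, which is why a separate `optimize_inputs` step is no longer needed.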

Member

@jackwener jackwener left a comment

Nice job, thanks @alamb!

Contributor Author

alamb commented Apr 10, 2024

Thank you very much @jackwener and @jayzhan211 for the reviews 🙏

@alamb alamb merged commit 03d8ba1 into apache:main Apr 10, 2024
24 checks passed
@alamb alamb deleted the alamb/optimizer_tree_node2 branch April 10, 2024 13:27
@mustafasrepo
Contributor

Thanks @alamb for this work.

Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)