-
Notifications
You must be signed in to change notification settings - Fork 1.5k
feat: change Expr OuterReferenceColumn to Box type for reducing expr struct size #16771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @zhuqi-lucas -- this looks quite nice
I think it is an API change so I will mark the PR as such and I think it is a good improvement
However, given it is an API change I think we should
- Wait until DataFusion 50.0.0 (let's not make any more changes in DataFusion 49 that we are starting to prepare for)
- Update the upgrading guide in https://datafusion.apache.org/library-user-guide/upgrading.html to explain to people hw to change things
@@ -4009,12 +4009,12 @@ logical_plan | |||
09)------------Unnest: lists[__unnest_placeholder(generate_series(Int64(1),outer_ref(t1.t1_int)))|depth=1] structs[] | |||
10)--------------Projection: generate_series(Int64(1), CAST(outer_ref(t1.t1_int) AS Int64)) AS __unnest_placeholder(generate_series(Int64(1),outer_ref(t1.t1_int))) | |||
11)----------------EmptyRelation | |||
physical_plan_error This feature is not implemented: Physical plan does not support logical expression OuterReferenceColumn(UInt32, Column { relation: Some(Bare { table: "t1" }), name: "t1_int" }) | |||
physical_plan_error This feature is not implemented: Physical plan does not support logical expression OuterReferenceColumn((UInt32, Column { relation: Some(Bare { table: "t1" }), name: "t1_int" })) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you know why the plan changes? I don't think the extra ()
really adds much -- can we restore the original?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to OuterReference in latest PR, so it will use OuterReference instead of this change now, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Next step, we may can add debug/fmt for OuterReference to restore the original behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restore all slt in latest PR, no diff from main branch.
@@ -3818,7 +3816,7 @@ mod test { | |||
// If this test fails when you change `Expr`, please try | |||
// `Box`ing the fields to make `Expr` smaller | |||
// See https://github.com/apache/datafusion/issues/16199 for details | |||
assert_eq!(size_of::<Expr>(), 128); | |||
assert_eq!(size_of::<Expr>(), 112); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change saves 16 bytes per Expr. Nice.
I will run some planning benchmarks and see if we can see any difference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @alamb , i am doing more amazing experiment, try to reduce from 128 to 80, so we can save 48 bytes per Expr!
datafusion/expr/src/expr.rs
Outdated
@@ -279,7 +279,7 @@ use sqlparser::ast::{ | |||
#[derive(Clone, PartialEq, PartialOrd, Eq, Debug, Hash)] | |||
pub enum Expr { | |||
/// An expression with a specific name. | |||
Alias(Alias), | |||
Alias(Box<Alias>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another change that might be a bit less impactful would be to box the fields of Alias instead
So Alias {
expr: Box
..
}
It may be just as bad / worse though
datafusion/expr/src/expr.rs
Outdated
@@ -362,7 +362,7 @@ pub enum Expr { | |||
Placeholder(Placeholder), | |||
/// A placeholder which holds a reference to a qualified field | |||
/// in the outer query, used for correlated sub queries. | |||
OuterReferenceColumn(DataType, Column), | |||
OuterReferenceColumn(Box<(DataType, Column)>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are going to make this change anyways, can we also pull this into a named struct rather than a unnamed tuple
like
struct OuterReference {
// fields here
}
enum Expr {
...
OuterReferenceColumn(OuterReference),
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @alamb for this good suggestion, addressed in latest PR.
🤖 |
🤖: Benchmark completed Details
|
Thank you @alamb for review and i agree that we can do this for Datafusion 50.0.0, and since we have enough time, i want to do more improvement, try to reduce Expr more, from size 128 to 80, i am in progress now. |
Updated: Successfully changed the size from 128 to 80 in latest PR. |
🤖 |
🤖: Benchmark completed Details
|
🤖 |
🤖: Benchmark completed Details
|
Thank you @alamb for benchmark, interesting, it seems the performance decrease from 112 size to 80, i will investigate it, if i can't find the root cause, i will revert to 112 bytes first! |
Strange, i can't reproduce this in my local: with_param_values_many_columns 1.00 129.8±0.74µs ? ?/sec 1.06 137.2±0.79µs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Just some nit for your review.
Expr::Alias(alias_box) => { | ||
// alias_box: Box<Alias> | ||
let Alias { expr, name, .. } = *alias_box; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - use boxed_alias instead, for consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @kosiew , addressed in latest PR.
if let Expr::Alias(alias_box) = expr { | ||
// alias_box: &Box<Alias>, so alias_box.as_ref() is &Alias | ||
let alias: &Alias = alias_box.as_ref(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above - boxed_alias
datafusion/expr/src/expr_schema.rs
Outdated
Expr::Alias(alias_box) => { | ||
let Alias { expr, .. } = alias_box.as_ref(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boxed_alias
datafusion/expr/src/utils.rs
Outdated
@@ -992,7 +996,7 @@ pub fn iter_conjunction_owned(expr: Expr) -> impl Iterator<Item = Expr> { | |||
stack.push(*right); | |||
stack.push(*left); | |||
} | |||
Expr::Alias(Alias { expr, .. }) => stack.push(*expr), | |||
Expr::Alias(alias_box) => stack.push(*alias_box.expr), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boxed_alias
Thank you @kosiew for review and suggestion, addressed in latest PR. |
After opening the DF50.0.0 release issue, you can add it to the list |
I will rerun |
🤖 |
🤖: Benchmark completed Details
|
Thank you @alamb , almost all improvement for most cases, but still the last one has %4 regression, and i can't reproduce in my local Mac. |
Thank you @xudong963 , added it in #16799 (comment) |
My guess is that some of the new slowdown / less predictability is due to many more |
Thank you @alamb , this is a very good point, let me try this and to see the benchmark result! |
42af92a
to
998950a
Compare
@alamb Updated in latest PR, only small change for OuterReferenceColumn now. |
🤖 |
🤖: Benchmark completed Details
|
The result is bad, no performance gain from only OuterReferenceColumn changing to Box, so i guess the improvement coming from Alias changing to Box? 🤔 |
Which issue does this PR close?
Continue to reduce the Expr struct size.
Rationale for this change
Continue to reduce the Expr struct size.
What changes are included in this PR?
Continue to reduce the Expr struct size.
Are these changes tested?
Yes
The size reduce from 128 to 112
Updated:
Continue reduce to 80 now!
Are there any user-facing changes?
No