Preserve constant values across union operations #13805

Merged: 26 commits, Dec 25, 2024

gokselk
Contributor

@gokselk gokselk commented Dec 17, 2024

Which issue does this PR close?

Closes #13804.

Rationale for this change

Currently, DataFusion doesn't preserve constant values across union operations even when both sides have the same constant value. This change enables better optimization by tracking and preserving constant values when they match.

What changes are included in this PR?

  • Added value: Option<ScalarValue> field to ConstExpr
  • Added methods to get/set constant values
  • Modified union operation logic to preserve matching constant values
  • Updated equality comparison for ConstExpr
  • Added tests for constant value preservation in unions
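The changes listed above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the PR's actual code: `ScalarValue` and `ConstExpr` here are simplified stand-ins for the DataFusion types, the `expr` field is a plain column name rather than a physical expression, and `union_constants` is a hypothetical helper. It assumes the simplest rule, namely that a constant survives a union only when both inputs report the same known value; the real merge logic may treat unknown or mismatched values differently.

```rust
// Simplified stand-ins for illustration; NOT the actual DataFusion types.
#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Int64(i64),
    Utf8(String),
}

#[derive(Debug, Clone, PartialEq)]
struct ConstExpr {
    /// Column name standing in for the physical expression.
    expr: String,
    /// The constant's value, if it is known.
    value: Option<ScalarValue>,
}

impl ConstExpr {
    /// Creates a constant expression whose value is not known.
    fn new(expr: &str) -> Self {
        Self { expr: expr.to_string(), value: None }
    }

    /// Sets the known constant value.
    fn with_value(mut self, value: ScalarValue) -> Self {
        self.value = Some(value);
        self
    }

    /// Returns the known constant value, if any.
    fn value(&self) -> Option<&ScalarValue> {
        self.value.as_ref()
    }
}

/// Hypothetical helper: keeps a constant from `left` only when `right`
/// contains the same expression with the same known value.
fn union_constants(left: &[ConstExpr], right: &[ConstExpr]) -> Vec<ConstExpr> {
    left.iter()
        .filter(|l| l.value.is_some() && right.iter().any(|r| r == *l))
        .cloned()
        .collect()
}

fn main() {
    let left = vec![
        ConstExpr::new("c1").with_value(ScalarValue::Utf8("a".to_string())),
        ConstExpr::new("c2").with_value(ScalarValue::Int64(1)),
    ];
    let right = vec![
        ConstExpr::new("c1").with_value(ScalarValue::Utf8("a".to_string())),
        ConstExpr::new("c2").with_value(ScalarValue::Int64(2)),
    ];
    // Only c1 agrees on both sides, so only c1 is kept.
    for c in union_constants(&left, &right) {
        println!("{} = {:?}", c.expr, c.value());
    }
}
```

Equality on the stand-in `ConstExpr` compares both the expression and the value, which mirrors the "updated equality comparison" item in spirit only.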

Are these changes tested?

Yes. Added a new test case, test_union_constant_value_preservation, that verifies constant value preservation across unions.

Are there any user-facing changes?

No user-facing changes. This is an internal optimization improvement.

@github-actions github-actions bot added the physical-expr Physical Expressions label Dec 17, 2024
@gokselk
Contributor Author

gokselk commented Dec 17, 2024

cc: @berkaysynnada @ozankabak

@gokselk gokselk changed the title from "Feature/const expr value tracking" to "Preserve constant values across union operations" Dec 17, 2024
Contributor

@berkaysynnada berkaysynnada left a comment

I have just one suggestion, otherwise LGTM

datafusion/physical-expr/src/equivalence/properties.rs (outdated, resolved)
Contributor

@alamb alamb left a comment

Thanks @gokselk and @berkaysynnada

I suggest we try to write an end-to-end sqllogictest for this query too.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Dec 17, 2024
@ozankabak
Contributor

ozankabak commented Dec 18, 2024

I wonder if we should change across_partitions to an enum; i.e.

enum PartitionValues {
    Uniform(Option<ScalarValue>),
    Heterogenous(Vec<Option<ScalarValue>>)
}

with Uniform meaning that all partitions have the same value given in the payload (if known), and Heterogenous meaning partitions can have different constant values (each of which is given in the vector, if known).
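To make the proposed semantics concrete, here is a small self-contained sketch. The enum mirrors the suggestion above (including its spelling of `Heterogenous`); `ScalarValue` is a simplified stand-in, and `from_partitions` and `uniform_value` are hypothetical helpers added only for illustration, not part of any existing API.

```rust
// Simplified stand-in for illustration; NOT the actual DataFusion type.
#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Int64(i64),
}

#[derive(Debug, Clone, PartialEq)]
enum PartitionValues {
    /// Every partition holds the same value (given in the payload, if known).
    Uniform(Option<ScalarValue>),
    /// Partitions may hold different constants, one entry per partition (if known).
    Heterogenous(Vec<Option<ScalarValue>>),
}

impl PartitionValues {
    /// Hypothetical helper: classifies per-partition constants. The result is
    /// `Uniform` only when every partition reports the same known value;
    /// anything else (including an empty input) stays `Heterogenous`.
    fn from_partitions(values: Vec<Option<ScalarValue>>) -> Self {
        let first = values.first().cloned().flatten();
        let all_same = first.is_some() && values.iter().all(|v| *v == first);
        if all_same {
            PartitionValues::Uniform(first)
        } else {
            PartitionValues::Heterogenous(values)
        }
    }

    /// Hypothetical helper: the single value shared by all partitions, if known.
    fn uniform_value(&self) -> Option<&ScalarValue> {
        match self {
            PartitionValues::Uniform(v) => v.as_ref(),
            PartitionValues::Heterogenous(_) => None,
        }
    }
}

fn main() {
    let same = PartitionValues::from_partitions(vec![
        Some(ScalarValue::Int64(1)),
        Some(ScalarValue::Int64(1)),
    ]);
    let mixed = PartitionValues::from_partitions(vec![
        Some(ScalarValue::Int64(1)),
        Some(ScalarValue::Int64(2)),
    ]);
    println!("{:?} -> {:?}", same, same.uniform_value());
    println!("{:?} -> {:?}", mixed, mixed.uniform_value());
}
```

Note that this helper can never produce `Uniform(None)`: a set of unknown per-partition values gives no evidence that the partitions agree, so that state would have to come from other knowledge about the plan.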

@gokselk gokselk force-pushed the feature/const-expr-value-tracking branch from 1a81628 to f737c65 December 19, 2024 07:38
@gokselk gokselk marked this pull request as draft December 20, 2024 05:56
@gokselk gokselk force-pushed the feature/const-expr-value-tracking branch from 6cc8259 to 1917c0e December 23, 2024 05:12
@gokselk gokselk marked this pull request as ready for review December 23, 2024 11:18
05)----Filter: aggregate_test_100.c1 = Utf8("a")
06)------TableScan: aggregate_test_100 projection=[c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13], partial_filters=[aggregate_test_100.c1 = Utf8("a")]
physical_plan
01)CoalescePartitionsExec
Contributor

@berkaysynnada berkaysynnada Dec 25, 2024

This appears as SortPreservingMergeExec before these changes 👍🏻
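One plausible reading of why the plan changes, sketched under assumptions: if a sort key is known to hold the same constant value in every partition (here, c1 = "a" on both union inputs), it no longer constrains the output order, so a plain coalesce can replace an order-preserving merge. The functions below are hypothetical and use column names as strings; they are not the planner's actual code, and the query's ORDER BY clause is assumed rather than shown in the thread.

```rust
/// Hypothetical helper: sort keys that still matter after dropping columns
/// known to hold the same constant value in every partition.
fn remaining_sort_keys<'a>(sort_keys: &[&'a str], uniform_constants: &[&str]) -> Vec<&'a str> {
    sort_keys
        .iter()
        .copied()
        .filter(|k| !uniform_constants.iter().any(|c| *c == *k))
        .collect()
}

/// Hypothetical helper: an order-preserving merge is only needed while at
/// least one non-constant sort key remains.
fn needs_order_preserving_merge(sort_keys: &[&str], uniform_constants: &[&str]) -> bool {
    !remaining_sort_keys(sort_keys, uniform_constants).is_empty()
}

fn main() {
    // Assumed scenario: ordering on c1, with c1 constant across all partitions.
    println!("{}", needs_order_preserving_merge(&["c1"], &["c1"]));
    // A second, non-constant key would still require preserving the order.
    println!("{}", needs_order_preserving_merge(&["c1", "c2"], &["c1"]));
}
```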

@berkaysynnada
Contributor

I'm planning to merge this PR once the CI is all green

@berkaysynnada berkaysynnada merged commit b9cef8c into apache:main Dec 25, 2024
24 checks passed
Labels: physical-expr (Physical Expressions), sqllogictest (SQL Logic Tests (.slt))
Linked issue: Preserve constant values in union operations
4 participants