@paleolimbot paleolimbot commented Oct 17, 2025

Which issue does this PR close?

I am sorry that I missed the previous PR implementing this (#18120); I'm also happy to review that one instead of updating this one!

Rationale for this change

Other systems that interact with the logical plan (e.g., SQL, Substrait) can express types that the Arrow DataType enum alone cannot represent, such as extension types.

What changes are included in this PR?

For the Cast and TryCast structs, the destination data type was changed from a DataType to a FieldRef.
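
To illustrate the shape of this change, here is a minimal sketch that uses simplified stand-in types rather than the real arrow and DataFusion definitions. `CastBefore`, `CastAfter`, and `extension_field` are hypothetical names introduced only for this example; the metadata key `ARROW:extension:name` is the standard Arrow key for an extension type name.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-ins for arrow's `DataType`, `Field`, and `FieldRef`.
// These are NOT the real arrow types; they mirror only what the example needs.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Binary,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
    metadata: HashMap<String, String>,
}

type FieldRef = Arc<Field>;

// Before: the cast target is only a storage type, so there is nowhere
// to put extension metadata.
#[derive(Debug, Clone, PartialEq)]
struct CastBefore {
    expr: String, // placeholder for the real `Box<Expr>`
    data_type: DataType,
}

// After: the cast target is a field, which carries a data type plus metadata.
#[derive(Debug, Clone, PartialEq)]
struct CastAfter {
    expr: String, // placeholder for the real `Box<Expr>`
    field: FieldRef,
}

/// Hypothetical helper: build a target field describing an extension type
/// by attaching the extension name to the storage type.
fn extension_field(storage: DataType, extension_name: &str) -> FieldRef {
    let mut metadata = HashMap::new();
    metadata.insert(
        "ARROW:extension:name".to_string(),
        extension_name.to_string(),
    );
    Arc::new(Field {
        name: String::new(),
        data_type: storage,
        nullable: true,
        metadata,
    })
}
```

With the field-based target, a cast to an extension type can be written down in the logical plan; with the `DataType`-based target, only the storage type survives.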

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes, any code using Cast { .. } to create an expression would need to use Cast::new() instead (or pass on field metadata if it has it). Existing matches will need to be updated for the data_type -> field member rename.
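
A rough sketch of what the migration could look like for downstream code, again with stand-in types instead of the real DataFusion ones. It assumes that `Cast::new()` continues to accept a plain data type and wraps it in a field internally; the helper `cast_target` is hypothetical.

```rust
use std::sync::Arc;

// Stand-ins for the arrow types; not the real definitions.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    data_type: DataType,
    nullable: bool,
}

type FieldRef = Arc<Field>;

// Stand-in for the logical `Cast` struct after the rename.
#[derive(Debug, Clone, PartialEq)]
struct Cast {
    expr: Box<Expr>,
    field: FieldRef, // previously: `data_type: DataType`
}

impl Cast {
    // Assumption: the constructor keeps taking a data type, so callers that
    // have no metadata to pass along do not need to build a field themselves.
    fn new(expr: Box<Expr>, data_type: DataType) -> Self {
        let field = Arc::new(Field { data_type, nullable: true });
        Self { expr, field }
    }
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Cast(Cast),
}

/// Hypothetical helper showing the updated match pattern.
fn cast_target(expr: &Expr) -> Option<&DataType> {
    match expr {
        // Previously: Expr::Cast(Cast { data_type, .. }) => Some(data_type)
        Expr::Cast(Cast { field, .. }) => Some(&field.data_type),
        _ => None,
    }
}
```

Code that built the struct literally, e.g. `Cast { expr, data_type }`, would switch to the constructor, while pattern matches only need the member name changed.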

@github-actions github-actions bot added the sql (SQL Planner), logical-expr (Logical plan and expressions), physical-expr (Changes to the physical-expr crates), core (Core DataFusion crate), substrait (Changes to the substrait crate), proto (Related to proto crate), and functions (Changes to functions implementation) labels Oct 17, 2025
Comment on lines 598 to 603
f.as_ref()
.clone()
.with_data_type(data_type.data_type().clone())
.with_metadata(f.metadata().clone())
// TODO: should nullability be overridden here or derived from the
// input expression?

I am not sure if this type of cast should be able to express nullability or not.

Currently this PR does not consider the nullability of the cast field, because the destination physical expression won't consider it either (and thus the return fields would be out of sync).
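
As a sketch of the behaviour described here, assuming `f` in the snippet above is the input expression's field: the returned field takes its data type from the cast target but keeps the input's name and nullability. The types and the helper `cast_return_field` are hypothetical stand-ins, and metadata handling is deliberately left out.

```rust
// Stand-ins for the arrow types; not the real definitions.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Hypothetical helper: derive the field a cast expression returns.
fn cast_return_field(input: &Field, target: &Field) -> Field {
    Field {
        name: input.name.clone(),
        // The data type comes from the cast target.
        data_type: target.data_type.clone(),
        // `target.nullable` is ignored on purpose: the physical cast
        // would not honour it, so the logical and physical return
        // fields stay in sync by using the input's nullability.
        nullable: input.nullable,
    }
}
```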

Comment on lines 294 to 295
data_type.clone(),
// TODO: this drops extension metadata associated with the cast
data_type.data_type().clone(),

I don't actually need physical expressions to be able to cast things... my vague plan is to use a logical plan transformation or perhaps an optimizer rule to replace casts to extension types with a ScalarUDF call. This should possibly error if there is mismatched metadata between the input and destination (i.e., a physical cast would only ever represent a storage cast, which is usually OK).
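
One way such a rewrite might look, as a toy sketch only: the expression enum, the function naming scheme `cast_to_<extension name>`, and `rewrite_extension_casts` are all hypothetical and do not correspond to DataFusion's actual `Expr`, optimizer rule traits, or ScalarUDF API.

```rust
use std::collections::HashMap;

// Toy expression tree; a stand-in for the real logical `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    // Only the target field's metadata is modelled here.
    Cast { expr: Box<Expr>, metadata: HashMap<String, String> },
    ScalarFunction { name: String, args: Vec<Expr> },
}

/// Hypothetical rewrite: replace casts whose target names an extension type
/// with a call to a scalar function; leave plain storage casts untouched.
fn rewrite_extension_casts(expr: Expr) -> Expr {
    match expr {
        Expr::Cast { expr, metadata } => {
            // Rewrite the child first so nested casts are handled too.
            let inner = rewrite_extension_casts(*expr);
            let extension = metadata.get("ARROW:extension:name").cloned();
            match extension {
                Some(ext) => Expr::ScalarFunction {
                    name: format!("cast_to_{}", ext),
                    args: vec![inner],
                },
                None => Expr::Cast { expr: Box::new(inner), metadata },
            }
        }
        Expr::ScalarFunction { name, args } => Expr::ScalarFunction {
            name,
            args: args.into_iter().map(rewrite_extension_casts).collect(),
        },
        other => other,
    }
}
```

After a pass like this, the only casts left for the physical planner are storage casts without extension metadata.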

I updated this to error at the logical expr -> physical expr stage if there is metadata on the cast field. I think this is better than dropping it (and won't break any existing code because before this PR such a cast could not exist).
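
The check described here could be sketched roughly as follows. The types, the function `plan_physical_cast`, and the error message are hypothetical stand-ins; the real planner code and error type will differ.

```rust
use std::collections::HashMap;

// Stand-ins for the arrow types; not the real definitions.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Binary,
}

struct Field {
    data_type: DataType,
    metadata: HashMap<String, String>,
}

// Stand-in for a physical cast, which only knows about a storage type.
#[derive(Debug, PartialEq)]
struct PhysicalCast {
    data_type: DataType,
}

/// Hypothetical planner step: refuse to lower a cast whose target field
/// carries metadata instead of silently dropping that metadata.
fn plan_physical_cast(target: &Field) -> Result<PhysicalCast, String> {
    if !target.metadata.is_empty() {
        return Err(format!(
            "cannot create a physical cast to a field with {} metadata entries",
            target.metadata.len()
        ));
    }
    Ok(PhysicalCast { data_type: target.data_type.clone() })
}
```

Because a cast with field metadata could not be constructed before this change, failing here should not affect plans that already worked.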

@github-actions github-actions bot added the optimizer (Optimizer rules) label Oct 25, 2025
@paleolimbot paleolimbot marked this pull request as ready for review October 27, 2025 14:26

Development

Successfully merging this pull request may close these issues.

LogicalPlan Casts can't express a cast to an extension type
