Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce UserDefinedLogicalNodeUnparser for User-defined Logical Plan unparsing #13880

Merged
merged 12 commits into from
Dec 25, 2024

Conversation

goldmedal
Copy link
Contributor

@goldmedal goldmedal commented Dec 22, 2024

Which issue does this PR close?

Closes #13753 .

Rationale for this change

See the previous discussion for the design: #13753 (comment)

UserDefinedLogicalNodeUnparser provides two APIs for user-defined behavior:

  • unparse: Unparse the custom logical node to SQL within a statement.
  • unparse_to_statement: Unparse the custom logical node to a statement.

What changes are included in this PR?

  • Introduce UserDefinedLogicalNodeUnparser
  • Make the AST builders public
  • Add examples for unparsing custom logical plan

Are these changes tested?

yes

Are there any user-facing changes?

New trait and API

@goldmedal goldmedal marked this pull request as ready for review December 22, 2024 12:23
@goldmedal
Copy link
Contributor Author

Could @phillipleblanc or @sgrebnov take a look at this? Thanks

/// The child unparsers are called iteratively.
/// There are two methods in [`Unparser`] will be called:
/// - `extension_to_statement`: This method is called when the custom logical node is a custom statement.
/// If multiple child unparsers return a non-None value, the last unparsing result will be returned.
Copy link
Member

@sgrebnov sgrebnov Dec 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@goldmedal - I'm not sure using the last unparsing result is the expected behavior. As a user, I would expect to get the result from the first udlp_unparser that supports this node and stop checking the remaining udlp_unparsers instead.

Is there a specific use case / reason for using the last supported udlp_unparser? They can be dynamically registered and the last one should override perviously registered? To match unparse behavior where we don't know/track if unparsing is applied so we always apply all?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I would also expect this to short-circuit and have the first one win.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific use case / reason for using the last supported udlp_unparser? They can be dynamically registered and the last one should override perviously registered.

Actually, I don't have a real case for it but yes, I imagined the user can append the new unparser to override the previous.
However, I guess it could be a rare use case (?

As a user, I would expect to get the result from the first udlp_unparser that supports this node and stop checking the remaining udlp_unparsers instead.

Maybe you guys are right. We should make the first one win. It's more efficient and simpler.
It's also how ExprPlanner worked in the planner.

for planner in self.context_provider.get_expr_planners() {
match planner.plan_extract(extract_args)? {
PlannerResult::Planned(expr) => return Ok(expr),
PlannerResult::Original(args) => {
extract_args = args;
}
}
}

Anyway, I'll change it. Thanks!

select: &mut Option<&mut SelectBuilder>,
relation: &mut Option<&mut RelationBuilder>,
) -> Result<()> {
for unparser in &self.udlp_unparsers {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to add indication that unparse applied to be consistent with unparse_to_statement and throw error if non of registered udlps applied / successfully processed the node.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, if none of the extension unparsers were able to process it, it should throw an error IMO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sound greats. I think we could provide some enum for it like

UnparsedResult::Unparsed()
UnparsedResult::Original()
..

I'll try it

@sgrebnov
Copy link
Member

@goldmedal - looks great, two minor questions/comments

Copy link
Contributor

@phillipleblanc phillipleblanc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @goldmedal! I have a few minor comments, but this this a good upgrade for the unparser!

/// If multiple child unparsers return a non-None value, the last unparsing result will be returned.
/// - `extension_to_sql`: This method is called when the custom logical node is part of a statement.
/// If multiple child unparsers are registered for the same custom logical node, all of them will be called in order.
pub fn with_udlp_unparsers(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of this name - udlp takes effort to understand what it means. How about renaming udlp_* to extension_*? i.e. with_extension_unparsers. It conveys the same meaning in an easier to understand way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great. I'll rename it.

/// The child unparsers are called iteratively.
/// There are two methods in [`Unparser`] will be called:
/// - `extension_to_statement`: This method is called when the custom logical node is a custom statement.
/// If multiple child unparsers return a non-None value, the last unparsing result will be returned.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I would also expect this to short-circuit and have the first one win.

/// - `extension_to_statement`: This method is called when the custom logical node is a custom statement.
/// If multiple child unparsers return a non-None value, the last unparsing result will be returned.
/// - `extension_to_sql`: This method is called when the custom logical node is part of a statement.
/// If multiple child unparsers are registered for the same custom logical node, all of them will be called in order.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should also short-circuit and only do the first one?

select: &mut Option<&mut SelectBuilder>,
relation: &mut Option<&mut RelationBuilder>,
) -> Result<()> {
for unparser in &self.udlp_unparsers {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, if none of the extension unparsers were able to process it, it should throw an error IMO

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also rename this file to extension_unparser.rs

if let Some(err) = plan_to_sql(&plan).err() {
assert_eq!(
err.to_string(),
"External error: `relation` must be initialized"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error is expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the original error for unsupported relation plan 🤔 I guess it could be improved by #13880 (comment)

@goldmedal goldmedal marked this pull request as draft December 23, 2024 09:14
@goldmedal goldmedal marked this pull request as ready for review December 23, 2024 12:01
Copy link
Contributor

@phillipleblanc phillipleblanc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

/// The result of unparsing a custom logical node.
pub enum UnparseResult {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use two separate enums here since one variant can never be emitted in one of the functions, and vice-versa - better type safety, and removes one arm of the match clause that just returns an error anyway.

Also perhaps Unmodified or Unmatched instead of Original? It's not immediately obvious what an UnparseResult::Original means, but that is quite minor - also fine to leave as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great. It can avoid the potential user error. I did it in 7b6c37f. Thanks!

Copy link
Contributor

@phillipleblanc phillipleblanc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a very brief look at this PR and it looks really nice -- thank you @goldmedal

I suspect part of why it looks great is due to the reviews from @phillipleblanc and @sgrebnov -- thank you both as well for helping to keep the code flowing

/// This example demonstrates how to unparse a custom logical plan as a statement.
/// The custom logical plan is a simple extension of the logical plan that reads from a parquet file.
/// It can be unparse as a statement that reads from the same parquet file.
async fn unparse_my_logical_plan_as_statement() -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very cool

@alamb
Copy link
Contributor

alamb commented Dec 25, 2024

Screenshot 2024-12-25 at 7 18 25 AM

I merged up this PR to resolve a conflict

@goldmedal
Copy link
Contributor Author

Screenshot 2024-12-25 at 7 18 25 AM I merged up this PR to resolve a conflict

Thanks!

@goldmedal goldmedal merged commit 482b489 into apache:main Dec 25, 2024
24 checks passed
@goldmedal
Copy link
Contributor Author

Thanks @phillipleblanc , @sgrebnov and @alamb for the reviews 🙇

@goldmedal goldmedal deleted the feature/13753-udlp-unparser branch December 25, 2024 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support unparsing LogicalPlan::Extension to SQL tesxt
4 participants