Expose `parser` on `DFParser` to enable user controlled parsing #9729
@alamb, would you be able to give me some feedback here? In a7a7808, I made `parser` public and added an example, and that seemed to go well insofar as I could create a new … That was good, but then it can't work with DF functionality like … So I guess my questions are: 1) does an Extension statement seem reasonable to you? and 2) if so, any advice on the implementation to mitigate object-safety issues?
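For context on the object-safety question: an `Extension` variant would typically carry a `Box<dyn Trait>`, and that trait must be object safe, meaning no generic methods and no methods that return `Self` by value. That rules out making `Clone` (which requires `Sized`) a supertrait, which bites if the enclosing enum derives `Clone`. Below is a minimal, std-only sketch of one common workaround, a `box_clone` method plus a manual `Clone` impl for the boxed trait object. `UserStatement`, `CopyToFasta`, and this `Statement` enum are hypothetical illustrations, not DataFusion types.

```rust
use std::fmt;

/// Hypothetical extension point carried by `Statement::Extension`.
/// Object safe: no generic methods, and nothing returns `Self` by value.
trait UserStatement: fmt::Debug + fmt::Display {
    fn name(&self) -> &str;
    /// `Clone` requires `Sized`, so it cannot be a supertrait here;
    /// return a boxed clone instead.
    fn box_clone(&self) -> Box<dyn UserStatement>;
}

// Lets an enum holding `Box<dyn UserStatement>` derive `Clone`.
impl Clone for Box<dyn UserStatement> {
    fn clone(&self) -> Self {
        // Auto-derefs to `dyn UserStatement`, so this is not recursive.
        self.box_clone()
    }
}

/// Hypothetical statement enum with an extension variant.
#[derive(Debug, Clone)]
enum Statement {
    Query(String),
    Extension(Box<dyn UserStatement>),
}

/// Hypothetical user-defined statement.
#[derive(Debug, Clone)]
struct CopyToFasta {
    source: String,
    target: String,
}

impl fmt::Display for CopyToFasta {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "COPY {} TO {} STORED AS FASTA", self.source, self.target)
    }
}

impl UserStatement for CopyToFasta {
    fn name(&self) -> &str {
        "copy_to_fasta"
    }
    fn box_clone(&self) -> Box<dyn UserStatement> {
        Box::new(self.clone())
    }
}

fn main() {
    let stmt = Statement::Extension(Box::new(CopyToFasta {
        source: "source_table".to_string(),
        target: "'file.fasta'".to_string(),
    }));
    let copy = stmt.clone(); // uses the manual `Clone` impl for the boxed trait object
    println!("{:?}", copy);
    if let Statement::Extension(ext) = &copy {
        println!("{}: {}", ext.name(), ext);
    }
}
```

Other derives (for example `PartialEq`) would need similar hand-written forwarding, which is part of why an extension variant adds friction.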
Thanks for working on this @tshauck -- it is really cool to see
The simplest strategy I can think of is to implement your own version to … So something like …
I don't fully understand the need for an Extension statement -- if the idea is to wrap … If the idea is to avoid repeating whatever `statement_to_plan` is doing, maybe we can factor out the common functionality into a module.
Thanks for the feedback! I reverted the extension changes, and now it only has the example plus making … I guess in terms of next steps, I could move this out of draft or add …
Thank you @tshauck -- this looks great 🙏 . It is really nice to get an example showing how to do this.
I left some suggestions on additional contextual comments and how to make the code potentially simpler, but I also think we could merge this PR as is.
Can you also add an entry to the README for this example?
https://github.com/apache/arrow-datafusion/tree/main/datafusion-examples#single-process
Also, I think we can make the example code simpler, with less nesting, by factoring out a function. Something like:
```rust
use datafusion_sql::parser::Statement;
use datafusion_sql::sqlparser::{keywords::Keyword, parser::ParserError, tokenizer::Token};

impl MyParser<'_> {
    // ...

    /// Returns true if the next token is `COPY` keyword, false otherwise
    fn is_copy(&self) -> bool {
        matches!(
            self.df_parser.parser.peek_token().token,
            Token::Word(w) if w.keyword == Keyword::COPY
        )
    }

    pub fn parse_statement(&mut self) -> Result<MyStatement, ParserError> {
        if self.is_copy() {
            self.df_parser.parser.next_token(); // COPY
            let df_statement = self.df_parser.parse_copy()?;

            if let Statement::CopyTo(s) = df_statement {
                Ok(MyStatement::from(s))
            } else {
                Ok(MyStatement::DFStatement(Box::from(df_statement)))
            }
        } else {
            let df_statement = self.df_parser.parse_statement()?;
            Ok(MyStatement::from(df_statement))
        }
    }

    // ...
}
```
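To show the same pattern without depending on DataFusion, here is a self-contained sketch: the wrapper peeks at the next token, consumes `COPY` and calls the base parser's `parse_copy` itself, and otherwise delegates to the base `parse_statement`. `BaseParser`, `BaseStatement`, and the whitespace tokenizer are hypothetical stand-ins for `DFParser`, its `Statement`, and sqlparser's tokenizer; only the control flow mirrors the suggestion.

```rust
// Hypothetical stand-ins for `DFParser` / `Statement`: a toy parser over
// whitespace-separated tokens. Only the wrapping pattern is meant to carry over.
#[derive(Debug, PartialEq)]
enum BaseStatement {
    CopyTo { source: String, target: String },
    Other(String),
}

struct BaseParser {
    tokens: Vec<String>,
    pos: usize,
}

impl BaseParser {
    fn new(sql: &str) -> Self {
        let tokens = sql.split_whitespace().map(str::to_string).collect();
        BaseParser { tokens, pos: 0 }
    }

    /// Looks at the next token without consuming it.
    fn peek_token(&self) -> Option<&str> {
        self.tokens.get(self.pos).map(String::as_str)
    }

    /// Consumes and returns the next token, if any.
    fn next_token(&mut self) -> Option<String> {
        let tok = self.tokens.get(self.pos).cloned();
        if tok.is_some() {
            self.pos += 1;
        }
        tok
    }

    /// Parses `<source> TO <target>`; expects `COPY` to be consumed already.
    fn parse_copy(&mut self) -> Result<BaseStatement, String> {
        let source = self.next_token().ok_or("expected source")?;
        match self.next_token() {
            Some(t) if t.eq_ignore_ascii_case("TO") => {}
            _ => return Err("expected TO".to_string()),
        }
        let target = self.next_token().ok_or("expected target")?;
        Ok(BaseStatement::CopyTo { source, target })
    }

    /// Fallback: treats the remaining tokens as some other statement.
    fn parse_statement(&mut self) -> Result<BaseStatement, String> {
        let rest = self.tokens[self.pos..].join(" ");
        self.pos = self.tokens.len();
        Ok(BaseStatement::Other(rest))
    }
}

/// The dialect's own statement type, wrapping the base one.
#[derive(Debug, PartialEq)]
enum MyStatement {
    Base(Box<BaseStatement>),
    MyCopyTo { source: String, target: String },
}

struct MyParser {
    base: BaseParser,
}

impl MyParser {
    fn new(sql: &str) -> Self {
        MyParser { base: BaseParser::new(sql) }
    }

    /// Returns true if the next token is the `COPY` keyword.
    fn is_copy(&self) -> bool {
        matches!(self.base.peek_token(), Some(t) if t.eq_ignore_ascii_case("COPY"))
    }

    /// Handles `COPY` itself and delegates everything else to the base parser.
    fn parse_statement(&mut self) -> Result<MyStatement, String> {
        if self.is_copy() {
            self.base.next_token(); // consume COPY
            match self.base.parse_copy()? {
                BaseStatement::CopyTo { source, target } => {
                    Ok(MyStatement::MyCopyTo { source, target })
                }
                other => Ok(MyStatement::Base(Box::new(other))),
            }
        } else {
            Ok(MyStatement::Base(Box::new(self.base.parse_statement()?)))
        }
    }
}

fn main() {
    let mut parser = MyParser::new("COPY source_table TO 'file.fasta'");
    println!("{:?}", parser.parse_statement());
}
```

The early `is_copy` check keeps the happy path flat: only the intercepted keyword gets special handling, and every other input flows through the base parser unchanged.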
```rust
#[tokio::main]
async fn main() -> Result<()> {
    let mut my_parser =
        MyParser::new("COPY source_table TO 'file.fasta' STORED AS FASTA")?;
```
😸
Co-authored-by: Andrew Lamb <[email protected]>
datafusion-examples/README.md
- [`simple_udwf.rs`](examples/simple_udwf.rs): Define and invoke a User Defined Window Function (UDWF)
- [`advanced_udwf.rs`](examples/advanced_udwf.rs): Define and invoke a more complicated User Defined Window Function (UDWF)
- [`sql_dialect.rs`](examples/sql_dialect.rs): Examples of using the SQL Dialect
Here's `sql_dialect.rs` -- I also alphabetized the list, hope that's ok.
@alamb Thanks for all the feedback! The …
```
@@ -42,36 +42,37 @@ cargo run --example csv_sql

## Single Process

- [`advanced_udaf.rs`](examples/advanced_udaf.rs): Define and invoke a more complicated User Defined Aggregate Function (UDAF)
```
❤️ -- very nice drive by cleanup 🙏
Thanks again @tshauck
…#9729)
* poc: custom parser
* play with extension statement
* tweak
* Revert "tweak"
  This reverts commit e57006e.
* Revert "play with extension statement"
  This reverts commit 86588e4.
* style: cargo fmt
* Update datafusion-examples/examples/sql_parsing.rs
  Co-authored-by: Andrew Lamb <[email protected]>
* Apply suggestions from code review
  Co-authored-by: Andrew Lamb <[email protected]>
* style: cargo cmt
* refactor: less nesting in parse statement
* docs: better example description
---------
Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
Closes #533
Rationale for this change
Being able to control the underlying parser is useful for adding capabilities between the SQL and logical plan layers, which can then propagate downstream in the plan.
This approach seems to work nicely for customizing how SQL becomes a user-defined statement. I.e., here I'm wrapping DF's `Statement` with my own, and creating it based on the underlying result from `parse_copy`.
However, I'm unsure of the next steps. E.g., with this strategy of wrapping the statement, I can't then use DF's machinery to generate the `LogicalPlan`, like `sql_statement_to_plan`. It's not too bad in the basic case, as it'd be easy to generate a user-defined logical node from this enum (or fall back to `sql_statement_to_plan`), but in more complex cases there's probably a better way, e.g. maybe making `Statement` a trait with a visit method, or something along those lines? There's probably a better/simpler approach :).
Going to open this in draft, and will come back later today/tomorrow to see what the basic strategy for constructing a `LogicalPlan` would look like and whether that's a reasonable first chunk.
Open to feedback on any of it, thanks!
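The fallback described above (build a user-defined node for the dialect's own variants, hand everything else to the existing planner) can be sketched as a single `match`. This is a minimal sketch under stated assumptions: `Plan`, `BaseStatement`, and `base_statement_to_plan` are hypothetical stand-ins for DataFusion's `LogicalPlan`, `Statement`, and `sql_statement_to_plan`, whose real signatures differ.

```rust
// Hypothetical stand-ins; only the dispatch shape is meant to carry over.
#[derive(Debug, PartialEq)]
enum Plan {
    /// A user-defined node produced for the dialect's own statement.
    CustomCopy { source: String, target: String },
    /// Whatever the base planner produced for a statement we didn't handle.
    Base(String),
}

#[derive(Debug)]
enum BaseStatement {
    Query(String),
}

/// The wrapping statement type, as in the PR's approach.
enum MyStatement {
    Base(Box<BaseStatement>),
    MyCopyTo { source: String, target: String },
}

/// Stand-in for the base library's statement-to-plan function.
fn base_statement_to_plan(stmt: &BaseStatement) -> Result<Plan, String> {
    match stmt {
        BaseStatement::Query(sql) => Ok(Plan::Base(format!("plan for: {sql}"))),
    }
}

/// Plans the dialect's own variants directly; delegates everything else.
fn my_statement_to_plan(stmt: &MyStatement) -> Result<Plan, String> {
    match stmt {
        MyStatement::MyCopyTo { source, target } => Ok(Plan::CustomCopy {
            source: source.clone(),
            target: target.clone(),
        }),
        // `&Box<BaseStatement>` deref-coerces to `&BaseStatement`.
        MyStatement::Base(inner) => base_statement_to_plan(inner),
    }
}

fn main() {
    let copy = MyStatement::MyCopyTo {
        source: "source_table".to_string(),
        target: "file.fasta".to_string(),
    };
    let query = MyStatement::Base(Box::new(BaseStatement::Query("SELECT 1".to_string())));
    println!("{:?}", my_statement_to_plan(&copy));
    println!("{:?}", my_statement_to_plan(&query));
}
```

Keeping the base statement boxed inside one wrapper variant means the dialect never re-implements planning for statements it does not change; the cost is that each new custom variant needs its own planning arm.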
What changes are included in this PR?
Makes the `parser` field on `DFParser` public and adds an example.
Are these changes tested?
I ran the example manually and it seems to perform as expected, though I'm not sure what the best next step is.
Are there any user-facing changes?
Not breaking, but `parser` is now available on `DFParser`.