move information_schema to datafusion-catalog #14364

Open
wants to merge 7 commits into
base: main

@logan-keede logan-keede commented Jan 29, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate substrait catalog Related to the catalog crate labels Jan 29, 2025
@logan-keede
Contributor Author

cc @comphead @alamb

@alamb
Contributor

alamb commented Jan 29, 2025

I am checking this one out

Contributor

@alamb alamb left a comment

Thank you @logan-keede -- this is great 🙏

I also took the liberty of pushing some commits to remove a few unnecessary explicit dependencies, as I thought that would be better than trying to explain how to do it.

If you are feeling like some more refactoring projects, any chance you are interested in working to split out the data sources (aka make datafusion-datasource-parquet, datafusion-datasource-csv, etc)?

This doesn't have a ticket yet, as we were still working to extract the catalog and physical optimizers. Now that those are done, I think we are ready to try to break out the listing table / data sources.

@@ -46,6 +46,7 @@ url = { workspace = true }

[dev-dependencies]
datafusion = { workspace = true, features = ["nested_expressions"] }
datafusion-catalog = { workspace = true }
Contributor

I think since datafusion re-exports datafusion_catalog we should be able to avoid this dependency:

https://github.com/apache/datafusion/blob/9ca6c2557488c1c608182542c1be96889b64fe29/datafusion/core/src/lib.rs#L731-L730
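
To illustrate the idea, here is a minimal, self-contained sketch of how a re-export lets downstream code avoid a direct dependency. The modules `datafusion_catalog` and `datafusion` below are toy stand-ins for the real crates, and `SchemaProvider`, `MemorySchema`, and `list_tables` are simplified, hypothetical items rather than the real API. The exact re-export path in the real crate is whatever the linked `lib.rs` lines define; `datafusion::catalog` is assumed here.

```rust
// Toy stand-in for the `datafusion-catalog` crate (hypothetical, heavily simplified).
mod datafusion_catalog {
    /// Simplified stand-in for a catalog trait; not the real signature set.
    pub trait SchemaProvider {
        fn table_names(&self) -> Vec<String>;
    }

    /// Hypothetical in-memory schema used only for this illustration.
    pub struct MemorySchema {
        pub tables: Vec<String>,
    }

    impl SchemaProvider for MemorySchema {
        fn table_names(&self) -> Vec<String> {
            self.tables.clone()
        }
    }
}

// Toy stand-in for the `datafusion` crate, which re-exports the catalog crate.
mod datafusion {
    pub mod catalog {
        // Assumed shape of the re-export; everything public in the catalog
        // stand-in becomes reachable through `datafusion::catalog`.
        pub use super::super::datafusion_catalog::*;
    }
}

// Downstream code names only `datafusion`, never `datafusion_catalog`,
// so it would not need `datafusion-catalog` in its own Cargo.toml.
use datafusion::catalog::{MemorySchema, SchemaProvider};

pub fn list_tables() -> Vec<String> {
    let schema = MemorySchema {
        tables: vec!["t1".to_string(), "t2".to_string()],
    };
    schema.table_names()
}
```

In a test or example that already depends on `datafusion`, switching imports to the re-exported path is usually all that is needed before the extra `[dev-dependencies]` entry can be dropped.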

Contributor Author

I knew there was some chance of removing this dependency, but I was not sure how to go about it. So, thanks.

/// assert_eq!(ctes[0].to_string(), "my_cte");
/// ```
pub fn resolve_table_references(
statement: &datafusion_sql::parser::Statement,
Contributor

I think this function belongs in datafusion-sql, as it is walking over the SQL parse tree.

Perhaps we can put it in https://github.com/apache/datafusion/tree/main/datafusion/sql/src/resolve.rs or something similar, and then remove the dependency of datafusion-catalog on datafusion-sql.

I think we could do this in a follow on PR as well
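
As a rough illustration of why such a function can live next to the parser, the sketch below walks a toy parse tree and needs nothing from the catalog. The `Query` enum is a hypothetical stand-in for the real SQL AST, and this `resolve_table_references` is a simplified toy with plain `String` names, not the function from the PR. It also ignores CTE scoping: any name defined by a `WITH` clause is excluded from the table list everywhere.

```rust
use std::collections::BTreeSet;

/// Toy parse tree; a hypothetical stand-in for the real SQL AST.
pub enum Query {
    /// `SELECT ... FROM <table>`
    Table(String),
    /// `WITH <name> AS (<definition>) <body>`
    With {
        name: String,
        definition: Box<Query>,
        body: Box<Query>,
    },
    /// `<left> JOIN <right>`
    Join(Box<Query>, Box<Query>),
}

/// Recursively collect every referenced name and every CTE name.
fn walk(query: &Query, tables: &mut BTreeSet<String>, ctes: &mut BTreeSet<String>) {
    match query {
        Query::Table(name) => {
            tables.insert(name.clone());
        }
        Query::With { name, definition, body } => {
            ctes.insert(name.clone());
            walk(definition, tables, ctes);
            walk(body, tables, ctes);
        }
        Query::Join(left, right) => {
            walk(left, tables, ctes);
            walk(right, tables, ctes);
        }
    }
}

/// Returns `(table references, CTE names)`, both sorted.
/// Names defined by a CTE are not reported as table references.
pub fn resolve_table_references(query: &Query) -> (Vec<String>, Vec<String>) {
    let mut tables = BTreeSet::new();
    let mut ctes = BTreeSet::new();
    walk(query, &mut tables, &mut ctes);
    let tables: Vec<String> = tables.difference(&ctes).cloned().collect();
    (tables, ctes.into_iter().collect())
}

/// WITH my_cte AS (SELECT ... FROM a) SELECT ... FROM my_cte JOIN b
pub fn example() -> (Vec<String>, Vec<String>) {
    let query = Query::With {
        name: "my_cte".to_string(),
        definition: Box::new(Query::Table("a".to_string())),
        body: Box::new(Query::Join(
            Box::new(Query::Table("my_cte".to_string())),
            Box::new(Query::Table("b".to_string())),
        )),
    };
    resolve_table_references(&query)
}
```

Because the only input is the parse tree, a function of this shape has no need for catalog types, which is what would allow the dependency between the two crates to be removed.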

Contributor Author

Let's do that then, since I might not be able to commit over the next few days due to some personal commitments.

@logan-keede
Contributor Author

> If you are feeling like some more refactoring projects, any chance you are interested in working to split out the data sources (aka make datafusion-datasource-parquet, datafusion-datasource-csv, etc)?

Well, I guess I know what I might be doing when I get back. I will definitely look into it. If it's not too much trouble, please let me know if there has been any discussion (or any relevant resource) on it in the past.

Labels
catalog Related to the catalog crate core Core DataFusion crate substrait
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Epic] Extract catalog functionality from the core to make it more modular
2 participants