Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't Canonicalize Filesystem Paths in ListingTableUrl / support new external tables for files that do not (yet) exist #8014

Merged
merged 1 commit into from
Nov 20, 2023

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Oct 31, 2023

Which issue does this PR close?

Closes #8277

Rationale for this change

Split out from #8007 and building on top of #8012. Instead of canonicalizing the path before conversion to a URL, it instead concatenates the working directory and relies on the URL crate's built in path resolution logic to reconcile relative paths. This allows creating ListingTableUrl to paths that don't currently exist.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Users can now create external tables referencing path that don't (yet) exist

@github-actions github-actions bot added the core Core DataFusion crate label Oct 31, 2023
@tustvold tustvold changed the title Listing table canonicalize Don't Canonicalize Filesystem Paths in ListingTableUrl Oct 31, 2023
@tustvold
Copy link
Contributor Author

tustvold commented Nov 1, 2023

This is currently blocked on apache/arrow-rs#5019

@tustvold tustvold force-pushed the listing-table-canonicalize branch from 62d4c9b to 3ca6aaa Compare November 18, 2023 20:27
@tustvold tustvold force-pushed the listing-table-canonicalize branch from 3ca6aaa to 097131c Compare November 18, 2023 20:34
@@ -388,15 +388,7 @@ mod tests {
// Ensure that local files are also registered
let sql =
format!("CREATE EXTERNAL TABLE test STORED AS PARQUET LOCATION '{location}'");
let err = create_external_table_test(location, &sql)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This no longer generates an error as we don't require that the path exists

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I swear there is a ticket that tracks this, but I couldn't find it. Thus I filed #8277 and updated this PR to close it

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tustvold -- this change looks great and I tested it locally as well as wrote an end to end test here: #8278

I also updated the description of this PR with a description of the user facing change and filed an issue.

A very nice improvement

@alamb alamb changed the title Don't Canonicalize Filesystem Paths in ListingTableUrl Don't Canonicalize Filesystem Paths in ListingTableUrl / support new external tables for files that do not (yet) exist Nov 20, 2023
@tustvold tustvold merged commit 53d5df2 into apache:main Nov 20, 2023
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow creating external tables with directories that don't exist
2 participants