-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Defer ListingTable Path Creation To Write #8007
Conversation
/// If you wish to specify a path that does not exist on the local | ||
/// machine you must provide it as a fully-qualified [file URI] | ||
/// e.g. `file:///myfile.txt` | ||
/// Otherwise, the path will be resolved to an absolute path, and converted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the change that is necessary to accomodate this, we need to permit ListingTableUrl to be created for paths that don't currently exist
@@ -250,7 +245,6 @@ single_file_test(a bigint, b bigint) | |||
STORED AS csv | |||
LOCATION 'test_files/scratch/insert_to_external/single_csv_table.csv' | |||
OPTIONS( | |||
create_local_path 'true', | |||
single_file 'true', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intend to replace single_file with inference based on the ListingTableUrl, i.e. if it ends with a path separator, as a follow on PR
@@ -214,7 +178,12 @@ impl ListingTableUrl { | |||
} | |||
} | |||
}, | |||
false => futures::stream::once(store.head(&self.prefix)).boxed(), | |||
false => futures::stream::once(store.head(&self.prefix)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes the behaviour consistent with listing above
@@ -855,14 +855,17 @@ impl TableProvider for ListingTable { | |||
let input_partitions = input.output_partitioning().partition_count(); | |||
let writer_mode = match self.options.insert_mode { | |||
ListingTableInsertMode::AppendToFile => { | |||
if input_partitions > file_groups.len() { | |||
if file_groups.is_empty() && self.options.single_file { | |||
// This is a hack, longer term append should be split out (#7994) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this we get an error if this is the first write as there are no file_groups. Longer-term we shouldn't be conflating ListingTable IO and that of streaming use-cases
3efbe44
to
da0554d
Compare
@@ -46,37 +46,54 @@ pub struct ListingTableUrl { | |||
impl ListingTableUrl { | |||
/// Parse a provided string as a `ListingTableUrl` | |||
/// | |||
/// A URL can either refer to a single object, or a collection of objects with a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't changing the pre-existing behaviour but attempting to better document it
da0554d
to
72778b1
Compare
/// If you wish to specify a path that does not exist on the local | ||
/// machine you must provide it as a fully-qualified [file URI] | ||
/// e.g. `file:///myfile.txt` | ||
/// If the path already exists in the local filesystem this will be used to determine if this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is necessary to preserve the pre-existing behaviour where it would use the filesystem to determine if the path refers to a directory or not
@@ -362,8 +352,7 @@ statement ok | |||
CREATE EXTERNAL TABLE | |||
table_without_values(field1 BIGINT NULL, field2 BIGINT NULL) | |||
STORED AS parquet | |||
LOCATION 'test_files/scratch/insert_to_external/external_parquet_table_q2' | |||
OPTIONS (create_local_path 'true'); | |||
LOCATION 'test_files/scratch/insert_to_external/external_parquet_table_q2/'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As these paths don't exist anymore, we need to use the trailing /
to indicate that we intend for a directory, as opposed to a file
I'm not sure if this counts as an API change, I defer to the reviewers |
| Option | Description | Default Value | | ||
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------- | | ||
| SINGLE_FILE | If true, indicates that this external table is backed by a single file. INSERT INTO queries will append to this file. | false | | ||
| CREATE_LOCAL_PATH | If true, the folder or file backing this table will be created on the local file system if it does not already exist when running INSERT INTO queries. | false | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behaviour is now effectively enabled by default
72778b1
to
b96615a
Compare
b96615a
to
7edff96
Compare
} else { | ||
return Err(err); | ||
} | ||
create_external_table_test(location, &sql).await.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This no longer returns an error, making it consistent with the other stores
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
Which issue does this PR close?
Part of #7994
Rationale for this change
This is part of the general ticket of making the file IO story more coherent and less surprising.
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?