Fix COPY TO does not produce an output file for the empty set #18074
};

// Single-file output requires creating at least one file stream in advance.
// If no record batches are present in the input stream (zero-row scenario),
I think there might be a bit missing from this comment. The sentence doesn't seem complete.
Rephrased it. The last sentence was indeed not complete.
09173f8 to 373f969
…ed test to validate schema is correct)
Thank you @bert-beyondloops 🙏
The behavior of writing empty files has come up several times before, and I worry we will go back and forth on the implementation if we don't have the expected results written down somewhere.
Could you also please add some documentation to the top of this module / the demuxer tasks to make it clearer what behavior is expected?
I've added documentation clarifying the minimum number of files written for each output scenario.
1f4be91 to 4285d9d
4285d9d to b21ac51
Thank you @bert-beyondloops
/// be written with the extension from the path. Otherwise the default extension
/// will be used and the output will be split into multiple files.
///
/// Output file guarantees:
👍
Which issue does this PR close?
COPY TO does not produce a single output file for an empty set
Rationale for this change
Executing the following SQL does not create an output file on disk:
COPY (SELECT 1 AS id WHERE FALSE) TO 'table_no_rows.parquet';
I would expect it to create a Parquet file containing 0 rows, including the schema metadata.
Being able to query the schema of such a table is still valuable information.
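To illustrate the expected behavior, here is a minimal sketch of the "create the file stream in advance" idea, using only the Rust standard library. It is not DataFusion's demuxer code: `Batch`, `write_single_file`, and the plain-text header line (standing in for Parquet schema metadata) are all hypothetical names and simplifications chosen for this example.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical stand-in for a record batch: a list of text rows.
type Batch = Vec<String>;

/// Writes every batch to a single file at `path` and returns the row count.
///
/// The file is created and the header written *before* the input is
/// consumed, so an input with zero batches still leaves a file on disk
/// that carries the header (a stand-in for schema metadata).
fn write_single_file<I>(path: &Path, header: &str, batches: I) -> io::Result<usize>
where
    I: IntoIterator<Item = Batch>,
{
    // Eager creation: this happens even if `batches` turns out to be empty.
    let mut file = File::create(path)?;
    writeln!(file, "{}", header)?;

    let mut rows = 0;
    for batch in batches {
        for row in batch {
            writeln!(file, "{}", row)?;
            rows += 1;
        }
    }
    Ok(rows)
}
```

Calling `write_single_file` with an empty `Vec<Batch>` returns `Ok(0)` and leaves a file containing only the header line. Creating the file lazily on the first batch would instead produce no file at all for an empty input, which is the behavior reported above.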
What changes are included in this PR?
Are these changes tested?
An additional COPY TO test was added to the copy.slt sqllogictests.
Are there any user-facing changes?
A file containing 0 rows will now be created.