Add support for reading CSV files with comments #10467
This is currently a sketch of a possible implementation for #10262. The approach taken pushes interpretation of comment lines into If this is a viable solution it would require a bump of at least the
@alamb, would you be open to shepherding this PR and apache/arrow-rs#5759, or alternatively could you help identify someone who could?
Yes. FWIW DataFusion typically upgrades to the latest arrow-rs (including arrow-csv) dependency, so while extra time would be needed, no extra work would be
This is now rebased on
Thank you very much for this contribution @bbannier -- this code looks great. The only thing I think this PR now needs is some test coverage so we don't break it in the future
Here is my suggestion for testing:
- update csv_files.slt, see this file for info on running sql logic tests
Note I think you can programmatically create a csv file with a command like
> copy (values ('column1,column2'), ('#second line is a comment'), ('2,3')) TO '/tmp/my.csv' OPTIONS ('format.delimiter' '|');
+-------+
| count |
+-------+
| 3     |
+-------+
1 row(s) fetched.
Elapsed 0.004 seconds.
That results in
$ cat /tmp/my.csv
column1,column2
#second line is a comment
2,3
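For illustration, the behavior such a test would exercise can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the arrow-csv or DataFusion implementation: the helper name `read_csv_skipping_comments` is hypothetical, and it assumes a comment is any line that starts with the comment character.

```python
import csv
import io


def read_csv_skipping_comments(text, comment="#", delimiter=","):
    """Parse CSV text, dropping every line that starts with `comment`.

    Hypothetical helper for illustration only; it is not how arrow-csv
    or DataFusion handle comments internally.
    """
    # Drop comment lines first, then hand the remaining lines to the csv module.
    lines = (line for line in io.StringIO(text) if not line.startswith(comment))
    return list(csv.reader(lines, delimiter=delimiter))


# Same content as /tmp/my.csv above.
sample = "column1,column2\n#second line is a comment\n2,3\n"
rows = read_csv_skipping_comments(sample)
print(rows)  # [['column1', 'column2'], ['2', '3']]
```

Filtering lines before parsing is the simplest way to show the idea, but it would also drop a continuation line of a quoted multi-line field that happens to start with `#`, so a real reader has to decide this inside the parser.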
This patch adds support for parsing CSV files containing comment lines. Closes apache#10262.
Thank you @bbannier 🚀
'format.delimiter' ',');

query TT
SELECT * from stored_table_with_comments;
👍 Love it
This PR adds support for parsing CSV files containing comment lines.
Closes #10262.