Support skipping comments in CsvReader #10262

Closed
bbannier opened this issue Apr 27, 2024 · 2 comments · Fixed by #10467
@bbannier
Member

bbannier commented Apr 27, 2024

It would be great if DataFusion had out-of-the-box support for skipping comment lines. While none of this is "standardized", many CSV readers support skipping full comment lines. A commonly used comment identifier is a # prefix (supported in e.g. pandas and R).

I originally posted this as a comment on #8824.
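
For readers unfamiliar with the behavior being requested, here is a minimal sketch of what "skipping full comment lines" means, written against Python's standard `csv` module. It is only an illustration of the semantics, not how DataFusion or Arrow implement it; the function name `read_csv_skipping_comments` and the sample data are made up for this example.

```python
import csv
import io


def read_csv_skipping_comments(text, comment="#"):
    """Parse CSV text, dropping every line that starts with `comment`.

    Only full comment lines are skipped: a `#` that appears later in a
    line is treated as ordinary field content.
    """
    lines = (
        line
        for line in io.StringIO(text)
        if not line.startswith(comment)
    )
    return list(csv.reader(lines))


# Hypothetical input with a leading and an interleaved comment line.
data = "# generated by an export tool\na,b\n1,2\n# trailing note\n3,4\n"
rows = read_csv_skipping_comments(data)
```

Note that this line-based filter is a simplification: a quoted field that spans several lines and happens to continue with `#` at the start of a line would be dropped by mistake, which is one reason this is better handled inside the CSV parser itself.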

@pingsutw
Member

take

@bbannier
Member Author

I opened apache/arrow-rs#5759 to add comment support to Arrow's CSV reader. With that, the work here is mostly about passing that flag from user code to the actual reader, and implementing support for serializing the flag in protos.
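
The two pieces of work described above (threading an option through to the reader, and making it survive serialization) can be sketched roughly as follows. This is a toy model under loud assumptions: `CsvOptions`, `serialize`, `deserialize`, and `read_rows` are hypothetical names, JSON stands in for the protobuf encoding, and the field splitting is deliberately naive. None of it is DataFusion's or Arrow's actual API.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class CsvOptions:
    """Hypothetical user-facing options; not a real DataFusion type."""

    delimiter: str = ","
    comment: Optional[str] = None  # None means comment skipping is off


def serialize(options):
    # JSON is a stand-in for the protobuf encoding mentioned above.
    return json.dumps(asdict(options))


def deserialize(payload):
    return CsvOptions(**json.loads(payload))


def read_rows(text, options):
    """Toy reader that honors the comment option (naive field splitting)."""
    rows = []
    for line in text.splitlines():
        if options.comment is not None and line.startswith(options.comment):
            continue  # skip full comment lines only
        rows.append(line.split(options.delimiter))
    return rows


# The option must round-trip unchanged so a plan shipped to another
# process reads the file the same way.
original = CsvOptions(comment="#")
restored = deserialize(serialize(original))
rows = read_rows("# header note\na,b\n1,2", restored)
```

Keeping the option as "absent by default" means existing users see no change in behavior unless they opt in.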

bbannier added a commit to bbannier/datafusion that referenced this issue May 12, 2024
This patch adds support for parsing CSV files containing comment lines.

Closes apache#10262.
bbannier added a commit to bbannier/datafusion that referenced this issue May 13, 2024
This patch adds support for parsing CSV files containing comment lines.

Closes apache#10262.
bbannier added a commit to bbannier/datafusion that referenced this issue Jun 9, 2024
This patch adds support for parsing CSV files containing comment lines.

Closes apache#10262.
bbannier added a commit to bbannier/datafusion that referenced this issue Jun 10, 2024
This patch adds support for parsing CSV files containing comment lines.

Closes apache#10262.
alamb pushed a commit that referenced this issue Jun 10, 2024
This patch adds support for parsing CSV files containing comment lines.

Closes #10262.
findepi pushed a commit to findepi/datafusion that referenced this issue Jul 16, 2024
This patch adds support for parsing CSV files containing comment lines.

Closes apache#10262.