Support skipping comments in CSV files #5758

Closed
bbannier opened this issue May 12, 2024 · 1 comment · Fixed by #5759
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of an entry in the changelog

Comments

@bbannier
Member

CSV readers often support skipping comments[^1][^2], which are typically full-line comments indicated by a prefix character (often `#`). It would be great if arrow-csv allowed parsing such CSV files.

Footnotes

  1. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

  2. https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html

@bbannier bbannier added the enhancement Any new improvement worthy of an entry in the changelog label May 12, 2024
bbannier added a commit to bbannier/arrow-rs that referenced this issue May 12, 2024
This patch adds reader support for a comment character when reading CSV
files. While comments, like almost everything else about the CSV format,
are not truly standardized, a common convention supported by many CSV
readers[^1][^2] is to ignore full lines starting with a comment
character (often `#`); inline and end-of-line comments are not supported.

Example:

    # This is a comment in a CSV file without header.
    1,2
    # Comment inside the data block.
    11,22

The implementation of this for Arrow is straightforward, as all we need
to do is expose the existing `comment` option of `csv_core`, which Arrow
already uses to read CSV files.

Closes apache#5758.

[^1]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
[^2]: https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html
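
To make the semantics described in the commit message concrete, here is a minimal standalone sketch in Rust using only the standard library. It is not the arrow-csv implementation (which, per the commit message, delegates to the `comment` option of `csv_core`); the helper name `skip_comment_lines` is hypothetical, and the field splitting is deliberately naive, with no quoting or escaping. It only illustrates that full lines starting with the comment byte are dropped while a `#` later in a line stays part of the data.

```rust
/// Hypothetical helper for illustration only: returns the records of
/// `input`, skipping every line whose first byte equals `comment`.
fn skip_comment_lines(input: &str, comment: u8) -> Vec<Vec<String>> {
    input
        .lines()
        // Drop full-line comments; a comment byte later in a line is data.
        .filter(|line| line.as_bytes().first() != Some(&comment))
        // Naive split on commas: quoting and escaping are not handled.
        .map(|line| line.split(',').map(String::from).collect())
        .collect()
}

fn main() {
    // The example input from the commit message above.
    let input = "# This is a comment in a CSV file without header.\n\
                 1,2\n\
                 # Comment inside the data block.\n\
                 11,22\n";

    let records = skip_comment_lines(input, b'#');

    // Only the two data rows remain.
    assert_eq!(records.len(), 2);
    println!("{:?}", records);
}
```

A real reader would apply this check inside the CSV state machine rather than on pre-split lines, so that a quoted field spanning several lines is not mistaken for a comment.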
tustvold pushed a commit that referenced this issue May 13, 2024
@tustvold tustvold added the arrow Changes to the arrow crate label Jun 3, 2024
@tustvold
Contributor

tustvold commented Jun 3, 2024

label_issue.py automatically added labels {'arrow'} from #5759
