Cannot handle aligned multi-space-delimited files #212
Comments
@Jolanrensen thanks for the issue, we'll look into it and report back here.
Hi, thanks for the bug report. I'd like to suggest supporting this in a different way. It feels more natural to me for the library to support fixed-width columns, where the column widths are either specified explicitly by the caller, or inferred from the first row of the input. In this proposal we would also allow the library to trim the spaces inside the fixed-width cells, perhaps reusing the flag
For example, the library could read
and infer starting column positions of 1, 26, 39, 48 (in a 1-based convention, and assuming I've counted characters correctly). It would assume that the rest of the file had data at these positions. Would this work for you? I have some reluctance to support variable-length delimiters, not least because of the edge cases it introduces when there are empty cells.
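A minimal sketch of the fixed-width idea described above, written in Python rather than against the library's API. The function names `infer_column_starts` and `split_fixed_width` and the sample rows are hypothetical, and the sketch assumes that no column title contains a space. Offsets are 0-based here, so add 1 to compare with the 1-based convention used in the comment.

```python
import re


def infer_column_starts(header: str) -> list[int]:
    """Return the 0-based offset at which each title in the header row begins."""
    return [m.start() for m in re.finditer(r"\S+", header)]


def split_fixed_width(line: str, starts: list[int]) -> list[str]:
    """Slice a line at the given offsets and trim the padding from each cell."""
    ends = starts[1:] + [None]  # the last column runs to the end of the line
    return [line[s:e].strip() for s, e in zip(starts, ends)]


# Made-up sample data (not the file from this issue), padded to fixed widths.
rows = [("NAME", "STATUS", "AGE"), ("alpha", "Running", "5d"), ("beta", "", "12h")]
lines = [f"{a:<10}{b:<9}{c}" for a, b, c in rows]

starts = infer_column_starts(lines[0])
parsed = [split_fixed_width(line, starts) for line in lines[1:]]

print(starts)  # 0-based column starts inferred from the header
print(parsed)  # the empty STATUS cell in the last row is kept as ""
```

Because cells are located by position rather than by counting delimiters, the empty cell in the last row survives as an empty string instead of shifting the later columns.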
@kosak Yes! I think that would work great. I think that in all cases the maximum cell width is defined by the width of the column title plus its trailing spaces, minus one delimiter space (aside from the final column, of course). So this would solve the problem correctly.
This will be fixed by #220
Description
Let's say we have a multi-space-delimited file like:
which is a common thing to see in logs etc., I cannot seem to parse it correctly.
The delimiter can only be a char, which I suppose should be ' ' in this case, and then we could trim the rest with ignoreSurroundingSpaces = true
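To illustrate why a single-character delimiter plus trimming is not enough, here is a small Python sketch. It models the parser with `str.split(" ")`, which is an assumption about the behaviour (each space counted as its own delimiter), not the library's actual code, and the row is made up.

```python
# Made-up aligned row: "alpha" padded to 10 characters, "Running" padded to 9.
line = f"{'alpha':<10}{'Running':<9}5d"

# Treat every single space as a delimiter, then trim each resulting cell.
cells = [cell.strip() for cell in line.split(" ")]

print(len(cells))  # far more cells than the three intended columns
print(cells)       # the padding spaces show up as empty cells
```

Under this model, trimming only removes whitespace inside a cell. It cannot merge the empty cells that the padding spaces produce, so the three intended columns come back as eight cells, five of them empty.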
Steps to reproduce
Parse the string above with delimiter ' ', ignoreSurroundingSpaces = true.
Expected results
I'd expect there to be a way to ignore repetition of the delimiter char.
Actual results
After parsing, we get something like:
Edit:
Also common: a single space inside a column value, while multiple spaces indicate a delimiter, like:
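A hedged Python sketch of one way such input could be split: treat any run of two or more spaces as a single delimiter, so that a lone space stays inside the cell. The helper `split_on_space_runs` and the rows are hypothetical and are not taken from the library or from the original example.

```python
import re


def split_on_space_runs(line: str, min_run: int = 2) -> list[str]:
    """Split on runs of at least `min_run` spaces; shorter runs stay in the cell."""
    return re.split(r" {%d,}" % min_run, line.strip())


# Made-up rows with a single space inside some values.
rows = [
    ("FULL NAME", "CITY", "COUNTRY"),
    ("Ada Lovelace", "London", "UK"),
    ("Grace Hopper", "", "USA"),  # empty middle cell
]
lines = [f"{a:<15}{b:<10}{c}" for a, b, c in rows]

for line in lines:
    print(split_on_space_runs(line))
```

The first two rows split into three cells each, with "FULL NAME" and "Ada Lovelace" kept whole. The last row shows the empty-cell edge case raised in the comments: the padding on both sides of the empty cell merges into one run, so the row comes back with only two cells.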