Cannot handle aligned multi-space-delimited files #212

Jolanrensen · 2024-10-22T12:14:17Z

Description
Let's say we have a multi-space-delimited file like:

NAME                     STATUS   AGE      LABELS
argo-events              Active   2y77d    app.kubernetes.io/instance=argo-events,kubernetes.io/metadata.name=argo-events
argo-workflows           Active   2y77d    app.kubernetes.io/instance=argo-workflows,kubernetes.io/metadata.name=argo-workflows
argocd                   Active   5y18d    kubernetes.io/metadata.name=argocd
beta                     Active   4y235d   kubernetes.io/metadata.name=beta

This layout is common in logs and similar output, but I cannot seem to parse it correctly.
The delimiter can only be a single char, which I suppose should be ' ' in this case, with the remaining padding trimmed by ignoreSurroundingSpaces = true.

Steps to reproduce

Parse the string above with delimiter ' ', ignoreSurroundingSpaces = true.

Expected results

I'd expect there to be a way to ignore repetition of the delimiter char.

Actual results

After parsing, we get something like:

⌌---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------⌍
|  |           NAME| untitled| 1| 2| 3| 4| 5| 6| 7| 8| 9|     10| 11| 12|     13| 14| 15|    16|                                       17|     18|   19|                                   STATUS|    20|   21|    AGE|   22|                                 23|                               24|   25|   26| LABELS|
|--|---------------|---------|--|--|--|--|--|--|--|--|--|-------|---|---|-------|---|---|------|-----------------------------------------|-------|-----|-----------------------------------------|------|-----|-------|-----|-----------------------------------|---------------------------------|-----|-----|-------|
| 0|    argo-events|         |  |  |  |  |  |  |  |  |  |       |   |   | Active|   |   | 2y77d|                                         |       |     | app.kubernetes.io/instance=argo-event...|  null| null|   null| null|                               null|                             null| null| null|   null|
| 1| argo-workflows|         |  |  |  |  |  |  |  |  |  | Active|   |   |  2y77d|   |   |      | app.kubernetes.io/instance=argo-workf...|   null| null|                                     null|  null| null|   null| null|                               null|                             null| null| null|   null|
| 2|         argocd|         |  |  |  |  |  |  |  |  |  |       |   |   |       |   |   |      |                                         | Active|     |                                         | 5y18d|     |       |     | kubernetes.io/metadata.name=argocd|                             null| null| null|   null|
| 3|           beta|         |  |  |  |  |  |  |  |  |  |       |   |   |       |   |   |      |                                         |       |     |                                   Active|      |     | 4y235d|     |                                   | kubernetes.io/metadata.name=beta| null| null|   null|
⌎---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------⌏
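
The column explosion above can be reproduced outside the library. The snippet below is only an illustration using plain Python string splitting (it does not call the library's parser, and the strings are rebuilt with `ljust` to match the sample's padding): every extra space between two values yields one more empty field, so trimming each field afterwards cannot merge them back.

```python
# Illustration only: plain str.split, not the library's parser.
# The header and row are rebuilt with ljust so the padding matches the sample.
header = "NAME".ljust(25) + "STATUS".ljust(9) + "AGE".ljust(9) + "LABELS"
row = ("beta".ljust(25) + "Active".ljust(9) + "4y235d".ljust(9)
       + "kubernetes.io/metadata.name=beta")

# Splitting on a single space: N spaces between two values give N - 1 empty fields.
header_fields = header.split(" ")
row_fields = row.split(" ")

print(len(header_fields))             # 31, matching the 31 columns in the table above
print(len(row_fields))                # 28
print([f for f in row_fields if f])   # only 4 fields actually carry data
```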

Edit:

Another common case: a single space appears inside a column value, while multiple spaces indicate a delimiter, like:

NAME                     STATUS       AGE      LABELS
argo-events              Not Active   2y77d    app.kubernetes.io/instance=argo-events,kubernetes.io/metadata.name=argo-events
argo-workflows           Active       2y77d    app.kubernetes.io/instance=argo-workflows,kubernetes.io/metadata.name=argo-workflows
argocd                   Active       5y18d    kubernetes.io/metadata.name=argocd
beta                     Not Active   4y235d   kubernetes.io/metadata.name=beta
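
For reference, the behavior described here (single spaces are data, runs of two or more spaces are delimiters) can be sketched with a regular expression. This is a minimal sketch in plain Python, not a library feature; the row is rebuilt with `ljust` to match the sample's padding.

```python
import re

# Sketch only: treat a run of two or more spaces as one delimiter,
# so a single space inside a value ("Not Active") is preserved.
row = ("argo-events".ljust(25) + "Not Active".ljust(13) + "2y77d".ljust(9)
       + "app.kubernetes.io/instance=argo-events,kubernetes.io/metadata.name=argo-events")

fields = re.split(r" {2,}", row.strip())
print(fields)
```

This only works while every cell is non-empty and no value contains two consecutive spaces.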

devinrsmith · 2024-10-22T14:44:18Z

@Jolanrensen thanks for the issue, we'll look into it and report back here.

kosak · 2024-10-26T03:30:32Z

Hi, thanks for the bug report. I'd like to suggest supporting this in a different way.

It feels more natural to me for the library to support fixed-width columns, where the column widths are either specified explicitly by the caller, or inferred from the first row of the input. In this proposal we would also allow the library to trim the spaces inside the fixed-width cells, perhaps reusing the flag ignoreSurroundingSpaces.

For example the library could read

NAME                     STATUS       AGE      LABELS

and infer starting column positions of 1, 26, 39, 48 (in a 1-based convention, and assuming I've counted characters correctly). It would assume that the rest of the file had data at these positions.
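
A quick way to double-check those positions: the sketch below infers 1-based column starts from a header line. The function name `infer_column_starts` is made up for illustration, and it assumes header names contain no spaces; it is not the library's implementation.

```python
def infer_column_starts(header: str) -> list[int]:
    """Return 1-based start positions of each column name in the header.

    Assumption: a column starts wherever a non-space character follows
    a space (or sits at the very beginning of the line).
    """
    starts = []
    for i, ch in enumerate(header):
        if ch != " " and (i == 0 or header[i - 1] == " "):
            starts.append(i + 1)  # convert 0-based index to 1-based position
    return starts


# Header rebuilt with ljust to match the padding of the sample file.
header = "NAME".ljust(25) + "STATUS".ljust(13) + "AGE".ljust(9) + "LABELS"
starts = infer_column_starts(header)
print(starts)  # [1, 26, 39, 48]
```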

Would this work for you? I have some reluctance to support variable-length delimiters, not least because of the edge cases it introduces when there are empty cells.
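
To make the empty-cell edge case concrete, here is a small sketch (plain Python, not the library) of what a variable-length delimiter does to a row whose STATUS cell is blank: the padding of the empty cell merges with its neighbors into one long delimiter, and the later values shift left.

```python
import re

# Sketch only: a row from the sample with its STATUS cell left empty.
row = ("argocd".ljust(25) + "".ljust(13) + "5y18d".ljust(9)
       + "kubernetes.io/metadata.name=argocd")

# Splitting on runs of 2+ spaces cannot see the empty cell at all.
shifted = re.split(r" {2,}", row.strip())
print(len(shifted))  # 3 fields instead of 4
print(shifted[1])    # "5y18d" now sits where STATUS should be
```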

Jolanrensen · 2024-10-28T09:48:56Z

@kosak Yes! I think that would work great. I think in all cases, the maximum cell width is defined by the width of the column title (plus the number of trailing spaces, minus 1 delimiter space), aside from the final column of course. So this would solve the problem correctly.
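
As a rough sketch of how the fixed-width proposal would handle the sample (plain Python, with a hypothetical helper `parse_fixed_width`; the actual implementation is in the linked PRs and may differ): each cell is sliced between consecutive 1-based start positions and then trimmed, and the last column runs to the end of the line.

```python
def parse_fixed_width(line: str, starts: list[int]) -> list[str]:
    """Slice a line at 1-based column starts and trim each cell.

    Hypothetical helper for illustration; the final column extends
    to the end of the line.
    """
    cells = []
    for idx, start in enumerate(starts):
        begin = start - 1
        end = starts[idx + 1] - 1 if idx + 1 < len(starts) else None
        cells.append(line[begin:end].strip())
    return cells


STARTS = [1, 26, 39, 48]  # positions inferred from the header row

# Rows rebuilt with ljust to match the padding of the sample file.
full_row = ("beta".ljust(25) + "Not Active".ljust(13) + "4y235d".ljust(9)
            + "kubernetes.io/metadata.name=beta")
empty_status_row = ("argocd".ljust(25) + "".ljust(13) + "5y18d".ljust(9)
                    + "kubernetes.io/metadata.name=argocd")

parsed_full = parse_fixed_width(full_row, STARTS)
parsed_empty = parse_fixed_width(empty_status_row, STARTS)
print(parsed_full)   # "Not Active" stays in one cell
print(parsed_empty)  # the empty STATUS cell is kept as ""
```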

devinrsmith · 2024-11-05T20:15:15Z

This will be fixed by #220

Jolanrensen added the bug label Oct 22, 2024

This was referenced Oct 22, 2024

readDelimiter variant for Regex as delimiter Kotlin/dataframe#746

Closed

☂ CSV rework Kotlin/dataframe#827

Open

Jolanrensen changed the title ~~Cannot handle aligned space-delimited files~~ Cannot handle aligned multi-space-delimited files Oct 22, 2024

devinrsmith assigned kosak Oct 22, 2024

This was referenced Nov 4, 2024

Refactor to prepare for fixed-width column support. #219

Merged

Add fixed-width column support #220

Merged

kosak closed this as completed in #219 Nov 5, 2024

devinrsmith reopened this Nov 5, 2024

devinrsmith closed this as completed in #220 Nov 8, 2024
