Request: Add some way to represent trailing whitespace without actual trailing whitespace #50

Open
AndydeCleyre opened this issue Sep 11, 2024 · 12 comments

Comments

@AndydeCleyre

AndydeCleyre commented Sep 11, 2024

Intro

Hello again!

I currently have only one niggle with the format,
and that's the invisibility and fragility of representing trailing spaces or tabs.
Trailing newlines are already solved 👍🏼.

I understand that this will probably be closed as not worth the added complexity,
but at least would like to put my best proposal forward.

As an experiment I tried implementing my last idea from the comments from #4,
and it was ok, but then I realized that the new syntax conflicted with regular dictionary syntax in some cases,
so here's the next iteration of the idea.

Why not

First I'll repeat the arguments against a change like this:

  1. added complexity
  2. it's an editor issue -- the editor should not make trailing space invisible and should not remove it

I can't argue against number 1. It is not great to add any complexity.

Number 2 is not convincing to me, because:

  • I can't control other folks' editors
  • I like my editor to trim trailing whitespace
  • I view documents with a variety of pagers and highlighters (including straight terminal output) which will not reliably show me trailing invisible characters

Proposal A

Add three new tags, which are alternate versions of the >, -, and multiline key : tags:

  • [>
  • [-
  • [:

Unlike their existing counterparts, each line using one of these must end with a closing |.
The text content of the line is everything after the 3-character tag, excluding the closing |.

Examples

this dict value has three trailing spaces and no leading spaces:
  [> hello   |
this value will have trailing spaces on some lines:
  [> trailing |
  > no trailing
  [> trailing again   |
[: this key has 4 trailing spaces    |
  > value
: this key has 4 but on the second line
[:     |
  > value2
[- item with 3 trailing   |
- item2
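The rule above (a 3-character tag, then the content, then a closing |) can be sketched in a few lines of Python. This is only an illustration of the proposal, not part of NestedText or any existing library; the name `decode_marked_line` and the kind labels are made up for the example, and it assumes any line that does not both start with a proposed tag and end with | is left to the existing NestedText rules.

```python
# Hypothetical helper illustrating Proposal A; not an existing NestedText API.
PROPOSED_TAGS = {"[>": "string", "[-": "list item", "[:": "key"}


def decode_marked_line(line):
    """Return (kind, text) for a line using a proposed tag, else None."""
    body = line.lstrip(" ")  # indentation is not part of the tag
    kind = PROPOSED_TAGS.get(body[:2])
    # The full tag is 3 characters: the 2-character marker plus one space.
    if kind is None or body[2:3] != " " or not body.endswith("|"):
        return None  # assumption: existing NestedText rules handle this line
    # Content is everything after the tag, excluding the closing '|'.
    return kind, body[3:-1]


print(decode_marked_line("  [> hello   |"))
print(decode_marked_line("[:     |"))
```

Because the content is delimited by the closing |, the trailing spaces survive an editor that trims line endings.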

Drawbacks

  • The new tags are not the same length as the usual tags, making some consecutive text values jagged rather than perfectly aligned.
  • The choice of ([>, [:, [-) is a little strange.

Strengths

  • All currently valid NT is still valid and interpreted as before.
  • No ambiguity is introduced.
  • Each line still describes its own type without relying on context lines.
  • Spaces are still represented as spaces.
@MichalMarsalek
Contributor

Doesn't [ collide with inline lists?

@AndydeCleyre
Author

An inline list must always end with a closing ], so a different closing character like | avoids conflict.

@KenKundert
Owner

What happens to trailing spaces in strings not marked with one of these special tags?

@AndydeCleyre
Author

I was aiming here not to change the behavior of any currently valid NestedText, though I might not be opposed to a change there either.

@LewisGaul

I currently have only one niggle with the format, and that's the invisibility and fragility of representing trailing spaces or tabs.

I'm in agreement that this isn't a great situation. I'd also agree that "it's an editor issue" doesn't hold much water for me.

I'd prefer if your proposal was a more generic "add some demarcation syntax" rather than immediately offering a syntax, because it sets the discussion off on bikeshedding and syntax concerns (e.g. #50 (comment)), rather than whether the high-level request is worth the added complexity. There are always options for syntax, and while it's helpful to consider what a few options might look like, I'd suggest not focusing too much on it right now.

The complaint raised in #4 (comment) (@pierre-rouleau) was "I see it as transferring the responsibility of ensuring reliability from the implementation to the user.", which I think is a valid concern. However, would an alternative perspective be to view it as transferring some responsibility to the parsing library?

As explained at https://nestedtext.org/en/latest/schemas.html, everything is a string in nestedtext, and the expectation is that an accompanying schema be used to assign types to fields. One option would be for nestedtext parsers to be expected (or have an option) to interpret escapes in certain contexts (such as values typed as "string" via a schema), e.g. URL encoding (space is %20) or Unicode (space is \u0020) could be supported at the parser level.
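As a rough sketch of what escape interpretation at the parser or schema level could look like, the Python below decodes the two escape styles mentioned above after loading. It assumes the data has already been loaded into a dict of strings; the function names and the `decoders` mapping are hypothetical stand-ins for a schema, not anything NestedText provides.

```python
import re
from urllib.parse import unquote

_UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_unicode_escapes(text):
    """Replace each \\uXXXX escape with the character it names."""
    return _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def decode_percent_escapes(text):
    """Replace URL-style %XX escapes, so %20 becomes a space."""
    return unquote(text)


def apply_decoders(data, decoders):
    """Decode only the fields named in `decoders`; leave the rest untouched."""
    return {key: decoders.get(key, str)(value) for key, value in data.items()}


loaded = {"prompt": "Name:%20", "separator": r"--\u0020", "raw": "100%20"}
decoders = {"prompt": decode_percent_escapes, "separator": decode_unicode_escapes}
print(apply_decoders(loaded, decoders))
```

Fields without a decoder keep their literal text, so documents that happen to contain %20 or \u0020 are unaffected unless the schema opts in.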

@AndydeCleyre
Author

I'd prefer if your proposal was a more generic "add some demarcation syntax" rather than immediately offering a syntax, because it sets the discussion off on bikeshedding and syntax concerns (e.g. #50 (comment)), rather than whether the high-level request is worth the added complexity.

Yeah, I'll modify the title and description to make it more general, but will keep the proposal in as "Proposal A."

I was excited to have proof that this could be done without invalidating or changing the meaning of any currently valid NT, without the new syntax introducing ambiguity, and without breaking the rule of each line describing its own type without context.

@AndydeCleyre changed the title from "Proposal for representing trailing whitespace without actual trailing whitespace" to "Request: Add some way to represent trailing whitespace without actual trailing whitespace" on Sep 13, 2024
@AndydeCleyre
Author

AndydeCleyre commented Sep 13, 2024

I'll note that the drawback of potentially jagged alignment could be removed by trading the markers from ([>, |) to ([, <), from ([:, |) to ([, :), and from ([-, |) to ([, -). I would be OK with that choice, too, but didn't make it Proposal A because I thought it would be less clear whether the closing mark should have a space before it that is not part of the string value.

Possibly I might prefer the closing markers |<, |:, and |- to that, so that the string value still "runs into the wall" without trailing syntactic whitespace padding. I would like to call this Proposal B here, with the initial examples translated as:

this dict value has three trailing spaces and no leading spaces:
  [ hello   |<
this value will have trailing spaces on some lines:
  [ trailing |<
  > no trailing
  [ trailing again   |<
[ this key has 4 trailing spaces    |:
  > value
: this key has 4 but on the second line
[     |:
  > value2
[ item with 3 trailing   |-
- item2
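A comparable sketch for Proposal B, where the closing marker rather than the opening one carries the line type. It is illustrative only: the function name and kind labels are invented, and it assumes the opener is the 2-character sequence "[ " (bracket plus one space), which is how the examples above read.

```python
# Hypothetical helper illustrating Proposal B; not an existing NestedText API.
CLOSERS = {"|<": "string", "|:": "key", "|-": "list item"}


def decode_proposal_b(line):
    """Return (kind, text) for a Proposal B line, else None."""
    body = line.lstrip(" ")  # indentation is not part of the markers
    kind = CLOSERS.get(body[-2:])
    if kind is None or not body.startswith("[ "):
        return None  # assumption: existing NestedText rules handle this line
    # Content sits between the 2-character opener and the 2-character closer.
    return kind, body[2:-2]


print(decode_proposal_b("  [ hello   |<"))
print(decode_proposal_b("[     |:"))
```

With a 2-character opener, marked and unmarked lines start their content in the same column, which is the alignment benefit described above.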

@KenKundert
Owner

I have a number of issues with this proposal.

  1. In a very real sense, this ship has sailed. The decision was made early on to allow trailing whitespace. One can make an argument that trailing whitespace is rare and generally a mistake, and so NT should remove it unless the user has explicitly marked it as desirable, but doing that now would result in a serious backward compatibility issue.
  2. Adding syntax that allows the user to mark desired trailing white space while still allowing it without the mark avoids the backward compatibility issue but seems rather pointless to me. It really just becomes a helper for your editor and has no real meaning in the document itself.
  3. It gives NT the look of a complicated language that must be learned before it can be used. I can just imagine someone new to NT opening a document and seeing those and getting turned off by 'complicated syntax'.
  4. In my experience unintended trailing spaces are generally harmless.
  5. There are cases where trailing spaces are both meaningful and routine. Adding markers to each line would then become burdensome. One of the nicest features of NT is that it dispenses with quoting, but this proposal brings it back for lines that have trailing whitespace. For example, Vim uses trailing spaces on lines to indicate the text is a contiguous paragraph that should be reformatted to fill the desired text width after edits. This results in every line except the last line of a paragraph having a meaningful trailing space or two.

@AndydeCleyre
Author

Regarding concern 1: this issue is not asking to disallow trailing whitespace.

Regarding 2:

It really just becomes a helper for your editor and has no real meaning in the document itself.

It would also be a helper for the human eye, outside of editors, where the information that is trailing spaces is otherwise invisible.

For 3 and 5: maybe a simpler syntax for this purpose could be achieved?

For 4: this issue is about intentional trailing spaces.

@KenKundert
Owner

KenKundert commented Sep 14, 2024

I use trailing spaces in my NestedText files in two situations.

I use NestedText heavily to hold test cases for the software I develop. Occasionally, I need test cases that contain trailing spaces. As such, I configure my editor to show trailing spaces, not to delete them. So personally, I would not use this proposal. Even if I did wish to mark the end of my strings, I would write the test software to accept and remove a terminal character I chose, such as §. I understand that I only have this option because I am writing the code that consumes the NT file. But this case shows that there are alternatives available that do not require modifying NestedText.
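A minimal sketch of that application-side approach, assuming the data has already been loaded into nested dicts, lists, and strings. The function name is hypothetical, and stripping values but not keys is a choice made for the example.

```python
END_MARK = "§"  # terminal character chosen by the consuming application


def strip_end_marks(value, mark=END_MARK):
    """Remove one trailing end mark from every line of every string value."""
    if isinstance(value, str):
        return "\n".join(
            line[: -len(mark)] if line.endswith(mark) else line
            for line in value.split("\n")
        )
    if isinstance(value, list):
        return [strip_end_marks(item, mark) for item in value]
    if isinstance(value, dict):
        # Keys are left as-is in this sketch; only values are processed.
        return {key: strip_end_marks(item, mark) for key, item in value.items()}
    return value


cases = {"input": ["hello   §", "plain"], "expected": "line one §\nline two"}
print(strip_end_marks(cases))
```

The trade-off is that a value that genuinely ends with the chosen mark can no longer be expressed, so the mark has to be a character the data never uses.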

I also use NestedText to hold prose. Here I use the Vim feature where I just type the prose in one long line and Vim automatically breaks the lines to fit the width of the screen. Vim then leaves a trailing space on each line to indicate that the line breaks should be adjusted as the text is edited to keep the width of the text. Adding an end-of-line marker in this case, as you propose, interferes with this feature. You are thinking that the end-of-line markers are harmless in the NT file as long as they are removed by the NestedText reader, but in this case, my most common case, the trailing spaces without end-of-line markers must exist in the NT file itself.
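To illustrate the convention, here is a small Python sketch of how a tool might regroup such lines: a line ending in a space continues the paragraph, and a line ending in anything else closes it. This appears to correspond to the w flag of Vim's 'formatoptions', but that mapping is an assumption; the function below is a hypothetical illustration, not Vim's implementation.

```python
def join_flowed_lines(lines):
    """Group lines into paragraphs; a trailing space means 'continues'."""
    paragraphs, current = [], ""
    for line in lines:
        current += line
        if not line.endswith(" "):
            paragraphs.append(current)  # no trailing space: paragraph ends
            current = ""
    if current:
        paragraphs.append(current)  # unterminated final paragraph
    return paragraphs


flowed = ["The quick brown ", "fox jumps.", "Next paragraph."]
marked = ["The quick brown |", "fox jumps.", "Next paragraph."]
print(join_flowed_lines(flowed))
print(join_flowed_lines(marked))
```

In the second call the first line ends with | instead of a space, so it is no longer recognized as a continuation, which is the interference described above.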

My point is the case for this proposal is weak. There are several alternate ways of showing trailing spaces on a line:

  • configure your editor to show them
  • use 'cat -E' to show them
  • modify the application to accept custom end-of-line markers
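The first two alternatives amount to making trailing whitespace visible. A small Python filter along those lines is sketched below; unlike 'cat -E', it only decorates lines that actually end in spaces or tabs. The function name and the marker characters are arbitrary choices for the example.

```python
import re

_TRAILING = re.compile(r"[ \t]+$")


def mark_trailing(line, space="·", tab="→", end="$"):
    """Make trailing spaces and tabs visible; leave other lines untouched."""
    match = _TRAILING.search(line)
    if match is None:
        return line
    visible = match.group().replace(" ", space).replace("\t", tab)
    return line[: match.start()] + visible + end


for text in ["  > hello   ", "  > no trailing", "- tab\t"]:
    print(mark_trailing(text))
```

Applied line by line to a file or to standard input, this can sit in front of a pager, though it still has to be wired into each viewing tool separately.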

In addition, your proposal sometimes interferes with the purpose of the trailing spaces and so cannot be used, like the Vim example I described.

You started this discussion by stating that you prefer to configure your editor to automatically delete trailing spaces, but in some cases you also want to retain them. I believe that is the motivating example for this proposal. But you can solve that problem by adding a mode line in a comment that disables the deleting of trailing spaces in files where it is needed.

Your proposal changes the nature of NestedText in a fundamental way. One of the primary features of NestedText is that it does not employ quoting. This proposal is, at its heart, quoting. It would take a very strong justification for us to consider a change like this.

@AndydeCleyre
Author

You started this discussion by stating that you prefer to configure your editor to automatically delete trailing spaces, but in some cases you also want to retain them. I believe that is the motivating example for this proposal. But you can solve that problem by adding a mode line in a comment that disables the deleting of trailing spaces in files where it is needed.

Not really, I can configure my own editor; my main motivations are:

  • I can't configure other folks' editors, who may not realize they destructively save and format such a file and pass it on
  • I often view files with pagers, highlighters, and via stdout.

So of your three listed alternatives:

  1. Already done, but it's not sufficient.
  2. Very noisy, as the relevant GNU cat and busybox cat options add characters to every single line to account for this rare case. Also insufficient, as I need to check for a similar option for each of an array of additional tools. I usually use Zsh functions which try to fall back to the best installed highlighter or pager for the purpose found on the system. And I don't think there's anything to be done about stdout (maybe at the shell or terminal emulator level?).
  3. Not always possible, as they are not always my applications, and I often use NT to view converted data from JSON, TOML, and YAML.

I understand this will not happen, but wanted to make clear that none of these suggestions fix my problems.

@KenKundert
Owner

I acknowledge that.


4 participants