Tagless multiline strings #49
Michal,
The approach we took (using the string tag '>') does not suffer from any of these problems, and also allows us to support blank lines as comments within a multi-line string. For example:
As you say, your current proposal does not conflict with the existing syntax, so it would be possible to support both approaches. But we are really trying to keep the language small and simple, and having multiple ways of doing the same thing conflicts with that goal. Thank you for your suggestion. I understand your desire to find an alternative to the current key-based approach to multiline strings. It seems artificial to me too. But despite spending considerable time on it, I have yet to come up with anything that provides one simple solution to all of the problems I considered. -Ken
Ken,
Well, there are already multiple ways to do the same thing, and that's fine: one way that's short and elegant and works for most cases, and another that's a bit more verbose but handles all the edge cases (as is the case for dictionaries). When I was thinking about the necessary changes to the (JavaScript) parser, I came to the conclusion that they would be reasonably simple.
The fact that the Pretty printing section even exists in the docs really supports my case, IMHO. Serialization of multiline strings is the only thing in NestedText that is not pretty.
I just realised that empty/comment lines are still possible inside a tagless multiline string; they just have to be indented less than the string:
That being said, I would probably find this form a bit confusing.
I see that there's no .NET library linked. I can write and maintain one, maybe this weekend. I will probably include this proposal as a toggleable option so I can experiment with it a bit and see how it works. If my proposal doesn't end up in the standard, my library will of course follow the standard by default (not using it when emitting, and throwing an error when parsing it), but I will be able to use it for my personal projects by switching a boolean flag.
Just because you can, doesn’t mean you should :) From https://nestedtext.org/en/latest/file_format.html:
A fundamental part of the design is that you can tell what type of line you're looking at without the context of any of the surrounding lines. This holds true whatever you have in your string, e.g. a line:
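To make the quoted property concrete, here is a simplified sketch that classifies a single line of the current format by looking only at its first non-space characters. It is an illustration, not the nestedtext library's parser: the function name and the returned labels are made up here, and it does not validate keys or inline syntax.

```python
def classify_line(line: str) -> str:
    """Guess the type of one NestedText line without looking at its neighbours.

    Simplified illustration: it does not validate keys, inline syntax, etc.
    """
    text = line.lstrip(" ")  # indentation consists of spaces only
    if not text.strip():
        return "blank"
    if text.startswith("#"):
        return "comment"
    # A tag is its character followed by a space or by the end of the line.
    for tag, kind in (("-", "list item"), (">", "string item"), (":", "key item")):
        if text == tag or text.startswith(tag + " "):
            return kind
    if text[0] in "[{":
        return "inline"
    if ": " in text or text.endswith(":"):
        return "dict item"
    return "unrecognized"


for sample in ["- apples", "    > some text", "name: Ken", "[a, b]", "  # note"]:
    print(f"{sample!r:20} -> {classify_line(sample)}")
```

Under the tagless proposal, a continuation line such as `- not a list item` would still be reported as a list item by a function like this; only the preceding lines reveal that it belongs to a string, which is exactly the property being debated.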
Yes, I agree, as I said I find this confusing.
Sure, I am aware of this; I even mention it in my comment. It's common for languages to have parser modes. Once you enter a multiline string, some things behave slightly differently, and that's fine. I don't think my proposal would make it harder for humans to read, as it strives to be very intuitive. Also, I'm not sure it is that fundamental. Sure, it holds (perhaps as a consequence of making sure no escaping is needed) and is kind of convenient, but I don't think anything fundamental would break in the long run if we made an exception. It is certainly less fundamental than "no scalar types except strings supported" or "no escaping needed".
Perhaps I am not being clear. So let me take a different approach. Consider this example.
Is the blank line that follows line 2 of value 1 included in the value or not? In other words, what is the value of key 1? Is it ...
Or is it ...
As a second question, you say that the indentation of the subsequent lines need not match the indent level of the first line. So this is valid:
Okay, if that is valid, then how does one represent an out-dented paragraph? Consider this example:
Does this represent:
or does it represent:
But didn't you say
Doesn't that contradict your #1 response? Isn't that what I did with this example:
I'm sorry, what exactly are you referring to? Which two statements are contradictory?
This is a really difficult conversation because your proposal is so squishy and poorly defined. That is why I am trying to communicate with concrete examples. Allow me to try again. You gave the following example above:
What does that actually represent? It could be:
Or it could be:
Which is it? And why should it be one rather than the other? It seems ambiguous to me.
Communicating with concrete examples is fine by me, but if you prefer, I can try to write a formal spec instead, although I believe most questions should be clarified by the pseudocode I provided in my comment.
just as
is the equivalent of
Anything else would violate
Especially your other option for its meaning
doesn't make any sense to me. The spaces just denote a block (indentation). How could they be part of the resulting value? That would imply
is equivalent to
which is just some ambiguous nonsense.
Okay, so how do you represent the following?
None of those are representable using the proposed syntax, so you'd fall back to the general
You already have two ways to represent strings:
and
My proposal is to take the one that cannot represent every string (the first one) and make it able to represent some more kinds of strings by allowing the value to continue on the next line (but of course it won't be every string). Both in the current version and in a version where my proposal is implemented, the following statements hold:
So these two points shouldn't be reasons for rejecting the idea. This is different from #34, which (apart from all three suggested syntaxes being ambiguous) proposed a third syntax. My proposal just extends one of the existing syntaxes by making use of a construct that is currently illegal (a nested block following a list/dictionary item) and thus has no assigned meaning.
Okay, I think I understand. Let me repeat back what I understand so that you can confirm that I have the idea. But, be aware that I have generalized it a bit. You propose enhancing end-of-line strings to allow them to extend beyond one line. Something like this:
More specifically, if the line after an end-of-line string is indented, it is considered a continuation of the string. The beginning of the second line sets the indentation level for subsequent lines and the value continues until the indentation is abandoned. So the following is valid:
The value in both of these examples resolves to:
Empty lines in the value are indistinguishable from blank lines that are simply ignored by NestedText, so we have to eliminate one or the other. Conceivably it would be possible to disable the discarding of blank lines by NestedText within or adjacent to continuations. Alternatively, we can simply disallow empty lines in values rendered with continuations, in which case we have to decide by fiat whether such strings get a terminal newline. Tag recognition is disabled after encountering an indent after an end-of-line string (this requires recognizing an end-of-line string, so the first line of the value must not be an empty line). So, tags contained in continuation strings do not present a problem. The level of indentation is set by the first non-empty continuation line. Subsequent lines may be further indented, indicating an indentation in the value itself.
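The generalized rule restated above can be sketched in Python, assuming spaces-only indentation as in NestedText. The helpers `indent_of` and `parse_continuation` and the `keep_blank` switch are hypothetical, written only to make the blank-line question concrete: internal blank lines either become empty lines in the value (`keep_blank=True`) or are discarded the way NestedText discards blank lines today (`keep_blank=False`), while trailing blank lines are always dropped, which is one of the "decide by fiat" choices.

```python
def indent_of(line: str) -> int:
    """Number of leading spaces (NestedText indents with spaces only)."""
    return len(line) - len(line.lstrip(" "))


def parse_continuation(lines, start, base_indent, first_value, keep_blank=True):
    """Collect continuation lines for an end-of-line value (hypothetical rule).

    lines:       the document split into lines (no newline characters)
    start:       index of the first line after the tag line
    base_indent: indentation of the tag line ("key: ..." or "- ...")
    first_value: the end-of-line string found on the tag line

    Returns (value, index of the first line not consumed).
    """
    parts = [first_value]
    cont_indent = None   # set by the first non-blank continuation line
    pending_blanks = 0   # blank lines seen but not yet assigned to the value
    i = start
    while i < len(lines):
        line = lines[i]
        if not line.strip():
            pending_blanks += 1
            i += 1
            continue
        indent = indent_of(line)
        if indent <= base_indent:
            break  # indentation abandoned: the value has ended
        if cont_indent is None:
            cont_indent = indent
        elif indent < cont_indent:
            # An out-dented line cannot be expressed under this rule.
            raise ValueError(f"line {i + 1}: out-dented continuation line")
        if keep_blank:
            parts.extend([""] * pending_blanks)
        pending_blanks = 0
        # Indentation beyond cont_indent is kept as part of the value.
        parts.append(line[cont_indent:])
        i += 1
    # Trailing blank lines are never part of the value in this sketch.
    return "\n".join(parts), i


doc = [
    "key 1: line 1",
    "    line 2",
    "",
    "    line 3",
    "key 2: value 2",
]
print(parse_continuation(doc, 1, 0, "line 1"))
# ('line 1\nline 2\n\nline 3', 4)
print(parse_continuation(doc, 1, 0, "line 1", keep_blank=False))
# ('line 1\nline 2\nline 3', 4)
```

Both calls are defensible readings of the same text, which is the ambiguity behind the earlier question about the blank line after line 2.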
You have the idea, but:
Yes, but if a line with decreased indentation that would otherwise be ignored (blank/comment) is encountered, it is still ignored and doesn't terminate the value.
Of these two, I specifically propose the former. As long as the lines following the value have matching indentation (equal or larger), you disable every line type detection. This includes tags, but also detection of empty/comment lines.
I'm not 100% sure about this, but I'd say it's rather set by the first continuation line with increased indentation (which can be empty). The set of representable strings is all strings such that:
There's also an option to restrict the set of supported strings to values such that each line is nonempty and starts with a nonspace, although I don't really find this necessary.
Ignoring feasibility, I just think this is a very bad idea for readability (and simplicity of parsing). The rules for how these strings are parsed are very non-obvious.
The added complexity for the parser is, IMO, not significant. The added complexity for humans is bigger, but it can be vastly reduced by using the restricted version:
In this version, everything that's visually connected (no empty lines, strictly equal indentation) is part of the string, and everything else (empty lines, different indentation) is not. Would you be okay with this version?
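A minimal sketch of the restricted rule, again hypothetical and assuming spaces-only indentation: continuation lines must all share one indentation level deeper than the tag line, and an empty line ends the value. How a parser should react to a deeper or partially dedented line is not settled in the thread; raising an error here is this sketch's own choice, made so that a stray indent is never silently reinterpreted.

```python
def parse_restricted(lines, start, base_indent, first_value):
    """Collect a tagless string under the restricted rule (hypothetical).

    Returns (value, index of the first line not consumed).
    """
    parts = [first_value]
    cont_indent = None
    i = start
    while i < len(lines):
        line = lines[i]
        if not line.strip():
            break  # an empty line always ends the value
        indent = len(line) - len(line.lstrip(" "))
        if indent <= base_indent:
            break  # back at (or above) the tag's level
        if cont_indent is None:
            cont_indent = indent  # the first continuation line sets the level
        elif indent != cont_indent:
            raise ValueError(f"line {i + 1}: continuation lines must share one indentation")
        parts.append(line[indent:])  # never starts with a space by construction
        i += 1
    return "\n".join(parts), i


doc = [
    "- first item, which goes on",
    "  for two lines",
    "- second item",
]
value, next_index = parse_restricted(doc, 1, 0, "first item, which goes on")
```

Every line that ends up in the value is visually attached to the one above it, which is the property that makes this version easier for humans to read than the general one.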
To be honest, I'm -1 on the whole proposal. Personally, I don't see the benefit, and I think it has multiple significant downsides, complexity being a major one for a language that has simplicity as an explicit goal.
Can you elaborate on the multiple significant downsides? The only downside I see mentioned in this thread (far from significant, IMO, but of course that's subjective) is the inability to parse each line individually.
to someone (especially a nonprogrammer or a person who does not know NestedText), the meaning will be obvious (three text items, some of them multiline). It's a very familiar syntax that looks as if someone were taking notes in plain text.
Here's another example that becomes far harder to guess what it means with your proposal:
This is just a matter of how you frame it. Me calling it "special case + general case" or you calling it "general case that's not really general + fully general case" does not change what it is.
There's no ambiguity in the final product. I admit that my original description was not very precise, but I have addressed the questions in the meantime.
This is out of scope. My proposal only covers continuations of rest-of-line list and dictionary values.
I feel like you're missing my point - I'm explaining my reasons for being -1 on the proposal. Even if you don't agree with my perspective on the points I made, they're still my reasons :)
I didn't mean "inability to specify how to parse"; I meant the lack of clarity for end users trying to read NestedText files. My example snippet was meant to emphasise that point.
I am not missing your point - I recognise that these are your personal reasons and that you can think whatever you want. :)
Okay, but you provided an example that is irrelevant to the proposal, which leads me to think that my proposal is still not clear.
No hard feelings, don't worry :)
This is where I feel like we're not really on the same page. It very much is relevant to the proposal: any time you add complexity (e.g. supporting a new representation), you have to think about the existing representations it takes clarity away from. To be more explicit about my example, and to emphasise some more cases where there is no obvious "correct behaviour" (which is a sign of complexity and potential confusion for users): you're proposing allowing the following:
Would you allow this?
This is already a lot less visually clear, a user could reasonably think it's just dodgy indentation that actually represents:
What about this, should it be allowed? (even if you have the answer, it wouldn't be clear to the user)
This gets even more unclear if you allow something like:
What about this, is it valid? A user might reasonably assume so... but it's not very clear to read. Does the blank line matter? What if there are multiple blank lines?
The point is that once you allow a new syntax then it makes surrounding syntax less clear, making it harder for users to guess what's valid and/or how to interpret a file. Given this is valid:
It's a good thing that this is invalid:
The two have very different meanings, and the syntax should make that clear.
Based on past experience, I usually worry about that a lot. Online communication is hard. Fortunately, we have emoticons. :) I understand your point now; thank you for explaining with that set of examples.
I must say that I ended up liking this proposal more than I initially expected to. However, I am not currently inclined to add it to NestedText. There are two things I like about it. First, the format feels very natural; when I take notes for myself, I often use a very similar style. Second, it makes NestedText a bit more compact by allowing one to start a multiline string on the same line as a dict key or list tag. On the negative side are the non-obvious issues with white space, which I think add significant complexity for the user. Specifically, leading, trailing, and internal empty lines can all cause issues. In addition, there is a weird constraint on indentation that prevents out-denting, a common idiom. For example, it is not possible to represent the following value:
It also creates new ways for people to make mistakes that are not flagged as errors. For example, take any dict or list item that contains nested data and add a space or two to its tag and it changes the nested data into a long string. Consider the following (␣ is used to represent trailing spaces):
This is an error in the current version of NestedText because key1 appears to have two values, an end-of-line string " ", and a nested dictionary. But with the current proposal this becomes valid and the value of key1 is a multiline string rather than a dictionary:
Even without resorting to trailing spaces, Lewis demonstrated that allowing tags in the continuation lines results in some pretty surprising and hard-to-understand outcomes. I know that allowing tags in continuation lines was my addition, but if you don't allow them, you also get some surprising constraints, such as the following:
This is an error because "umbrellas" has two values, an end-of-line string and a nested dictionary. Removing the : after weather converts this to a valid continued string. In summary, this new type of string offers a more natural syntax than the existing multiline string. However, it is somewhat constrained and so cannot completely replace it. And its limitations and constraints are non-obvious and can result in surprising and difficult-to-understand behavior. The proposal gives me a YAML vibe. My opinion is that the developers of YAML tried to allow a natural style for representing structured data, but this resulted in numerous ambiguities that they addressed by adding a seemingly endless number of forms and rules. The result is a language that even experts struggle with. I am trying very hard to avoid this with NestedText.
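The trailing-space pitfall can be reproduced with a small, hypothetical model of a single dictionary item. The helpers below are not the nestedtext library's API; they split on the first `": "` (treating a bare trailing `:` as "nested data follows"), ignore comments, and dedent the following block by its first line's indentation. The `proposed` flag switches between today's behavior (an end-of-line value followed by an indented block is an error) and the proposal (the block continues the string).

```python
def split_dict_item(line):
    """Split 'key: value' or 'key:' into (key, value); value is None for 'key:'."""
    key, sep, value = line.partition(": ")
    if sep:
        return key.strip(), value  # the end-of-line value is kept verbatim
    if line.endswith(":"):
        return line[:-1].strip(), None  # nothing after the colon: nested data follows
    raise ValueError(f"not a dictionary item: {line!r}")


def interpret_item(lines, proposed):
    """Interpret a dict item on lines[0] followed by an indented block."""
    key, value = split_dict_item(lines[0])
    block = [ln for ln in lines[1:] if ln.strip()]
    if value is None:
        return key, ("nested", block)
    if not block:
        return key, ("string", value)
    if not proposed:
        raise ValueError(f"{key!r} has two values: {value!r} and a nested block")
    indent = len(block[0]) - len(block[0].lstrip(" "))
    return key, ("string", "\n".join([value] + [ln[indent:] for ln in block]))


intended = ["key1:", "    key2: value2"]
typo = ["key1:  ", "    key2: value2"]  # two trailing spaces after the colon
```

With `proposed=False`, the `typo` lines raise an error, as they do today; with `proposed=True`, they are silently accepted and `key1` becomes the string `" \nkey2: value2"` instead of a nested dictionary.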
Inability to express this is indeed unfortunate. I don't have a solution for it, except suggesting that NestedText require a fixed indentation (specifically 2 spaces, so that the continuation lines align nicely with the first line in a list item). That's a breaking change, which I understand is extremely unlikely to happen. Alternatively, the 2-space indentation requirement could apply only to this new form of multiline string, but that introduces inconsistency into the language.
That's an inherent consequence of extending a syntax: the set of all feasible strings stays the same, but the set of strings valid in the language grows. Inevitably, that loses a level of error detection, because some previous errors are now valid.
No, that was my idea all along. I tried to explain multiple times that once you are in a multiline context, you disable all tag detection and interpret all lines verbatim (after removing indentation) until you return to a lower level of indentation. I'm not sure where the idea that I don't want to allow tags in the string value came from.
As a last effort to save this proposal from getting lost and forgotten in the history of GitHub issues: isn't this mitigated by implementing only the restricted version of my proposal, in which only strings where each continuation line starts with a non-space (and thus is non-empty) are allowed in this representation? The only problem I have with this is that single-line strings starting with spaces are currently allowed in the shorthand "end-of-line" representation. This is actually something I dislike about current NestedText, as it is inconsistent: in shorthand dictionary/list syntax, leading and trailing whitespace is not significant, but in shorthand text syntax, whitespace suddenly is significant. Why is that? Honestly, this feels like it was allowed just because it is not syntactically ambiguous, rather than as a conscious design choice. It would be reasonable to disallow this (another breaking change, though), so that the shorthand version is allowed precisely for strings where each line (including the first) has 0 leading spaces.
The restricted proposal basically limits this new form of multiline string to a single paragraph of text, and to avoid issues with colons, this new string probably needs to be able to contain tags. Presumably this is your intent, correct? I don't find that tremendously compelling: a whole new form of string dedicated to handling only one specific form of text, a single paragraph. It cannot handle code. It cannot handle multiple paragraphs. It cannot even handle indentation. And it is offered as an alternative to a generic form of string that does a good job of handling all strings without restriction. The benefit of this new form does not obviously outweigh the cost of providing yet another type of string. As for your comment about the inconsistency between end-of-line strings and inline strings, I can assure you that both the rest-of-line strings and the multiline strings in NestedText were designed to be as general as possible to avoid the need for quoting and escaping. That was a primary design goal of the language. In both cases it was made possible by simply accepting all characters that follow the tag. This is not possible for inline strings, where the strings are embedded in syntax. As a result, inline strings are more restricted than the other two forms. The inline forms are for convenience only and are completely optional. If you cannot live with the restrictions, you can simply avoid the inline forms. Given that inline strings are necessarily restricted anyway, and that the inline forms themselves are optional, it was decided to ignore both leading and trailing spaces on the keys and values to allow people to add extra spaces as they saw fit to make their code more readable. This last comment also demonstrates that your proposed tag-less multiline strings are not simply end-of-line strings that extend over multiple lines.
End-of-line strings can contain any character other than a newline, whereas the tag-less multiline strings cannot contain leading spaces or empty lines. Thus, they are a fourth type of string.
Yes.
I think you read my comment backwards. I understand the choice to ignore whitespace in inline strings. What I don't understand is why whitespace is significant in end-of-line strings. As I said, it feels like it was allowed simply because the syntax supports it. It would be perfectly fine to make it insignificant since (paraphrasing what you said): The end-of-line form is for convenience only and is completely optional. If you cannot live with the restrictions, you simply can avoid the end-of-line form (and use the
How so? The value
is a prefix of the value
and the representation of the first value as an item of a list
is a prefix of the representation of the second value as an item of a list
And there is no other way to represent the "abc" value as a tagless string.
With inline strings, conventional style requires a space before the value. If one were trying to line up values across multiple lines, the style may result in leading and trailing spaces on both keys and values. These extra spaces are dictated by style, not by a desire to represent spaces actually in the key or value. It is for this reason that leading and trailing spaces are ignored with inline strings. The situation is different with end-of-line strings. If the user adds leading or trailing spaces to a value, we must assume they did so intentionally. There is no case where a particular style requires leading or trailing spaces.
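The distinction can be shown with two simplified helpers (hypothetical names, not the nestedtext API): an end-of-line list item keeps everything after the `- ` tag verbatim, while a flat inline list strips the spaces around each item. Nested inline structures and the other restrictions on inline strings are ignored here.

```python
def parse_list_item(line):
    """Value of an end-of-line list item: everything after '- ', verbatim."""
    text = line.lstrip(" ")  # drop indentation only
    if text == "-":
        return ""
    if not text.startswith("- "):
        raise ValueError(f"not a list item: {line!r}")
    return text[2:]  # leading and trailing spaces are significant


def parse_inline_list(text):
    """Items of a flat inline list such as '[a, b]'; spaces around items are ignored."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not an inline list: {text!r}")
    inner = text[1:-1]
    if not inner.strip():
        return []
    return [item.strip() for item in inner.split(",")]


print(repr(parse_list_item("-   padded  ")))      # '  padded  '
print(parse_inline_list("[  padded  ,  b ]"))     # ['padded', 'b']
```

Stripping is safe for the inline form because its spaces are there for alignment; doing the same in `parse_list_item` would make strings with leading or trailing spaces unrepresentable in the end-of-line form.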
Each of the 4 types of strings has different constraints.
Though I guess you can consider tag-less multiline strings as a redefinition of the end-of-line string
From that perspective there would only be three again, as you say.
Yes, that's how I have thought and spoken about it from the very beginning. You even described it like that in your comment 2 days ago.
Okay, let me summarize. You are proposing that we extend end-of-line strings to tag-less strings that have the following constraints:
This allows a single simple paragraph to be specified. In this case, simple implies that only the first line may be indented. I believe that this perspective gives the user a simple mental model of what is allowed and so would be considered easy to understand despite what otherwise might be considered non-obvious constraints. This proposal has the following potential issues:
I do not believe any of these issues should be considered fatal to the proposal. Probably the most significant criticism is that it allows content that looks like valid NT structured data but is not, due to a mistake, and accepts it while interpreting it in a way that completely differs from what may have been intended. Am I missing anything?
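The summary amounts to a test an emitter could apply before choosing the tagless form. The sketch below is hypothetical (the names `fits_tagless` and `emit_dict_item` are made up) and encodes the constraints as summarized: any single-line value is a plain end-of-line string as today; a multi-line value must have a non-empty first line and continuation lines that are non-empty and do not start with whitespace, which also rules out a trailing newline. Anything else falls back to the existing `>` form, where each line gets `> ` (a bare `>` for an empty line) and the final newline is dropped on load.

```python
def fits_tagless(value):
    """True if value satisfies the summarized constraints for a tagless string."""
    lines = value.split("\n")
    if len(lines) == 1:
        return True  # a plain end-of-line string, as today
    if lines[0] == "":
        return False  # the first line must not be empty
    # Continuation lines: non-empty and not starting with whitespace.
    # A trailing newline yields an empty last line, so it is rejected too.
    return all(line and not line[0].isspace() for line in lines[1:])


def emit_dict_item(key, value, indent=4):
    """Emit a dict item, preferring the tagless form and falling back to '>'."""
    pad = " " * indent
    if fits_tagless(value):
        first, *rest = value.split("\n")
        return "\n".join([f"{key}: {first}"] + [pad + line for line in rest])
    lines = [f"{key}:"]
    for line in value.split("\n"):
        lines.append(f"{pad}> {line}" if line else f"{pad}>")
    return "\n".join(lines)


print(emit_dict_item("note", "one paragraph\nthat wraps"))
print(emit_dict_item("code", "def f():\n    pass"))
```

A tool like this would also cover the editing problem raised later in the thread: adding indentation or an empty line to a tagless value just makes `fits_tagless` fail, and the value is re-emitted in the `>` form.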
The restriction to one paragraph seems a bit heavy. We could reduce impact of that restriction by introducing a new tag, say '+', that combines adjacent tag-less multiline strings into a single string. For example:
would be equivalent to:
But now that I have said it, doing so would make traditional multiline strings redundant. Anything that could be expressed in a traditional multiline string could be expressed with tagless multiline strings and this new joining tag.
I believe you provided a complete summary.
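The join-tag observation can be sketched as a post-processing step. Everything here is hypothetical: `+` is not a NestedText tag, `join_plus_items` is a made-up name, and the thread does not say how the pieces are joined, so this sketch assumes a single newline between consecutive `+` values and, to stay short, treats each piece as a single end-of-line value without continuation lines. Under that assumption a bare `+` contributes an empty line, and since each piece is an end-of-line string (leading spaces allowed), any string can be spelled one line per item, which is why the tag would overlap with traditional multiline strings.

```python
def join_plus_items(lines):
    """Join consecutive '+' items into one string (hypothetical join tag).

    Each item is '+ text' (or a bare '+' for an empty piece); the pieces are
    assumed to be joined with a single newline.
    """
    pieces = []
    for line in lines:
        text = line.lstrip(" ")  # drop indentation only
        if text == "+":
            pieces.append("")
        elif text.startswith("+ "):
            pieces.append(text[2:])  # the rest of the line, verbatim
        else:
            raise ValueError(f"not a '+' item: {line!r}")
    return "\n".join(pieces)


text = join_plus_items([
    "+ First paragraph.",
    "+",
    "+ Second paragraph.",
])
```

Here `text` is `"First paragraph.\n\nSecond paragraph."`; as noted in the thread, writing code or indented text this way works but is more awkward than the `>` form.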
Your counterproposal looks interesting, but I don't fully understand it.
should remain unaffected.
I can imagine using
means
It is used at the previous level of indentation to explicitly mark the line as a part of the value. This solves representing multiple paragraphs, but it doesn't solve leading spaces.
It is not a proposal at all, just an observation that I find interesting. You are correct, this idea does not extend the previous tagless string; it is a way of combining an adjacent list of strings into a larger string. The current tagless string proposal only allows the specification of a single paragraph. One could specify multiple paragraphs by combining several tagless strings in a list, but then you get a list of paragraphs, not a block of text with multiple paragraphs. Hence the idea of a new join tag that allows one to specify a list of paragraphs and have them combined into a single block of text. I am not suggesting that we add this new tag because, while it can do everything the traditional multiline string can do, in general it is more awkward. The only time it may be preferred is when one is entering long simple paragraphs and the editor does not support the automatic entry of the leading '> ' on each line.
Ah, you should have said that right away. It didn't really make sense to me, but now that you've explained how you got there, it makes perfect sense. But I agree that it is awkward.
I don't think the goal should necessarily be to allow representation of as many strings as possible. There's always the general
The dreaded YAML ...
Yeah, we don't want that. That's why I think that if supporting whitespace is ambiguous to users, the only way we can extend the syntax at all is to stop at single-paragraph strings without indented continuation lines.
Let me take some time on this. I'll get back to you within a couple of weeks at the latest. That should give plenty of time for others to comment if they feel the need.
Alright, I'll work on the C# library (and then maybe Nim) in the meantime. Whatever the outcome, thank you both for the discussion; it's been very interesting!
To be honest, I'm a bit concerned that this would even be considered; it breaks some fundamental properties/design principles that led to my interest in NestedText.
Can you be specific?
I'm not the person you're asking, but the one that sticks out to me is being able to identify the type of any given line without referring to its context.
I've been explicit and detailed about my concerns in #49 (comment) and subsequent comments; I don't really have anything else to say.
Okay, thanks. Here is another issue that just occurred to me. If someone is editing an existing NT document and needs to add indentation or an empty line to a value specified as a tagless string, they will need to convert the whole string to the traditional multiline string form. Thus, a small change would result in a disproportionate amount of effort.
It could be solved by tooling, but yeah, you are right.
Two years ago I created a nested format with strings, lists and dictionaries for my personal use that was surprisingly similar to NestedText. While it was a bit less elegant than NestedText and still required some escaping, there was one thing that was, IMO, nicer than in NestedText: it supported unannounced multiline strings. From a quick reading of the docs, I don't think there would be any ambiguity in allowing
and
as shorter alternatives to
and
I'd love if this was added into the syntax if I'm not missing any clashes.
PS: I added NestedText to https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats .
EDIT: Well, it was reverted. Apparently it needs a Wikipedia page first.