Iterative line parsing? #45

Ecconia · 2023-06-30T18:07:12Z

I have been looking at the code to figure out how escaping of the # character is handled.
The goal was to figure out if it is possible to encode the String \# with SUCC.
A second goal was to gauge the effort of fixing/improving #44.

I noticed that the coding style is wild 🔥
(That is, no caching or similar: everything is calculated on the fly, on demand.)

In general and to fix the two mentioned issues (and the caching), I would propose iterative line parsing.
SUCC has a clear definition of what data lines should look like.

In the parsing logic, I would propose parsing every line fully up front, with every aspect it might have, and storing the results in classes as is done right now, but caching them so that no further iteration over the line is required from then on.

The data to be extracted would be:
<Indentation> <Key> <SpaceBeforeColon> <SpaceAfterColon> <Value> <SpaceBeforeComment> <Comment>
Or, if we strip the reconstruction information:
<Indentation> <Key> <Value>
A line without a key is a list entry. A line without a value is just an opening key entry. A line with both is a key-value entry.
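To make the proposal concrete, here is a minimal Python sketch of parsing a data line once into a cached structure. It is not SUCC's actual implementation: `ParsedLine`, `split_comment`, and `parse_line` are hypothetical names, the reconstruction fields (spacing around the colon and before the comment) are left out, and the rules it applies are simplifying assumptions (list entries start with `-`, and a `#` starts a comment unless it is immediately preceded by a backslash).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ParsedLine:
    """Result of parsing one data line once (hypothetical structure)."""
    indentation: int         # number of leading spaces
    key: Optional[str]       # None for list entries
    value: Optional[str]     # None for lines that only open a key
    comment: Optional[str]   # comment text, if the line has one

    @property
    def kind(self) -> str:
        if self.key is None:
            return "list-entry"
        if self.value is None:
            return "key-only"
        return "key-value"


def split_comment(text: str) -> Tuple[str, Optional[str]]:
    # Assumption: '#' starts a comment unless the previous character is '\'.
    for i, ch in enumerate(text):
        if ch == "#" and (i == 0 or text[i - 1] != "\\"):
            return text[:i], text[i + 1:].strip()
    return text, None


def parse_line(raw: str) -> ParsedLine:
    body, comment = split_comment(raw)
    stripped = body.lstrip(" ")
    indentation = len(body) - len(stripped)
    stripped = stripped.rstrip()
    if not stripped:
        raise ValueError("not a data line")

    if stripped.startswith("-"):
        # Assumption: a leading '-' marks a list entry (a line without a key).
        key, value = None, stripped[1:].strip()
    else:
        key_part, colon, value_part = stripped.partition(":")
        if not colon:
            raise ValueError("expected 'key:' or 'key: value'")
        key = key_part.strip()
        value = value_part.strip() or None

    if value is not None:
        # Undo the assumed escape: '\#' in the file means a literal '#'.
        value = value.replace("\\#", "#")
    return ParsedLine(indentation, key, value, comment)
```

Under these assumptions, `parse_line("items:")` yields a key-only line, `parse_line("  - apple")` a list entry, and a file value written as `\\#` decodes to the two-character string `\#`.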

I would personally prefer this iterative line parsing approach.

<+> It introduces caching of the results.
<+> It allows more advanced (and proper) escaping rules.
<-> Multi-line strings are going to be as annoying as before, I assume?
<-> It is probably less easily expandable, as everything has to be written according to the specification.

So my questions are:

Was this type of parsing considered?
Is there any downside to doing iterative line parsing?
Would this be something that could be applied onto this project?

Hope this helps, or at least starts some thought process.

JimmyCushnie (Maintainer) · 2023-07-09T04:24:55Z

Thanks for starting our first discussion here on the official SUCC repo!

> The goal was to figure out if it is possible to encode the String \# with SUCC.

Yup, with \\#. Just updated the documentation on comment escaping to clarify this, thanks.

> In general and to fix the two mentioned issues (and the caching), I would propose iterative line parsing.

This is a good idea, I support this and I want to do it. I will probably do so after adding benchmarks (#46) as I am interested in how big the performance boost will be!

> Was this type of parsing considered?

Not at the time the code was initially written haha. I first wrote SUCC when I was a very novice programmer, and the original code was very bad, as I did not know what I was doing at all. Ever since, as I've gained experience and wisdom, I've been revising it and trying to fix the mistakes of the past. If I were to write SUCC from scratch today I'd probably use this kind of parsing, but this is just an element of SUCC that I haven't gotten around to fixing yet.

> Is there any downside to doing iterative line parsing?

I suppose it adds a bit of complexity and unnecessary abstraction to the code, and introduces potential for bugs if the data stored in the line structs gets out of sync with what's actually written in the file. It also might slow down the saving of files; currently we just have to sequentially write each line, but with this system we'd have to generate new strings for each line based on the data in the structs. But I think it'd be very positive overall. (The performance regression could be eliminated with some additional code that skips regenerating lines that aren't marked as dirty.)
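The dirty-flag idea could look roughly like the following Python sketch. It is only an illustration under assumptions, not SUCC code: `CachedLine`, `set_value`, `render`, and `save` are hypothetical names, and the regenerated text uses a plain `key: value` form, whereas a real implementation would rebuild the line from the stored spacing and comment data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CachedLine:
    """A parsed line that keeps its original text until its data changes."""
    raw: str                  # text exactly as read from the file
    key: str
    value: str
    dirty: bool = False       # True when raw no longer matches key/value
    regenerations: int = 0    # counts rebuilds, for demonstration only

    def set_value(self, new_value: str) -> None:
        if new_value != self.value:
            self.value = new_value
            self.dirty = True

    def render(self) -> str:
        # Rebuild the text only when the data changed since the last render.
        if self.dirty:
            self.raw = f"{self.key}: {self.value}"
            self.dirty = False
            self.regenerations += 1
        return self.raw


def save(lines: List[CachedLine]) -> str:
    # Untouched lines are written back verbatim, as in sequential writing.
    return "\n".join(line.render() for line in lines)


lines = [
    CachedLine(raw="a:   1  # keep spacing", key="a", value="1"),
    CachedLine(raw="b: 2", key="b", value="2"),
]
lines[1].set_value("3")
output = save(lines)
```

Here only the second line is regenerated. The first is emitted exactly as it was read, which also keeps its original spacing and comment.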

Would this be something that could be applied onto this project?

As discussed: yes!
