Rework TOML token parsing #100

lkirkwood · 2024-04-29T14:34:11Z

This PR closes #81 and also allows a number of other valid identifiers to be parsed by reworking the logic for parsing TOML tokens from characters.

For example, all of the identifiers in the following document are valid according to the TOML spec (ABNF here), but none of them can currently be parsed:

[foo.bar.baz]
1key = "myval"
-inf = 0
2024-04-30 = 100
½ = 0.5

(although even GitHub's Markdown highlighting doesn't handle that last one)
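As a rough illustration of why those keys qualify, here is a minimal sketch of a bare-key check. The helper names `is_bare_key_char` and `is_bare_key` are hypothetical and not taken from this PR, and the non-ASCII ranges are only an illustrative subset that should be checked against the ABNF rather than treated as the full grammar.

```rust
/// Hypothetical helper: true if `c` may appear in an unquoted TOML key.
/// The non-ASCII arms are an assumed, partial subset of the ABNF ranges.
fn is_bare_key_char(c: char) -> bool {
    matches!(
        c,
        'a'..='z' | 'A'..='Z' | '0'..='9' | '-' | '_'
            // Superscript digits and vulgar fractions (includes U+00BD, "½").
            | '\u{B2}' | '\u{B3}' | '\u{B9}' | '\u{BC}'..='\u{BE}'
            // Latin-1 letters, skipping the multiplication and division signs.
            | '\u{C0}'..='\u{D6}' | '\u{D8}'..='\u{F6}'
    )
}

/// Hypothetical helper: a bare key is one or more bare-key characters.
fn is_bare_key(s: &str) -> bool {
    !s.is_empty() && s.chars().all(is_bare_key_char)
}
```

Under these assumptions, `is_bare_key` accepts `1key`, `-inf`, `2024-04-30` and `½`, and rejects the empty string or a key containing a space.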

knickish

Generally looks pretty nice, thanks for the PR. If we can find a more readable way to handle the matching (without trashing codegen), I would prefer that.

knickish · 2024-04-29T23:27:25Z

src/toml.rs

+            #[allow(unreachable_patterns)]
+            match self.cur as u32 {
+                // ,
+                0x2C => {


I would really prefer if these stayed as chars instead of converting to u32 first. It might make the codegen a tiny bit worse, but I really doubt it.

Absolutely, will do.

Sorry but I'm a bit confused - by codegen are you referring to generating the match statement from the ABNF?
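For readers following the review comment above, a small sketch of the two matching styles side by side may help. Both function names are hypothetical and the set of characters is chosen only for illustration; the point is that a `char` pattern can express the same code points as the `u32` form without the cast.

```rust
/// Hypothetical classifier that matches on the code point as an integer,
/// in the style of the reviewed diff.
fn is_separator_u32(c: char) -> bool {
    match c as u32 {
        // ,
        0x2C => true,
        // .
        0x2E => true,
        _ => false,
    }
}

/// The same test written against `char` directly. A literal or a
/// `\u{..}` escape can be used, whichever reads better.
fn is_separator_char(c: char) -> bool {
    match c {
        ',' | '\u{2E}' => true,
        _ => false,
    }
}
```

The two functions agree on every input; the `char` version simply keeps the hex values from the ABNF visible through `\u{..}` escapes where a literal would be unclear.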

lkirkwood · 2024-04-30T02:36:26Z

Is this better? I would add comments with the characters next to each range in the macro, but many don't display in my system font. Can still do so if you feel it would help.
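A minimal sketch of the macro approach discussed here, with a comment next to each range: the macro name `ascii_ident_chars` and the function `classify` are hypothetical, and only the ASCII portion of the grammar is shown. It relies on `macro_rules!` macros being usable in pattern position.

```rust
/// Hypothetical macro expanding to a pattern that covers the ASCII
/// characters allowed in a bare key. Non-ASCII ranges are omitted here.
macro_rules! ascii_ident_chars {
    () => {
        '\u{30}'..='\u{39}'       // 0-9
            | '\u{41}'..='\u{5A}' // A-Z
            | '\u{61}'..='\u{7A}' // a-z
            | '\u{2D}'            // -
            | '\u{5F}'            // _
    };
}

/// Hypothetical classifier showing the macro used as a match arm.
fn classify(c: char) -> &'static str {
    match c {
        ascii_ident_chars!() => "ident",
        ' ' | '\t' => "whitespace",
        _ => "other",
    }
}
```

Keeping the ranges in one macro means the same pattern can be reused in several `match` statements without repeating the list.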

knickish · 2024-04-30T02:56:06Z

No, I think that looks great. I'll give it a day or so to see whether @not-fl3 has any input; if not, I'll merge soon. Thanks for fixing this up!

lkirkwood · 2024-04-30T04:35:02Z

No problem, thanks for the great project!

not-fl3 · 2024-04-30T05:25:18Z

LGTM! Honestly I just don't have much of an opinion on TOML, feel free to merge!

knickish · 2024-04-30T13:03:52Z

Merging it is then. Thanks @lkirkwood

lkirkwood added 23 commits April 29, 2024 22:15

Add bare_key_chars macro to toml

e44d549

Added toml test for unquoted keys

5d37f89

Match toml chars as u32 + remove alnum match logic

56e4ad6

Now casting the current toml character to u32 for easier matching against the values provided in the abnf for toml. Removed the existing logic for parsing most characters to start again.

Restore original toml number parsing logic

a039298

Add allow for overlapping toml char pattern lint

84a7a05

Add toml parse_bare_key method

c3fabf8

Change language from bare_key to ident in toml

612ab3c

Updated toml key test case

e161428

Implemented toml parse_ident

41fa5f2

Add toml nan/inf todo and removed unused token

fe27c84

Impl into string for toml token

5ba789b

Reordered toml number parsing conditionals

cd4881c

Cleaned up some toml match statements

b18d925

Formatted toml token comments and rename macro

a9962c1

Moved toml number parsing into separate function

63e069b

Added guard for ident starting with num toml

ba568f9

Return err if num contains illegal chars toml

25cd040

Added toml ident_term_chars macro

f958591

Fix not pushing chars during inf/nan toml parse

181f74a

Start using ident_term_chars to detect inf/nan

16e37a2

Start using ident_term_chars to detect toml idents

1633470

Add close block char to toml ident_term_chars

4967fea

Updated toml key test

94174fa

lkirkwood marked this pull request as ready for review April 29, 2024 16:56

lkirkwood added 2 commits April 30, 2024 02:58

Replaced tomltok into string with from impl

4eb4037

Removed unnecessary return

73c9917

knickish reviewed Apr 29, 2024

lkirkwood added 2 commits April 30, 2024 12:13

Fixed toml no_std test

994dbad

Always match toml chars as char not u32

3a57d03

knickish merged commit 2f538fa into not-fl3:master Apr 30, 2024
8 checks passed
