Is there a way to avoid token/ID conflicts? #1032

rpedrosanto · 2023-04-21T20:24:06Z

Take the following grammar as an example:

grammar Sample

entry Model:
    (types+=Type | fields+=Field)*;

Type:
    'type' name=ID;

Field:
    'text' text=STRING 'as' type=[Type:ID];

hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w_]*/;
terminal STRING: /"[^"]*"/;

It works fine for the following:

type boolean
type number

text "Field A" as boolean
text "Field B" as number

But if I define a new type for which the ID conflicts with any of the tokens in the language, e.g.:

type text

text "Field C" as text

I get the error: Expecting token of type 'ID' but found 'text'.

Is this supposed to work? Doesn't it know from context that 'text' after 'type' and 'as' should be an ID?
Is there a way to fix this without changing the language? I could wrap the IDs in single quotes, for instance, but I'd really like to avoid that.

Thanks in advance. I appreciate it.

msujew · 2023-04-21T21:34:13Z

Maintainer

Hi @rpedrosanto,

Langium's parser (Chevrotain) uses a typical two-phase approach to parsing: the text is first tokenized by a lexer, and these tokens are then parsed by the actual parser. This has a few benefits, mostly performance related. The disadvantage is that the lexer cannot easily use context to decide between tokens; generally, lexers work without any context. That is the issue you're encountering: 'text' is always tokenized as the keyword, never as an ID, regardless of where it appears.
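To make the lexer's context-free behavior concrete, here is a minimal TypeScript sketch. It is a toy lexer written for illustration only, not Chevrotain's or Langium's actual implementation; the names `Token`, `KEYWORDS`, and `tokenize` are hypothetical.

```typescript
// Toy lexer for illustration only; this is NOT how Chevrotain is implemented.
type Token = { type: string; image: string };

// Keywords of the Sample grammar; every other word becomes an ID.
const KEYWORDS: readonly string[] = ["type", "text", "as"];

function tokenize(input: string): Token[] {
    const tokens: Token[] = [];
    // Alternatives: whitespace (skipped), STRING (group 1), or a word (group 2).
    const pattern = /^(?:\s+|("[^"]*")|([_a-zA-Z]\w*))/;
    let pos = 0;
    while (pos < input.length) {
        const match = pattern.exec(input.slice(pos));
        if (match === null) {
            throw new Error(`Unexpected character at offset ${pos}`);
        }
        pos += match[0].length;
        if (match[1] !== undefined) {
            tokens.push({ type: "STRING", image: match[1] });
        } else if (match[2] !== undefined) {
            const word = match[2];
            // The decision uses the matched text alone; the lexer does not
            // know which token the parser expects at this position.
            const isKeyword = KEYWORDS.indexOf(word) !== -1;
            tokens.push({ type: isKeyword ? word : "ID", image: word });
        }
    }
    return tokens;
}

const tokenTypes = tokenize('text "Field C" as text').map(t => t.type);
console.log(tokenTypes); // [ 'text', 'STRING', 'as', 'text' ]
```

The last token comes out as the keyword 'text' rather than an ID, which is why a parser expecting an ID after 'as' reports the error from the question.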

Fortunately, this is fairly easy to address. You can declare a data type rule that includes these keywords:

grammar Sample

entry Model:
    (types+=Type | fields+=Field)*;

Type:
    'type' name=Identifier;

Field:
    'text' text=STRING 'as' type=[Type:Identifier];

Identifier returns string:
    'text' | 'type' | ID;

hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w_]*/;
terminal STRING: /"[^"]*"/;

The Identifier parser rule is context dependent, unlike the normal terminals. This lets you circumvent the usual issues with this sort of two-phase parsing.
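As a rough illustration of why this works, the following TypeScript sketch hand-writes a toy parser for the Type rule. It is an assumption-laden simplification, not the code Langium generates; `tokenize`, `parseIdentifier`, and `parseType` are hypothetical names.

```typescript
// Toy lexer and parser for illustration only; NOT Langium's generated code.
type Token = { type: string; image: string };

const KEYWORDS: readonly string[] = ["type", "text", "as"];

// Context-free lexer: a word is a keyword token if it is listed, else an ID.
function tokenize(input: string): Token[] {
    const tokens: Token[] = [];
    const pattern = /^(?:\s+|("[^"]*")|([_a-zA-Z]\w*))/;
    let pos = 0;
    while (pos < input.length) {
        const match = pattern.exec(input.slice(pos));
        if (match === null) {
            throw new Error(`Unexpected character at offset ${pos}`);
        }
        pos += match[0].length;
        if (match[1] !== undefined) {
            tokens.push({ type: "STRING", image: match[1] });
        } else if (match[2] !== undefined) {
            const isKeyword = KEYWORDS.indexOf(match[2]) !== -1;
            tokens.push({ type: isKeyword ? match[2] : "ID", image: match[2] });
        }
    }
    return tokens;
}

// Mirrors `Identifier returns string: 'text' | 'type' | ID;`
// The parser knows it is at a name position, so it can accept keyword tokens.
function parseIdentifier(tokens: Token[], index: number): string {
    const token = tokens[index];
    if (token !== undefined &&
        (token.type === "ID" || token.type === "text" || token.type === "type")) {
        return token.image;
    }
    const found = token !== undefined ? token.image : "end of input";
    throw new Error(`Expecting Identifier but found '${found}'`);
}

// Mirrors `Type: 'type' name=Identifier;`
function parseType(tokens: Token[]): { name: string } {
    if (tokens[0]?.type !== "type") {
        throw new Error("Expecting keyword 'type'");
    }
    return { name: parseIdentifier(tokens, 1) };
}

console.log(parseType(tokenize("type text")).name);    // text
console.log(parseType(tokenize("type boolean")).name); // boolean
```

In this sketch, 'as' is not among the Identifier alternatives, so `parseType(tokenize("type as"))` still throws. Any keyword that should also be usable as a name has to be listed in the rule explicitly.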

1 reply

rpedrosanto Apr 24, 2023
Author

This works great, thanks @msujew !
