parsing python like languages #79

schneidersoft · 2024-06-08T10:51:36Z

How would I parse Python-like languages, where indentation is used to define scope?

dolik-rce · 2024-07-04T20:07:23Z

Indentation is just another character to the parser, so this is not much different from parsing any other language. You can also use references to captures to make sure the indentation on each line is correct (at least in simple cases; more complex code would probably need to check this in code).

Here is a very simple grammar that parses a Python-like function definition:

function <- "def " identifier "():\n" <indent> statement "\n" ($1 statement "\n")*

identifier <- [_a-zA-Z][_a-zA-Z0-9]*

indent <- " "+ / "\t"+

statement <- [^ \t\n][^\n]*

It reports a syntax error if you run it on incorrectly indented code.
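To make the capture-and-reference idea concrete outside PackCC, here is a rough Python sketch that mirrors the grammar above with a regular expression. It is only an illustration: the names `FUNCTION` and `parses` are made up for this example, and a regex backtracks differently from a PEG parser, although for this particular pattern the accepted inputs should be the same.

```python
import re

# Mirrors the grammar above: the indentation of the first body line is
# captured, and every following line must start with exactly that text.
FUNCTION = re.compile(
    r"def [_a-zA-Z][_a-zA-Z0-9]*\(\):\n"      # function header
    r"(?P<indent> +|\t+)[^ \t\n][^\n]*\n"     # <indent> statement "\n"
    r"(?:(?P=indent)[^ \t\n][^\n]*\n)*"       # ($1 statement "\n")*
)


def parses(source: str) -> bool:
    """Return True if the whole input is one correctly indented function."""
    return FUNCTION.fullmatch(source) is not None
```

For example, `parses("def f():\n    a = 1\n    b = 2\n")` is accepted, while the same input with the second statement indented by only two spaces is rejected, because the reference no longer matches the captured indentation.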

dolik-rce · 2024-07-04T20:10:19Z

PS: Here is the full Python grammar (in a slightly different format from the one used by PackCC), in case you need some inspiration: https://docs.python.org/3/reference/grammar.html

schneidersoft · 2024-07-07T10:31:25Z

Right. I was wondering how to integrate a tokenizer that would produce the INDENT and DEDENT tokens that PackCC would then be able to use.
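For reference, the usual way such a tokenizer derives INDENT and DEDENT tokens is with a stack of indentation widths. The sketch below shows that idea in Python. It is not PackCC-specific, the function name `tokenize_indentation` and the token shapes are invented for this example, and it assumes indentation consists of spaces only. How the resulting tokens would be fed to a PackCC-generated parser is left open here.

```python
def tokenize_indentation(source: str) -> list[tuple[str, str]]:
    """Turn indented text into (kind, text) tokens.

    kind is "INDENT", "DEDENT", or "LINE". Assumes space-only indentation.
    """
    stack = [0]  # indentation widths of the currently open blocks
    tokens = []
    for raw in source.splitlines():
        stripped = raw.lstrip(" ")
        if not stripped:
            continue  # blank lines do not open or close blocks
        width = len(raw) - len(stripped)
        if width > stack[-1]:
            # Deeper than the current block: open a new one.
            stack.append(width)
            tokens.append(("INDENT", ""))
        else:
            # Close blocks until we are back at a known width.
            while width < stack[-1]:
                stack.pop()
                tokens.append(("DEDENT", ""))
            if width != stack[-1]:
                raise SyntaxError(f"inconsistent dedent: {raw!r}")
        tokens.append(("LINE", stripped))
    # Close every block still open at the end of the input.
    while len(stack) > 1:
        stack.pop()
        tokens.append(("DEDENT", ""))
    return tokens
```

One possible way to use this, purely as an assumption, is to run it as a preprocessing pass and write each INDENT and DEDENT into the text as a sentinel character that the grammar then matches as an ordinary literal, much like braces in other languages.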
