parsing python like languages #79

Open
schneidersoft opened this issue Jun 8, 2024 · 3 comments
Comments

@schneidersoft

How would I parse Python-like languages, where indentation is used to handle scope?

@dolik-rce
Contributor

Hello @schneidersoft,

Indentation is just another character to the parser, so this is not much different from parsing with other parsers. You can also use back-references to make sure the indentation on each line is correct (at least in simple cases; more complex input would probably need to check this in code).

Here is a very simple grammar that parses a Python-like function definition:

function <- "def " identifier "():\n" <indent> statement "\n" ($1 statement "\n")*

identifier <- [_a-zA-Z][_a-zA-Z0-9]*

indent <- " "+ / "\t"+

statement <- [^ \t\n][^\n]*

It would report a syntax error if you ran it on incorrectly indented code.
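
To make the effect of the `<indent>` capture and the `$1` back-reference more concrete, here is a hand-written C sketch of roughly the same check. It is only an illustration, not code generated by PackCC: the names `indent_len`, `statement_len` and `first_bad_line` are made up for this example, and unlike the grammar (whose repetition simply stops at the first line that does not match) it treats any mismatching line as an error.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors `indent <- " "+ / "\t"+`: a run of spaces or a run of tabs.
 * Returns the length of that run, or 0 if the line is not indented. */
static size_t indent_len(const char *s) {
    size_t n = 0;
    if (s[0] != ' ' && s[0] != '\t') return 0;
    while (s[n] == s[0]) n++;
    return n;
}

/* Mirrors `statement "\n"` with `statement <- [^ \t\n][^\n]*`.
 * Returns the matched length including the newline, or 0 on failure. */
static size_t statement_len(const char *s) {
    const char *nl;
    if (s[0] == '\0' || s[0] == ' ' || s[0] == '\t' || s[0] == '\n') return 0;
    nl = strchr(s, '\n');
    return nl ? (size_t)(nl - s) + 1 : 0;
}

/* `body` is the text after the "def ...:\n" header.
 * Returns 0 if every line is "<indent of line 1><statement>\n",
 * otherwise the 1-based number of the first offending line. */
int first_bad_line(const char *body) {
    size_t ind = indent_len(body);   /* plays the role of the capture <indent> */
    const char *p = body;
    int line = 1;

    if (ind == 0) return 1;
    while (*p != '\0') {
        size_t st;
        /* plays the role of $1: the same characters must appear again */
        if (strncmp(p, body, ind) != 0) return line;
        st = statement_len(p + ind);
        if (st == 0) return line;
        p += ind + st;
        line++;
    }
    return 0;
}
```

For instance, `first_bad_line("    x = 1\n  y = 2\n")` points at line 2, because the second line does not start with the four spaces captured on the first. A more deeply indented second line is rejected as well, since a statement may not start with whitespace; that is why this simple grammar cannot handle nested blocks.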

@dolik-rce
Contributor

PS: Here is the full Python grammar (in a slightly different format from the one used by PackCC), if you need some inspiration: https://docs.python.org/3/reference/grammar.html

@schneidersoft
Author

Right. I was wondering how to integrate a tokenizer that would produce the INDENT and DEDENT tokens that PackCC would then be able to use.
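
One possible shape for such a tokenizer is a pre-pass that runs before the parser and replaces leading whitespace with explicit marker bytes, using the usual stack of indentation widths. The sketch below is an assumption about how this could be done, not something PackCC provides: the function name `mark_indentation`, the marker bytes `INDENT_MARK` and `DEDENT_MARK`, and the depth limit are all hypothetical, and tabs are simply counted as one column each.

```c
#include <stddef.h>

#define INDENT_MARK '\x01'  /* hypothetical marker bytes, assumed never */
#define DEDENT_MARK '\x02'  /* to occur in the source text itself       */
#define MAX_DEPTH   64      /* arbitrary nesting limit for this sketch  */

/* Appends one byte, always keeping room for the terminating NUL. */
static int put(char *dst, size_t cap, size_t *out, char c) {
    if (*out + 1 >= cap) return 0;
    dst[(*out)++] = c;
    return 1;
}

/* Copies `src` to `dst` with leading whitespace stripped and
 * INDENT/DEDENT markers inserted where the indentation changes.
 * Returns the number of bytes written (without the NUL), or -1 on
 * an inconsistent dedent, too deep nesting, or a too small buffer. */
long mark_indentation(const char *src, char *dst, size_t cap) {
    size_t stack[MAX_DEPTH] = {0};  /* stack[0] is column 0 */
    size_t depth = 0;               /* index of the top of the stack */
    size_t out = 0;
    const char *p = src;

    if (cap == 0) return -1;
    while (*p != '\0') {
        size_t width = 0;
        while (*p == ' ' || *p == '\t') { width++; p++; }
        if (*p == '\n' || *p == '\0') {  /* blank lines do not affect scope */
            if (*p == '\n') p++;
            continue;
        }
        if (width > stack[depth]) {      /* deeper: open one new block */
            if (depth + 1 >= MAX_DEPTH) return -1;
            stack[++depth] = width;
            if (!put(dst, cap, &out, INDENT_MARK)) return -1;
        }
        while (width < stack[depth]) {   /* shallower: close blocks */
            depth--;
            if (!put(dst, cap, &out, DEDENT_MARK)) return -1;
        }
        if (width != stack[depth]) return -1;  /* dedent to unknown level */
        while (*p != '\0' && *p != '\n')       /* copy the rest of the line */
            if (!put(dst, cap, &out, *p++)) return -1;
        if (!put(dst, cap, &out, '\n')) return -1;  /* always end with \n */
        if (*p == '\n') p++;
    }
    while (depth > 0) {                  /* close blocks still open at EOF */
        depth--;
        if (!put(dst, cap, &out, DEDENT_MARK)) return -1;
    }
    dst[out] = '\0';
    return (long)out;
}
```

With this, the input `"def f():\n    a\n    b\nc\n"` becomes `def f():`, newline, INDENT, `a`, newline, `b`, newline, DEDENT, `c`, newline. Since PackCC parsers are scannerless and read characters through the `PCC_GETCHAR(auxil)` macro, one way to hook this in might be to run the pre-pass over the whole input first and define that macro to read from the rewritten buffer; the grammar could then treat the two marker bytes like any other character.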
