Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RecursionError in generated Python parser #245

Open
gvanrossum opened this issue Sep 2, 2020 · 3 comments
Open

RecursionError in generated Python parser #245

gvanrossum opened this issue Sep 2, 2020 · 3 comments

Comments

@gvanrossum
Copy link
Collaborator

The test corpus contained a case where function is called with many (over 255) arguments, each just numbers. This used to be a syntax error in 3.6 and before -- but with more recent versions of the bytecode compiler it works for 10s of 1000s of arguments.

However, the generated parser in Python cannot handle it! It raises a RecursionError for A(0, 1, ..., k) if k is 221 or higher. The traceback seems to be an endless repetition of this fragment:

  File "/Users/guido/pegen/data/python_parser.py", line 5512, in _tmp_108
    if (literal := self.expect(",")) and (c := self.args()):
  File "/Users/guido/pegen/pegen/parser.py", line 65, in memoize_wrapper
    tree = method(self, *args)
  File "/Users/guido/pegen/data/python_parser.py", line 3041, in args
    if (a := self.named_expression()) and (b := self._tmp_108(),):
  File "/Users/guido/pegen/pegen/parser.py", line 65, in memoize_wrapper
    tree = method(self, *args)

I think this is actually about this rule:

args[expr_ty]:
    [...]
    | a=named_expression b=[',' c=args { c }] {
        [...] }

The part [',' c=args {c}] corresponds to _tmp_108. This is a straightforward recursion (not left-).

I can't actually fathom that the C parser doesn't segfault on this for me; it just slows down. (In the C code, it's _tmp_109, but the structure is the same.) The stack must automatically grow (on Mac, at least). But even there it would be nice if we could replace this by a simple loop.

E.g. maybe we could make this work?

args[expr_ty]:
    | a=starred_expression b=[',' c=args { c }] {
        _Py_Call(_PyPegen_dummy_name(p),
                 (b) ? CHECK(_PyPegen_seq_insert_in_front(p, a, ((expr_ty) b)->v.Call.args))
                     : CHECK(_PyPegen_singleton_seq(p, a)),
                 (b) ? ((expr_ty) b)->v.Call.keywords : NULL,
                 EXTRA) }
    | a=kwargs { _Py_Call(_PyPegen_dummy_name(p),
                          CHECK_NULL_ALLOWED(_PyPegen_seq_extract_starred_exprs(p, a)),
                          CHECK_NULL_ALLOWED(_PyPegen_seq_delete_starred_exprs(p, a)),
                          EXTRA) }
    | ','.(named_expression !'=')+ { XXX }

I haven't figured out what should go into XXX yet, but it's probably going to be more efficient, since we don't build up the array of arguments by prepending one at a time.

@gvanrossum
Copy link
Collaborator Author

(Oh, I tried, and that doesn't actually work even for the Python parser. But why? @lysnikolaou any idea?)

@gvanrossum
Copy link
Collaborator Author

I tried this and it works... Now for filling in XXX...

args[expr_ty]:
    | ','.(starred_expression | named_expression !'=')+ [',' kwargs] { XXX }
    | a=kwargs { _Py_Call(_PyPegen_dummy_name(p),
                          CHECK_NULL_ALLOWED(_PyPegen_seq_extract_starred_exprs(p, a)),
                          CHECK_NULL_ALLOWED(_PyPegen_seq_delete_starred_exprs(p, a)),
                          EXTRA) }

@pablogsal
Copy link
Collaborator

pablogsal commented Sep 2, 2020

Here is a draft implementation:

python/cpython#22053

For this script:

import ast

code = "f(" + ",".join(['a' for _ in range(100000)]) + ")"
print("Ready!")
ast.parse(code)

I get these results (release mode):

This patch:  0.17s +- 0.04

Python3.8:  0.27s +- 0.08

Master:
[1]    78689 segmentation fault (core dumped)  ./python lel.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants