Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python.gram: Incorrect column numbers in SyntaxError #47

Open
edemaine opened this issue Oct 3, 2021 · 4 comments
Open

python.gram: Incorrect column numbers in SyntaxError #47

edemaine opened this issue Oct 3, 2021 · 4 comments

Comments

@edemaine
Copy link
Contributor

edemaine commented Oct 3, 2021

Python's tokenize.TokenInfo structure has 0-indexed column numbers, but SyntaxError wants 1-indexed column numbers. See this discussion.

This code from python.gram is thus incorrect:

pegen/data/python.gram

Lines 306 to 309 in d130038

self._exception = SyntaxError(
message,
(self.filename, start[0], start[1], line)
)

On the other hand, this code from parser.py is correct:

return SyntaxError(message, (filename, tok.start[0], 1 + tok.start[1], tok.line))

Could be fixed as part of #41. @MatthieuDartiailh let me know what you prefer; also happy to help add 1+ in the two necessary spots.

@MatthieuDartiailh
Copy link
Collaborator

Thanks for the pointer @edemaine . I may not have time to work on #41 in the next couple of weeks (Python 3.10 final release means a bunch of CIs updates and releases for me) so feel free to make a PR against my branch.

edemaine added a commit to edemaine/pegen that referenced this issue Oct 4, 2021
SyntaxError expects 1-indexed column numbers, but tokens are 0-indexed.

Fixes we-like-parsers#47
@edemaine
Copy link
Contributor Author

Sorry for dropping the ball here. I tried adding tests against actual Python, and found lots of matches, but also many discrepancies. Here's an example on Python 3.10.0:

>>> {**c, a: *b}
  File "<stdin>", line 1
    {**c, a: *b}
             ^^
SyntaxError: cannot use a starred expression in a dictionary value

By contrast, pegen thinks the error involves one more character:

  File "<unknown>", line 1
    {**c, a: *b}
             ^^^
SyntaxError: cannot use a starred expression in a dictionary value

There's also discrepancies between Python versions. For example, here's a Python 3.9.7 error:

>>> {a: 1, c: 2, b}
  File "<stdin>", line 1
    {a: 1, c: 2, b}
                  ^
SyntaxError: invalid syntax

And here's the corresponding Python 3.10.0 error:

>>> {a: 1, c: 2, b}
  File "<stdin>", line 1
    {a: 1, c: 2, b}
                 ^
SyntaxError: ':' expected after dictionary key

And here's pegen's error:

  File "<unknown>", line 1
    {a: 1, b}
          ^^
SyntaxError: ':' expected after dictionary key

Overall, there are 68 discrepancies from Python 3.10 (and 72 from Python 3.9).

Presumably these are errors that should be fixed, but I'm not sure whether you want to merge in a broken test suite. I could use some guidance as to what's best. Perhaps I should add a pytest command-line option (and possibly mark these tests) and only run them when the command-line option is specified?

@MatthieuDartiailh
Copy link
Collaborator

Discrepancies between 3.9 and 3.10 are somewhat expected I think. @pablogsal could confirm it but I think several improvements have been made to error reporting in 3.10.
However discrepancies between 3.10 and pegen should be fixed. Column offsets in the AST should match the python AST (at least we test for it), but we may have some issues in error reporting. I will do my best to have a look later this week.

@pablogsal
Copy link
Contributor

Discrepancies between 3.9 and 3.10 are somewhat expected I think. @pablogsal could confirm it but I think several improvements have been made to error reporting in 3.10.

Yeah, In general, custom error messages and error positions may change between minor versions as some of them cannot be backported for stability reasons. In general, we should match the latest stable version against pegen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants