Multiple lines f-string with non-ASCII breaks tokenize.generate_tokens in 3.12.4 #120343

Closed
leemars opened this issue Jun 11, 2024 · 6 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

leemars commented Jun 11, 2024

Bug report

Bug description:

import io
import tokenize

src = '''\
a = f"""
    Autorzy, którzy tą jednostkę mają wpisani jako AKTUALNA -- czyli"""
'''
tokens = list(tokenize.generate_tokens(io.StringIO(src).readline))

for token in tokens:
    print(token)

assert tokens[4].start == (2, 68), tokens[4].start

Python 3.12.3 (correct)

python --version
Python 3.12.3
python c.py
TokenInfo(type=1 (NAME), string='a', start=(1, 0), end=(1, 1), line='a = f"""\n')
TokenInfo(type=55 (OP), string='=', start=(1, 2), end=(1, 3), line='a = f"""\n')
TokenInfo(type=61 (FSTRING_START), string='f"""', start=(1, 4), end=(1, 8), line='a = f"""\n')
TokenInfo(type=62 (FSTRING_MIDDLE), string='\n    Autorzy, którzy tą jednostkę mają wpisani jako AKTUALNA -- czyli', start=(1, 8), end=(2, 68), line='a = f"""\n    Autorzy, którzy tą jednostkę mają wpisani jako AKTUALNA -- czyli"""\n')
TokenInfo(type=63 (FSTRING_END), string='"""', start=(2, 68), end=(2, 71), line='    Autorzy, którzy tą jednostkę mają wpisani jako AKTUALNA -- czyli"""\n')
TokenInfo(type=4 (NEWLINE), string='\n', start=(2, 71), end=(2, 72), line='    Autorzy, którzy tą jednostkę mają wpisani jako AKTUALNA -- czyli"""\n')
TokenInfo(type=0 (ENDMARKER), string='', start=(3, 0), end=(3, 0), line='')

Python 3.12.4 (broken)

python --version
Python 3.12.4
python c.py
TokenInfo(type=1 (NAME), string='a', start=(1, 0), end=(1, 1), line='a = f"""\n')
TokenInfo(type=55 (OP), string='=', start=(1, 2), end=(1, 3), line='a = f"""\n')
TokenInfo(type=61 (FSTRING_START), string='f"""', start=(1, 4), end=(1, 8), line='a = f"""\n')
TokenInfo(type=62 (FSTRING_MIDDLE), string='\n    Autorzy, którzy tą jednostkę mają wpisani jako AKTUALNA -- czyli', start=(1, 8), end=(2, 68), line='a = f"""\n    Autorzy, którzy tą jednostkę mają wpisani jako AKTUALNA -- czyli"""\n')
TokenInfo(type=63 (FSTRING_END), string='"""', start=(2, 72), end=(2, 75), line='    Autorzy, którzy tą jednostkę mają wpisani jako AKTUALNA -- czyli"""\n')
TokenInfo(type=4 (NEWLINE), string='\n', start=(2, 75), end=(2, 76), line='    Autorzy, którzy tą jednostkę mają wpisani jako AKTUALNA -- czyli"""\n')
TokenInfo(type=0 (ENDMARKER), string='', start=(3, 0), end=(3, 0), line='')
Traceback (most recent call last):
  File "/private/tmp/flake8/c.py", line 13, in <module>
    assert tokens[4].start == (2, 68), tokens[4].start
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: (2, 72)
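
A note on the numbers above: the wrong column (72) differs from the expected one (68) by exactly 4, which is the number of non-ASCII characters before the closing quotes on that line. Each of them takes two bytes in UTF-8, so the reported value is consistent with a UTF-8 byte offset being returned where a character column was expected. This is an observation from the output, not a statement about the tokenizer internals. A minimal sketch that reproduces the arithmetic without calling tokenize at all (the variable names are only for illustration):

```python
# Second physical line of the f-string from the report.
line = '    Autorzy, którzy tą jednostkę mają wpisani jako AKTUALNA -- czyli"""\n'

# Everything before the closing triple quote.
prefix = line[: line.index('"""')]

# Column measured in characters (what tokenize is expected to report).
char_col = len(prefix)

# Offset measured in UTF-8 bytes (matches the value seen on 3.12.4).
byte_offset = len(prefix.encode("utf-8"))

# Characters outside ASCII; each one here is two bytes in UTF-8.
non_ascii = sum(1 for ch in prefix if ord(ch) > 127)

print(char_col, byte_offset, non_ascii)
```

With a purely ASCII line the two values coincide, which would explain why the problem only shows up with non-ASCII text.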

More info

I found a previous similar issue here: #112943

CPython versions tested on:

3.12

Operating systems tested on:

macOS

@leemars leemars added the type-bug An unexpected behavior, bug, or error label Jun 11, 2024
@hugovk hugovk added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Jun 11, 2024
@pablogsal
Member

Bisected to:

(memray) ❯ git bisect bad
4a0af0cfdcc0b81da5d78dc219df4985c4403f9c is the first bad commit
commit 4a0af0cfdcc0b81da5d78dc219df4985c4403f9c (HEAD)
Author: Miss Islington (bot) <[email protected]>
Date:   Tue May 28 22:49:02 2024 +0200

    [3.12] gh-119118: Fix performance regression in tokenize module (GH-119615) (#119683)

    - Cache line object to avoid creating a Unicode object
      for all of the tokens in the same line.
    - Speed up byte offset to column offset conversion by using the
      smallest buffer possible to measure the difference.

    (cherry picked from commit d87b0151062e36e67f9e42e1595fba5bf23a485c)

    Co-authored-by: Lysandros Nikolaou <[email protected]>
    Co-authored-by: Pablo Galindo <[email protected]>

 Misc/NEWS.d/next/Library/2024-05-28-12-15-03.gh-issue-119118.FMKz1F.rst |  2 ++
 Parser/pegen.c                                                          | 25 +++++++++++++++++++++++++
 Parser/pegen.h                                                          |  1 +
 Python/Python-tokenize.c                                                | 44 ++++++++++++++++++++++++++++++++++++++++----
 4 files changed, 68 insertions(+), 4 deletions(-)
 create mode 100644 Misc/NEWS.d/next/Library/2024-05-28-12-15-03.gh-issue-119118.FMKz1F.rst

@lysnikolaou do you have time to take a look?
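
For readers unfamiliar with the optimization named in the commit message above ("using the smallest buffer possible to measure the difference"), the general idea can be sketched in Python. This is only an illustration of incremental byte-offset-to-column conversion, not the C code in Parser/pegen.c or Python/Python-tokenize.c; the class name `ColumnTracker` and its method `col_at` are hypothetical. The tracker remembers the last converted position and decodes only the bytes between that position and the new one.

```python
class ColumnTracker:
    """Hypothetical helper: maps UTF-8 byte offsets to character columns."""

    def __init__(self, line: str) -> None:
        self.data = line.encode("utf-8")
        self.last_byte = 0  # byte offset of the last converted position
        self.last_col = 0   # character column of that position

    def col_at(self, byte_offset: int) -> int:
        if byte_offset < self.last_byte:
            # Going backwards invalidates the cached position; start over.
            self.last_byte = 0
            self.last_col = 0
        # Decode only the slice between the cached position and the target.
        # Offsets are assumed to fall on character boundaries.
        chunk = self.data[self.last_byte:byte_offset]
        self.last_col += len(chunk.decode("utf-8"))
        self.last_byte = byte_offset
        return self.last_col


line = '    Autorzy, którzy tą jednostkę mają wpisani jako AKTUALNA -- czyli"""\n'
tracker = ColumnTracker(line)
start_col = tracker.col_at(72)  # byte offset where the closing quotes begin
end_col = tracker.col_at(75)    # byte offset just past the closing quotes
print(start_col, end_col)
```

If a code path skips the conversion step and hands back the raw byte offset, the result on this line would be 72 instead of 68, which is the kind of discrepancy shown in the report.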

@lysnikolaou
Member

Yeah, this really seemed like it would be related to that commit. Will have a look in a tiny bit!

@pablogsal
Member

It seems that it is always the f-string end token that is affected.
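
A quick way to check for this on a given interpreter, assuming the source from the report: in the correct 3.12.3 output the FSTRING_END token starts exactly where the preceding FSTRING_MIDDLE token ends, so any gap between the two flags the problem. The function name `fstring_end_gaps` is made up for this sketch, and the adjacency check is only claimed for this input, not for every f-string. On Python versions before 3.12 there are no FSTRING_* tokens and the list stays empty.

```python
import io
import tokenize

SRC = 'a = f"""\n    Autorzy, którzy tą jednostkę mają wpisani jako AKTUALNA -- czyli"""\n'


def fstring_end_gaps(source):
    """Return (expected, actual) start positions for each FSTRING_END token
    that does not begin where the preceding FSTRING_MIDDLE token ended."""
    # These token types only exist on Python 3.12 and later.
    fstring_middle = getattr(tokenize, "FSTRING_MIDDLE", None)
    fstring_end = getattr(tokenize, "FSTRING_END", None)
    gaps = []
    prev = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if (
            prev is not None
            and fstring_end is not None
            and tok.type == fstring_end
            and prev.type == fstring_middle
            and tok.start != prev.end
        ):
            gaps.append((prev.end, tok.start))
        prev = tok
    return gaps


gaps = fstring_end_gaps(SRC)
print(gaps)
```

Going by the output posted in the report, 3.12.4 should show a single gap from (2, 68) to (2, 72), and an unaffected version should show none.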

lysnikolaou added a commit to lysnikolaou/cpython that referenced this issue Jun 11, 2024
lysnikolaou added a commit that referenced this issue Jun 11, 2024
…120352)

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 11, 2024
…kens (pythonGH-120352)

(cherry picked from commit 1b62bce)

Co-authored-by: Lysandros Nikolaou <[email protected]>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 11, 2024
…kens (pythonGH-120352)

(cherry picked from commit 1b62bce)

Co-authored-by: Lysandros Nikolaou <[email protected]>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
lysnikolaou added a commit that referenced this issue Jun 11, 2024
…okens (GH-120352) (#120356)

(cherry picked from commit 1b62bce)

Co-authored-by: Lysandros Nikolaou <[email protected]>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
lysnikolaou added a commit that referenced this issue Jun 11, 2024
…okens (GH-120352) (#120355)

(cherry picked from commit 1b62bce)

Co-authored-by: Lysandros Nikolaou <[email protected]>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
@lysnikolaou
Member

Fixed.

@lysnikolaou
Member

I found one more issue after looking into #120377, so I'm reopening this. I'll push a fix shortly.

@lysnikolaou lysnikolaou reopened this Jun 12, 2024
lysnikolaou added a commit to lysnikolaou/cpython that referenced this issue Jun 12, 2024
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 12, 2024
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 12, 2024
lysnikolaou added a commit that referenced this issue Jun 12, 2024
lysnikolaou added a commit that referenced this issue Jun 12, 2024
@lysnikolaou
Member

Closing again. Hopefully everything is resolved now. Feel free to reopen in case something weird shows up again.

mrahtz pushed a commit to mrahtz/cpython that referenced this issue Jun 30, 2024
…kens (python#120352)

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024
…kens (python#120352)

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
…kens (python#120352)

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>