-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-128045: Mark unknown opcodes as deopting to themselves #128044
base: main
Are you sure you want to change the base?
Conversation
for name, deopt in sorted(deopts): | ||
out.emit(f"[{name}] = {deopt},\n") | ||
defined = set(analysis.opmap.values()) | ||
for i in range(256): | ||
if i not in defined: | ||
out.emit(f"[{i}] = {i},\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're not testing this at all in cpython, I'd suggest we at least add a couple of assertions to make sure this is correctly covering the range of opcodes:
for name, deopt in sorted(deopts): | |
out.emit(f"[{name}] = {deopt},\n") | |
defined = set(analysis.opmap.values()) | |
for i in range(256): | |
if i not in defined: | |
out.emit(f"[{i}] = {i},\n") | |
defined = set(analysis.opmap.values()) | |
for i in range(256): | |
if i not in defined: | |
deopts.append((f'{i}', f'{i}')) | |
assert len(deopts) == 256 | |
assert len(set(x[0] for x in deopts)) == 256 | |
for name, deopt in sorted(deopts): | |
out.emit(f"[{name}] = {deopt},\n") |
Summary: When CPython hands out the bytecode it will first do a de-opt on it: https://www.internalfb.com/code/fbsource/[241e0d0760d4c107a3c0ac2b4914524120b0c909]/third-party/python/3.12/Objects/codeobject.c?lines=1540&reveal=1540 This uses the de-opt table which only defines the known opcodes, meaning unknown opcodes get turned into 0's. We need CPython to at least define unknown opcodes to at least de-opt to themselves. Upstream PR: python/cpython#128044 Reviewed By: jbower-fb Differential Revision: D67350914 fbshipit-source-id: 0073efab52da1be775272e7dd9ae5a46468ccb10
I think we regard undefined instructions an error. For example, one thing I have considered is storing bytecodes in a compact format on disk: Instructions without an oparg would take 1 byte, those with an oparg would take 2. We would then combine the unmarshalling and quickening steps to create the full in-memory form in a single pass. Custom instructions would not survive this process. |
When accessing
co_code
on a code object we'll first run through to do the de-opt https://github.com/python/cpython/blob/main/Objects/codeobject.c#L1706This removes any unknown opcodes. Instead the deopt table should just recognize unknown opcodes as deopting to themselves, allowing the extensible interpreter loop to consume unknown opcodes.