Fix data race when using shared variables (free threading) #5494
In the free threading build, there's a race between wrapper re-use and wrapper deallocation. This can happen with a static variable accessed by multiple threads. Fixing this requires using some private CPython APIs: _Py_TryIncref and _PyObject_SetMaybeWeakref. The implementations of these functions are included until they're made available as public ("unstable") APIs. Fixes pybind#5489
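For readers unfamiliar with the pattern, here is a minimal sketch of what a "try incref" does, assuming a single atomic counter. CPython's free-threaded build splits the reference count into thread-local and shared fields, so this is not how _Py_TryIncref is implemented; ToyWrapper, toy_try_incref, and toy_decref are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a wrapper object; NOT CPython's object layout.
struct ToyWrapper {
    std::atomic<std::int64_t> refcount{1};
};

// Take a new reference only if the object is still alive (refcount > 0).
// Returns false if the last reference is already gone.
inline bool toy_try_incref(ToyWrapper &w) {
    std::int64_t seen = w.refcount.load(std::memory_order_relaxed);
    while (seen > 0) {
        // On failure, compare_exchange_weak reloads `seen`, so the loop
        // re-checks the "still alive" condition against the fresh value.
        if (w.refcount.compare_exchange_weak(seen, seen + 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed)) {
            return true;
        }
    }
    return false;
}

// Drop a reference; returns true when the caller released the last one.
inline bool toy_decref(ToyWrapper &w) {
    return w.refcount.fetch_sub(1, std::memory_order_acq_rel) == 1;
}

inline void toy_demo() {
    ToyWrapper w;                // starts with one owner
    assert(toy_try_incref(w));   // alive: a second reference is granted
    assert(!toy_decref(w));      // 2 -> 1, not the last reference
    assert(toy_decref(w));       // 1 -> 0, deallocation would begin here
    assert(!toy_try_incref(w));  // too late: a lookup must not reuse it
}
```

An unconditional increment would "succeed" even in the last step, which is the wrapper re-use versus deallocation race described above. A lookup that gets false back can treat the entry as absent instead.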
Thanks for the fix! My comments are all very minor.
include/pybind11/detail/class.h
Outdated
@@ -312,7 +312,31 @@ inline void traverse_offset_bases(void *valueptr,
    }
}

#ifdef Py_GIL_DISABLED
static inline void enable_try_inc_ref(PyObject *op) {
ChatGPT:
The static in this context is unnecessary, though not strictly redundant, because inline and static control different things. Let's break it down:
static in C++ for functions: When applied to a namespace-scope function, static gives the function internal linkage, meaning each translation unit that includes the header gets its own private copy.
inline in C++: Functions defined as inline have the following implications:
They can be defined in a header file and included in multiple translation units without violating the One Definition Rule (ODR).
They keep external linkage unless they are also declared static or placed in an unnamed namespace; inline by itself does not change linkage, and the identical definitions are merged into one.
When both are used, static inline compiles and works, but the function may be duplicated in every translation unit that uses it.
Suggested Best Practice
In a header, it is generally better to use plain inline and omit static, unless per-translation-unit copies are specifically wanted or there is a stylistic or historical reason to keep it.
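For reference, a small sketch of how the two spellings differ under the C++ linkage rules. The function names are hypothetical. Within a single translation unit the two behave identically, so the difference is described in comments rather than demonstrated.

```cpp
#include <cassert>

// In a real project the two helpers would live in a header and the demo
// in one .cpp file; they share a file here only to keep the sketch short.

// Plain `inline`: external linkage. Every translation unit that includes
// the header refers to the same function and the same function-local static.
inline int &program_wide_counter() {
    static int n = 0;
    return n;
}

// `static inline`: internal linkage. Each translation unit gets its own
// private copy of the function, and therefore its own `n`.
static inline int &per_tu_counter() {
    static int n = 0;
    return n;
}

void counters_demo() {
    const int before_shared = program_wide_counter();
    const int before_local = per_tu_counter();
    ++program_wide_counter();
    ++per_tu_counter();
    // Inside one translation unit both counters simply advance by one.
    assert(program_wide_counter() == before_shared + 1);
    assert(per_tu_counter() == before_local + 1);
}
```

For a helper like enable_try_inc_ref, which holds no state, either spelling gives the same behavior. The practical difference is only whether the function body can be emitted once or once per translation unit.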
include/pybind11/detail/class.h
Outdated
static inline void enable_try_inc_ref(PyObject *op) {
    // TODO: Replace with PyUnstable_Object_EnableTryIncRef when available.
    // See https://github.com/python/cpython/issues/128844
    if (_Py_IsImmortal(op)) {
Would it make sense to rename op to obj? (Or are there any special requirements for op that you want to reflect with the variable name?)
No special reason. op is just commonly used as a variable name for Python objects in CPython, so that's what I'm used to now.
include/pybind11/detail/class.h
Outdated
inline bool register_instance_impl(void *ptr, instance *self) {
#ifdef Py_GIL_DISABLED
    enable_try_inc_ref((PyObject *) self);
Could you please use reinterpret_cast<PyObject *>(self)?
(For readability; we still have a lot of raw C casts, but C++-style casts are preferred in new or changed code.)
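A short sketch of why the named cast is easier to review, assuming toy types in place of PyObject and instance. ToyObject, ToyInstance, and as_object are hypothetical and only mirror the "object header is the first member" layout.

```cpp
#include <cassert>

// Hypothetical stand-ins: ToyObject plays the role of PyObject, and
// ToyInstance the role of a wrapper struct that begins with the object header.
struct ToyObject {
    int refcnt;
};
struct ToyInstance {
    ToyObject head;
    void *value;
};

inline ToyObject *as_object(ToyInstance *self) {
    // States exactly what happens: the pointer is reinterpreted, nothing else.
    return reinterpret_cast<ToyObject *>(self);
}

inline void cast_demo() {
    ToyInstance inst{{7}, nullptr};
    assert(as_object(&inst) == &inst.head);
    assert(as_object(&inst)->refcnt == 7);

    const ToyInstance *ro = &inst;
    // A C-style cast compiles even though it silently drops const:
    ToyObject *sneaky = (ToyObject *) ro;
    // reinterpret_cast<ToyObject *>(ro) would be rejected by the compiler,
    // because reinterpret_cast cannot cast away const.
    assert(sneaky == &inst.head);
}
```

The C-style cast picks whichever combination of const_cast, static_cast, and reinterpret_cast makes the expression compile, so a reader cannot tell from the call site which conversion was intended.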
@@ -249,7 +292,10 @@ PYBIND11_NOINLINE handle find_registered_python_instance(void *src,
    for (auto it_i = it_instances.first; it_i != it_instances.second; ++it_i) {
        for (auto *instance_type : detail::all_type_info(Py_TYPE(it_i->second))) {
            if (instance_type && same_type(*instance_type->cpptype, *tinfo->cpptype)) {
                return handle((PyObject *) it_i->second).inc_ref();
                PyObject *wrapper = (PyObject *) it_i->second;
I think auto *wrapper = reinterpret_cast<PyObject *>(it_i->second); will make clang-tidy happy (but I haven't tried it out myself).
tests/test_thread.py
Outdated
    def access_shared_instance():
        b.wait()
        for _ in range(1000):
            x = m.EmptyStruct.SharedInstance
Is the explicit del here needed? Could this be simplified to the following?
for _ in range(1000):
    m.EmptyStruct.SharedInstance
I asked ChatGPT and it seems to think the simpler code is equivalent. If that's not correct, could you please add a comment to explain (super terse would be fine)?
Yeah, it's effectively the same
Hmm... ruff doesn't like the "useless expression":
tests/test_thread.py:60:13: B018 Found useless expression. Either assign it to a variable or remove it.
   |
58 |         b.wait()
59 |         for _ in range(1000):
60 |             m.EmptyStruct.SharedInstance
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ B018
61 |
62 |     threads = [
   |
Found 1 error.
An alternative is:
m.EmptyStruct.SharedInstance  # noqa: B018
That would be my preference, but only very slightly so. With your comment it's also immediately obvious that there is nothing special about the del.
Please let me know if you prefer to keep this as is. I'll merge this PR when I see that the CI is green.
I'll update it to use the # noqa
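The shape of this test (workers released together, each fetching the shared object many times) can be mirrored in plain C++ as a rough analogy. The sketch below swaps in std::weak_ptr::lock() as the "try incref" step and an atomic flag in place of the barrier. It is not pybind11's mechanism, and hammer_shared_instance and the EmptyStruct stand-in are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

struct EmptyStruct {};  // hypothetical stand-in for the bound C++ type

// Starts `n_threads` workers that each try to obtain the shared object
// `iters` times, and returns the total number of successful acquisitions.
inline long hammer_shared_instance(const std::weak_ptr<EmptyStruct> &registry_entry,
                                   int n_threads, int iters) {
    std::atomic<bool> go{false};
    std::atomic<long> hits{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t) {
        threads.emplace_back([&] {
            // Crude start gate so the workers begin at about the same time.
            while (!go.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
            for (int i = 0; i < iters; ++i) {
                // lock() yields an owning pointer only while the object is
                // still alive; otherwise it returns an empty shared_ptr.
                if (auto ref = registry_entry.lock()) {
                    hits.fetch_add(1, std::memory_order_relaxed);
                }
            }
        });
    }
    go.store(true, std::memory_order_release);
    for (auto &th : threads) {
        th.join();
    }
    return hits.load();
}

inline void hammer_demo() {
    auto owner = std::make_shared<EmptyStruct>();
    std::weak_ptr<EmptyStruct> entry = owner;
    assert(hammer_shared_instance(entry, 4, 1000) == 4000);
    assert(owner.use_count() == 1);  // every temporary reference was released
    owner.reset();
    assert(hammer_shared_instance(entry, 4, 1000) == 0);
}
```

As in the Python test, nothing is done with the fetched reference; acquiring and dropping it under contention is the whole point.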
(The PyPy failure is a flake; we see this very often.)