
Fix data race when using shared variables (free threading) #5494

Merged: 9 commits, Jan 16, 2025

@colesbury (Contributor)

Description

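In the free-threaded build, there's a race between wrapper re-use and wrapper deallocation: one thread can look up an existing wrapper among pybind11's registered instances and increment its reference count while another thread is dropping the last reference to that same wrapper and deallocating it. This can happen with a static variable accessed by multiple threads.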

Fixing this requires using some private CPython APIs: _Py_TryIncref and _PyObject_SetMaybeWeakref. The implementations of these functions are included until they're made available as public ("unstable") APIs.
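To make the race concrete, here is a minimal C++ sketch of the try-incref idea. It uses a hypothetical ObjModel with a single std::atomic<long> counter, not CPython's real object layout (the free-threaded build actually splits the count into a thread-local part and a shared part), and it is not the code from this PR or CPython's implementation of _Py_TryIncref. The PR also adds an enable_try_inc_ref step when an instance is registered, which this simplified model does not need.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for an object header: a single atomic reference
// count. CPython's free-threaded refcounting is more involved than this.
struct ObjModel {
    std::atomic<long> refcnt{1};
};

// Unconditional increment, roughly what handle(...).inc_ref() amounts to.
// Unsafe for re-use if another thread may already have dropped the count to 0.
inline void incref(ObjModel *op) {
    op->refcnt.fetch_add(1, std::memory_order_relaxed);
}

// Drops one reference; returns true if this was the last one, in which case
// the caller would go on to deallocate the object.
inline bool decref(ObjModel *op) {
    return op->refcnt.fetch_sub(1, std::memory_order_acq_rel) == 1;
}

// Increments only while the count is still non-zero, so an object that is
// already on its way to deallocation is never handed out again.
inline bool try_incref(ObjModel *op) {
    long cur = op->refcnt.load(std::memory_order_relaxed);
    while (cur != 0) {
        // On failure, compare_exchange_weak reloads the current value into cur.
        if (op->refcnt.compare_exchange_weak(cur, cur + 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed)) {
            return true;
        }
    }
    return false;
}
```

The key difference from a plain increment is that try_incref can fail: once the count has reached zero, a lookup must not revive the object, whereas incref would blindly bump a count that a concurrent deallocation has already acted on.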

Fixes #5489

Suggested changelog entry:

Fix data race in free-threaded CPython when accessing a shared static variable.

@colesbury (Contributor Author)

cc @rostan-t @rwgk

@rwgk (Collaborator) left a comment

Thanks for the fix! My comments are all very minor.

@@ -312,7 +312,31 @@ inline void traverse_offset_bases(void *valueptr,
}
}

#ifdef Py_GIL_DISABLED
static inline void enable_try_inc_ref(PyObject *op) {
Collaborator:

ChatGPT:

The static in this context is redundant, because inline alone already lets the function be defined in a header. Let's break it down:

static in C++ for functions: When applied to a function, static gives the function internal linkage, meaning it is only visible within the translation unit where it is defined. A static function defined in a header therefore produces a separate private copy in every translation unit that includes it.

inline in C++: Functions defined as inline have the following implications:

They can be defined in a header file and included in multiple translation units without violating the One Definition Rule (ODR).
They do not get internal linkage from inline: a namespace-scope inline function has external linkage by default, and all translation units refer to the same function.
Since inline already makes the header definition legal, adding static is not harmful, but it serves no purpose here beyond giving each translation unit its own copy.

Suggested Best Practice
To avoid confusion and redundant code, it is generally better to omit static when inline is used, unless there's a specific stylistic or historical reason to keep it.

static inline void enable_try_inc_ref(PyObject *op) {
// TODO: Replace with PyUnstable_Object_EnableTryIncRef when available.
// See https://github.com/python/cpython/issues/128844
if (_Py_IsImmortal(op)) {
Collaborator:

Would it make sense to rename op to obj? (Or are there any special requirements for op that you want to reflect with the variable name?)

Contributor Author:

No special reason. op is just commonly used as a variable name for Python objects in CPython so that's what I'm used to now.

inline bool register_instance_impl(void *ptr, instance *self) {
#ifdef Py_GIL_DISABLED
enable_try_inc_ref((PyObject *) self);
Collaborator:

Could you please use reinterpret_cast<PyObject *>(self)? (For readability: we still have a lot of raw C casts, but C++-style casts are preferred in new or changed code.)
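A small sketch of why the two casts are interchangeable here, assuming (as in pybind11) that the instance struct begins with the Python object header. PyObjectModel and InstanceModel are hypothetical stand-ins, not pybind11 or CPython types: because the header is the first member of a standard-layout struct, the struct pointer and the header pointer are interconvertible with reinterpret_cast, and the C-style cast yields the same address.

```cpp
#include <cassert>

// Hypothetical stand-ins for PyObject and pybind11's instance struct. What
// matters is that the object header is the first member of a standard-layout
// struct, so a pointer to the struct and a pointer to its header can be
// converted to each other with reinterpret_cast.
struct PyObjectModel {
    long ob_refcnt;
};

struct InstanceModel {
    PyObjectModel ob_base;  // must stay the first member
    void *value_ptr;
};

// C-style cast: compiles, but does not say which kind of conversion is meant.
inline PyObjectModel *as_object_c_style(InstanceModel *self) {
    return (PyObjectModel *) self;
}

// C++-style cast: same result, but the intent is explicit and easy to grep for.
inline PyObjectModel *as_object(InstanceModel *self) {
    return reinterpret_cast<PyObjectModel *>(self);
}
```

The behavior is identical; the benefit is purely that reinterpret_cast names the conversion, which is what the review comment asks for.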

@@ -249,7 +292,10 @@ PYBIND11_NOINLINE handle find_registered_python_instance(void *src,
for (auto it_i = it_instances.first; it_i != it_instances.second; ++it_i) {
for (auto *instance_type : detail::all_type_info(Py_TYPE(it_i->second))) {
if (instance_type && same_type(*instance_type->cpptype, *tinfo->cpptype)) {
return handle((PyObject *) it_i->second).inc_ref();
PyObject *wrapper = (PyObject *) it_i->second;
Collaborator:

I think

auto *wrapper = reinterpret_cast<PyObject *>(it_i->second);

will make clang-tidy happy (but I haven't tried it out myself).
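For context on what the new wrapper line feeds into, here is a simplified model of a registry lookup that only re-uses a wrapper when a try-incref succeeds. Wrapper, Registry, and their methods are hypothetical; the mutex only keeps the map itself safe in this sketch. The assumption that a failed try-incref makes the lookup behave as if no wrapper were registered (so that a fresh one gets created) is this sketch's reading of the change, not a statement of pybind11's exact code.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for a Python wrapper object: just a refcount.
struct Wrapper {
    std::atomic<long> refcnt{1};
};

// Increment only if the count has not already reached zero.
inline bool try_incref(Wrapper *w) {
    long cur = w->refcnt.load(std::memory_order_relaxed);
    while (cur != 0) {
        if (w->refcnt.compare_exchange_weak(cur, cur + 1, std::memory_order_acquire,
                                            std::memory_order_relaxed)) {
            return true;
        }
    }
    return false;
}

// Hypothetical registry from C++ pointers to existing wrappers, loosely
// modeled on the lookup in find_registered_python_instance.
class Registry {
public:
    void add(void *cpp_ptr, Wrapper *w) {
        std::lock_guard<std::mutex> lock(mu_);
        map_[cpp_ptr] = w;
    }

    // Returns an existing wrapper that now holds an extra reference, or
    // nullptr if none can be safely re-used (caller creates a fresh wrapper).
    Wrapper *find(void *cpp_ptr) {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = map_.find(cpp_ptr);
        if (it == map_.end()) {
            return nullptr;
        }
        auto *wrapper = it->second;
        if (try_incref(wrapper)) {
            return wrapper;  // safe re-use
        }
        return nullptr;  // count already hit zero: wrapper is being deallocated
    }

private:
    std::mutex mu_;
    std::unordered_map<void *, Wrapper *> map_;
};
```

Note that reference drops happen outside the registry lock, which is exactly why an unconditional increment in find would be unsafe; the auto *wrapper line uses the form suggested in the comment.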

    def access_shared_instance():
        b.wait()
        for _ in range(1000):
            x = m.EmptyStruct.SharedInstance
Collaborator:

Is the explicit del here needed? Could this be simplified to

        for _ in range(1000):
            m.EmptyStruct.SharedInstance

?

I asked ChatGPT and it seems to think the simpler code is equivalent. If that's not correct, could you please add a comment to explain (super terse would be fine)?

Contributor Author:

Yeah, it's effectively the same

@colesbury (Contributor Author), Jan 16, 2025:

Hmm... ruff doesn't like the "useless expression":


tests/test_thread.py:60:13: B018 Found useless expression. Either assign it to a variable or remove it.
   |
58 |         b.wait()
59 |         for _ in range(1000):
60 |             m.EmptyStruct.SharedInstance
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ B018
61 | 
62 |     threads = [
   |

Found 1 error.

Collaborator:

An alternative is:

            m.EmptyStruct.SharedInstance  # noqa: B018

That would be my preference, but only very slightly so. With your comment it's also immediately obvious that there is nothing special about the del.

Please let me know if you prefer to keep this as is. I'll merge this PR when I see that the CI is green.

Contributor Author:

I'll update it to use the # noqa

@rwgk (Collaborator) commented Jan 16, 2025

(The pypy failure is a flake; we see this very often.)

@rwgk rwgk merged commit 15d9dae into pybind:master Jan 16, 2025
76 checks passed
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Jan 16, 2025
@colesbury colesbury deleted the issue5489-shared-instance branch January 16, 2025 23:41
Labels
needs changelog Possibly needs a changelog entry
Development

Successfully merging this pull request may close these issues.

[BUG]: Data race when using static variables with free-threaded Python
2 participants