Warn when a process that has imported tiledb calls fork() #1876
We want to get a DeprecationWarning like Python 3.12 emits when fork() is called with multiple threads detected.
This pull request has been linked to Shortcut Story #25113: Warning on import of multiprocess and tiledb in the same process on linux.
This is equivalent to the warnings added in Python 3.12. We don't try to count threads because TileDB is multithreaded.
tiledb/tests/test_multiprocessing.py
    "To safely use TileDB with multiprocessing or "
    "concurrent.futures, choose 'spawn' as the start "
    "method for child processes.",
    UserWarning,
Here's my first take on what a user should see printed in their terminal or notebook if they unsafely combine tiledb and fork().
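For readers wondering what following the message's advice looks like, here is a minimal sketch using only the standard library. The helper names (spawn_context, make_spawn_executor, map_in_children) are made up for illustration and are not part of tiledb.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def spawn_context():
    """Return a multiprocessing context that starts children with 'spawn'."""
    # get_context() leaves the process-wide default start method untouched.
    return multiprocessing.get_context("spawn")


def make_spawn_executor(max_workers=2):
    """Build a concurrent.futures executor whose workers are spawned."""
    return ProcessPoolExecutor(max_workers=max_workers, mp_context=spawn_context())


def map_in_children(func, values):
    """Apply func to each value in spawned worker processes."""
    # func must be picklable, e.g. a builtin or a module-level function.
    with make_spawn_executor() as executor:
        return list(executor.map(func, values))
```

With 'spawn', calls such as map_in_children(abs, [-1, 2, -3]) should sit under an `if __name__ == "__main__":` guard in scripts, because each child re-imports the main module.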
@ihnorton can you do a preliminary review of this PR?
Looks great, thanks for the nice detailed message. Two minor suggestions inline.
Also, it might be worth running the test in a separate process for full control over the import order (could test import tiledb both before and after some other user of os.fork?). In another import-related test here, we spawn a subprocess to validate the postcondition in a separate interpreter environment.
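A rough sketch of that separate-interpreter idea, assuming nothing about the existing test in this repository: the helper name run_in_fresh_interpreter is hypothetical, and the tiledb script shown in the comment is only an illustration of how it might be used.

```python
import subprocess
import sys
import textwrap


def run_in_fresh_interpreter(source):
    """Run source in a new Python process and capture its output."""
    # "-W always" makes every warning print to stderr, so the import order
    # inside `source` is the only variable under test.
    return subprocess.run(
        [sys.executable, "-W", "always", "-c", textwrap.dedent(source)],
        capture_output=True,
        text=True,
        check=False,
    )


# Hypothetical usage in a test (requires tiledb and a platform with fork):
#
#     result = run_in_fresh_interpreter("""
#         import os
#         import tiledb
#         if os.fork() == 0:
#             os._exit(0)
#     """)
#     assert "UserWarning" in result.stderr
```

Each call starts from a clean sys.modules, so one test can import tiledb first and another can import it last without the two interfering.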
I overlooked a 4th approach: patching
Drop use of monkeypatch fixture for os.fork isolation. Add some docs and comments to the two new fixtures and rename the test module.
@ihnorton I've finally got the wrap/patch and fixture to isolate it into a readable and maintainable state.
This LGTM! Thanks for building this, and especially for the detailed comments.
TileDB uses multiple threads and it's easy to run into deadlocks on Linux, where fork is the default start method for multiprocessing until Python 3.14. We want to see a warning like Python 3.12 emits when fork() is called with multiple threads detected (see the implementation in posixmodule.c). I see three different ways to do this; the third is to emit a warning when os.fork() is called and tiledb has been imported.
The pros for number one: only needing to reference the change coming in 3.12. The cons: this project currently supports 4 versions of Python before 3.12 and it'll be years before new users are entirely on 3.12. Also, to a first approximation, nobody reads documentation. Perhaps better said as: we often try to read as little documentation as we can get away with.
The pros for number two: super easy to implement and all tiledb users see it. The cons: users would see the warning even if they had no intention of using fork(). It's noisy and would have to be actively silenced.
The pros for number three: all tiledb users who call os.fork(), intentionally or not, get the warning and no one else. It's not noisy. The cons: it requires monkeypatching os.fork() or, suboptimally, using os.register_at_fork().
The only problem with os.register_at_fork() is that exceptions and warnings can't be raised from registered callables. We're limited to printing to stderr/stdout. This is a worse experience for developers, and the project's tests currently check that tiledb doesn't print such output.
Monkeypatching methods of the standard library isn't a perfect solution. We'd all prefer that modules we import incidentally didn't change things in the standard library, right? And there's no guarantee that a module imported after tiledb won't patch os.fork() again.
On the other hand, users import tiledb deliberately, to use it. And it's mostly incompatible with fork(). The monkeypatch I'm suggesting for os.fork() doesn't change the behavior of the stdlib function; it only wraps it in a warning. It's as innocuous as such a patch can be.
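To make the wrapping idea concrete, here is a minimal sketch of one way it could look. It is not the code merged in this PR: warn_on_call and patch_os_fork are hypothetical names, and the message reuses only the wording quoted in the review snippet.

```python
import functools
import os
import warnings

# Wording taken from the review snippet; the PR's full message may differ.
FORK_WARNING = (
    "To safely use TileDB with multiprocessing or "
    "concurrent.futures, choose 'spawn' as the start "
    "method for child processes."
)


def warn_on_call(func, message, category=UserWarning):
    """Return a wrapper that warns, then delegates to func unchanged."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # stacklevel=2 attributes the warning to the code that called fork().
        warnings.warn(message, category, stacklevel=2)
        return func(*args, **kwargs)

    return wrapper


def patch_os_fork():
    """Wrap os.fork once (hypothetical helper, e.g. called at import time)."""
    # os.fork does not exist on Windows; __wrapped__ marks an earlier patch.
    if hasattr(os, "fork") and not hasattr(os.fork, "__wrapped__"):
        os.fork = warn_on_call(os.fork, FORK_WARNING)
```

Because the wrapper only adds a warnings.warn() call before delegating, the return value and semantics of fork() stay the same. Code that looks up os.fork at call time goes through the wrapper; code that saved a reference to the original function before the patch does not.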