Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn when a process that has imported tiledb calls fork() #1876

Merged
merged 9 commits into from
Dec 22, 2023

Conversation

sgillies
Copy link
Contributor

@sgillies sgillies commented Dec 1, 2023

TileDB uses multiple threads and it's easy to run into deadlocks on Linux where fork is the default for multiprocessing until Python 3.14. We want to see a warning like Python 3.12 emits when fork() is called with multiple threads detected (see the implementation in posixmodule.c). I see three different ways to do this.

  1. Do nothing other than document the situation and wait for users to migrate to Python 3.12.
  2. Warn about fork() immediately on import of tiledb.
  3. Warn when os.fork() is called and tiledb has been imported.

The pros for number one: only needing to reference the change coming in 3.12. The cons: this project currently supports 4 versions of Python before 3.12 and it'll be years before new users are entirely on 3.12. Also, to a first approximation, nobody reads documentation. Perhaps better said as: we often try to read as little documentation as we can get away with.

The pros for number two: super easy to implement and all tiledb users see it. The cons: users would see the warning even if they had no intention of using fork(). It's noisy and would have to be actively silenced.

The pros for number three: all tiledb users who call os.fork(), intentionally or not, get the warning and no one else. It's not noisy. The cons: it requires monkeypatchingos.fork() or, suboptimally, using os.register_at_fork().

The only problem with os.register_at_fork() is that exceptions and warnings can't be raised from registered callables. We're limited to printing to stderr/stdout. This is a less good experience for developers and the project also currently checks in tests that tiledb doesn't print such output.

Monkeypatching methods of the standard library isn't a perfect solution. We'd all prefer that modules we import incidentally didn't change things in the standard library, right? And there's no guarantee that a module imported after tiledb won't patch os.fork() again.

On the other hand, users import tiledb deliberately, to use it. And it's mostly incompatible with fork(). The monkeypatch I'm suggesting for os.fork() doesn't change behavior of the stdlib function, it only wraps it in a warning. It's as innocuous as such a patch can be.

We want to get a DeprecationWarning like Python 3.12 emits when
fork() is called with multiple threads detected.
Copy link

This is equivalent to the warnings added in Python 3.12. We don't
try to count threads because TileDB is multithreaded.
"To safely use TileDB with multiprocessing or "
"concurrent.futures, choose 'spawn' as the start "
"method for child processes.",
UserWarning,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my first take on what a user should see printed in their terminal or notebook if they unsafely combine tiledb and fork().

@sgillies sgillies requested a review from ihnorton December 5, 2023 19:29
@sgillies
Copy link
Contributor Author

sgillies commented Dec 5, 2023

@ihnorton can you do a preliminary review of this PR?

Copy link
Member

@ihnorton ihnorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for the nice detailed message. Two minor suggestions inline.

Also, it might be worth running the test in a separate process for full control over the import order (could test import tiledb both before and after some other user of os.fork?). In another import-related test here, we spawn a subprocess to validate the postcondition in a separate interpreter environment.

@sgillies
Copy link
Contributor Author

sgillies commented Dec 6, 2023

I overlooked a 4th approach: patching os.fork() not on import of tiledb, but on initialization of a Ctx. It keeps the solution closer to the problem and is much easier to test in isolation. A fixture like 8cfe60d#diff-f31ed6f6e26b774c62db3c48ad14bc06543b7d6fc81d6e3890ea4f7722261a2fR104-R111 is enough to ensure that all tests run with the os module starting in its original pristine state.

@sgillies sgillies marked this pull request as ready for review December 12, 2023 18:48
@sgillies
Copy link
Contributor Author

@ihnorton I've finally got the wrap/patch and fixture to isolate it into a readable and maintainable state.

Copy link
Member

@ihnorton ihnorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM! Thanks for building this, and especially for the detailed comments.

@ihnorton ihnorton merged commit 0abb5df into dev Dec 22, 2023
15 checks passed
@ihnorton ihnorton deleted the sg/sc-25113/warn-at-fork branch December 22, 2023 03:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants