Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make SDK safe for working with multiprocessing with the forkserver start method #3937

Open
n1ngu opened this issue Jan 15, 2025 · 3 comments
Labels
Component: SDK Core Dealing with the core of the SDK Discussion

Comments

@n1ngu
Copy link
Contributor

n1ngu commented Jan 15, 2025

How do you use Sentry?

Sentry Saas (sentry.io)

Version

2.19.2

Steps to Reproduce

On python 3.12 and a non-MacOS, non-Windows platform

import logging
import concurrent.futures
import sentry_sdk.integrations.logging

sentry_sdk.init(
    integrations=[
        sentry_sdk.integrations.logging.LoggingIntegration(event_level=logging.WARNING),
    ]
)
logging.captureWarnings(True)
logging.basicConfig(level=logging.DEBUG)
sentry_sdk.capture_message("hello")
pool = concurrent.futures.ProcessPoolExecutor()
pool.submit(sentry_sdk.capture_message, "world")

See the warning show up.

DeprecationWarning: This process is multi-threaded, use of fork() may lead to deadlocks in the child

Expected Result

  1. First and foremost, no fundamentally wrong use of os.fork() is introduced by Sentry SDK without the users knowing.
  2. If an application multiprocessing context uses the "forkserver" method, some safeguards are placed so that sentry won't start threads in the spawned fork server process
  3. No warning is triggered

Actual Result

Capturing warnings in sentry has recently surfaced this issue to me. I don't quite understand the subtleties of the fork method problems so please bear with me and refer to other sources for technical details

A conservative (safe but slow) workaround is to issue

multiprocessing.set_start_method("spawn")

in the application before any multiprocessing pool is started or use multiprocessing.get_context("spawn") for libraries used alongside sentry, or let such contexts be injected in your libraries in general.

AFAIU using the future "forkserver" default (on python 3.14) will not solve this problem because sentry is usually initialized very soon in the application configuration and I can't imagine a pool initialization procedure that will not initialize sentry in the fork server process in my applications. E.g. I am calling sentry_sdk.init on a sitecustomize.py module. I may understand this wrong, hopefully.

It could be argued that Sentry can't do anything about it because it is up to the application to opt-out of the default fork method, but maybe this would be worth being documented? I mean it would be reckless to configure a global multiprocessing.set_start_method from a library like sentry, and sentry does not directly use a process pool so it can neither take care to use a safe context internally, but sentry starts threads in the main process making any default use of process pools unsafe to applications that would be single-threaded otherwise.

Suffice to say, I have detected no actual deadlocks or other issues due to this (this has been a very concealed issue in posix platforms for years now)

related #291

@sl0thentr0py
Copy link
Member

thanks for the detailed issue @n1ngu

It could be argued that Sentry can't do anything about it because it is up to the application to opt-out of the default fork method, but maybe this would be worth being documented? I mean it would be reckless to configure a global multiprocessing.set_start_method from a library like sentry, and sentry does not directly use a process pool so it can neither take care to use a safe context internally, but sentry starts threads in the main process making any default use of process pools unsafe to applications that would be single-threaded otherwise.

I agree that Sentry cannot make assumptions about user applications and what they want to do. I will add this as documentation in our troubleshooting section so that users can get rid of the warning.

If an application multiprocessing context uses the "forkserver" method, some safeguards are placed so that sentry won't start threads in the spawned fork server process

Regarding this, I need to figure out best practice for what to do, so I will leave this issue open (with a rename).

@sl0thentr0py sl0thentr0py changed the title DeprecationWarning: This process is multi-threaded, use of fork() may lead to deadlocks in the child Make SDK safe for working with multiprocessing with the forkserver start method Jan 15, 2025
@sl0thentr0py sl0thentr0py added the Component: SDK Core Dealing with the core of the SDK label Jan 15, 2025
@sl0thentr0py
Copy link
Member

@n1ngu I'm actually not seeing the deprecation warning on my system

(.venv) [neel@vheissu vanilla]$ python --version
Python 3.12.2
(.venv) [neel@vheissu vanilla]$ uname -a
Linux vheissu 6.10.8-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 04 Sep 2024 15:16:37 +0000 x86_64 GNU/Linux

are you using the exact same snippet you pasted above?

@n1ngu
Copy link
Contributor Author

n1ngu commented Jan 15, 2025

Hi @sl0thentr0py thanks for taking a look into it.

I believe the warning may be concealed by default, try with PYTHONDEVMODE=1 or PYTHONWARNINGS=once or something like that.

I tried to test it with that exact script but my dev environ is far from clean and may feature something else I am unaware of.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: SDK Core Dealing with the core of the SDK Discussion
Projects
Status: No status
Development

No branches or pull requests

2 participants