Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfaults in Python 3.11 on UWSGI worker close #153

Open
charlieevett opened this issue Sep 26, 2023 · 8 comments
Open

Segfaults in Python 3.11 on UWSGI worker close #153

charlieevett opened this issue Sep 26, 2023 · 8 comments

Comments

@charlieevett
Copy link

We are using Python Pulsar producer clients in a web app that runs in UWSGI. We have our settings set to terminate and relaunch workers after a certain number of requests. We recently upgrade to Python 3.11 and the newest Pulsar client (3.3.0). At that point we started to see segfaults as the worker process was exiting. Here is the traceback that gets logged:

!!! uWSGI process 85 got Segmentation Fault !!! *** backtrace of 85 *** uwsgi(uwsgi_backtrace+0x2f) [0x556ab933fcaf] uwsgi(uwsgi_segfault+0x23) [0x556ab9340083] /lib/x86_64-linux-gnu/libc.so.6(+0x3bfd0) [0x7f845c457fd0] /usr/local/lib/python3.11/site-packages/pulsar_client.libs/libpulsar-2dce0d9f.so(+0x2b387a) [0x7f845770087a] /usr/local/lib/python3.11/site-packages/_pulsar.cpython-311-x86_64-linux-gnu.so(+0xaf752) [0x7f8457f99752] /usr/local/lib/python3.11/site-packages/_pulsar.cpython-311-x86_64-linux-gnu.so(+0x5399e) [0x7f8457f3d99e] /usr/local/lib/python3.11/site-packages/_pulsar.cpython-311-x86_64-linux-gnu.so(+0x545ef) [0x7f8457f3e5ef] /usr/local/lib/libpython3.11.so.1.0(+0x228f8a) [0x7f845c825f8a] /usr/local/lib/libpython3.11.so.1.0(+0x1f4467) [0x7f845c7f1467] /usr/local/lib/libpython3.11.so.1.0(+0x1f4030) [0x7f845c7f1030] /usr/local/lib/libpython3.11.so.1.0(+0x29941f) [0x7f845c89641f] /usr/local/lib/libpython3.11.so.1.0(+0x2993c5) [0x7f845c8963c5] /usr/local/lib/libpython3.11.so.1.0(+0x1ed587) [0x7f845c7ea587] /usr/local/lib/libpython3.11.so.1.0(_PyModule_ClearDict+0x115) [0x7f845c82bfe5] /usr/local/lib/libpython3.11.so.1.0(+0x278bb6) [0x7f845c875bb6] /usr/local/lib/libpython3.11.so.1.0(Py_FinalizeEx+0x12b) [0x7f845c8616ab] uwsgi(uwsgi_plugins_atexit+0x71) [0x556ab933e121] /lib/x86_64-linux-gnu/libc.so.6(+0x3e4dd) [0x7f845c45a4dd] /lib/x86_64-linux-gnu/libc.so.6(+0x3e61a) [0x7f845c45a61a] uwsgi(+0x3df6f) [0x556ab92f5f6f] uwsgi(simple_goodbye_cruel_world+0x59) [0x556ab933f619] uwsgi(+0x87658) [0x556ab933f658] uwsgi(uwsgi_close_request+0x4d9) [0x556ab92f6bf9] uwsgi(simple_loop_run+0xc0) [0x556ab933bbc0] uwsgi(simple_loop+0x7b) [0x556ab933bc8b] uwsgi(uwsgi_ignition+0x279) [0x556ab93403b9] uwsgi(uwsgi_worker_run+0x25b) [0x556ab93449db] uwsgi(uwsgi_run+0x43a) [0x556ab9344f2a] uwsgi(+0x3aede) [0x556ab92f2ede] /lib/x86_64-linux-gnu/libc.so.6(+0x271ca) [0x7f845c4431ca] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7f845c443285] uwsgi(_start+0x21) [0x556ab92f2f01] *** end of backtrace ***

The resulting core files are filling up the server disks. We could ulimit them but we'd prefer to avoid that.

@BewareMyPower
Copy link
Contributor

Is there a simple (hand-to-hand) way to reproduce? Could it be reproduced against 3.2.0?

@charlieevett
Copy link
Author

We did try 3.2.0 with the same results. We see the issue frequently, but I'm not sure how to reproduce on command, since UWSGI is initiating the process shutdown.

@charlieevett
Copy link
Author

It worked fine with Python 3.7 and the 3.0 client.

@gromsterus
Copy link

@charlieevett

  1. How do you run uwsgi? Could you share an example config?
  2. How do you initiate and close a pulsar client? Could you share a code example?

@charlieevett
Copy link
Author

charlieevett commented Sep 28, 2023

Sure, here is most of the uwsgi.ini file we use:

[uwsgi]
http = :$(PORT)
add-header = Connection: close
master = true
processes = $(PROCESSES)
die-on-term = true
memory-report = true
vacuum=True
log-x-forwarded-for=True
log-format = %(addr) - %(user) [%(ltime)] "%(method) %(uri) %(proto)" %(status) %(size) ("%(referer)" "%(uagent)" | %(msecs)
listen = 128
threads = true
max-requests = 750                  ; Restart workers after this many requests
max-worker-lifetime = 1800          ; Restart workers after this many seconds
reload-on-rss = 256                 ; Restart workers after this much resident memory
worker-reload-mercy = 60            ; How long to wait before forcefully killing workers

uwsgi is run using a very basic command in a Docker container:

$ uwsgi uwsgi.ini

The issue we're seeing is with the max-requests or max-worker-lifetime handling, when these limits are hit and uwsgi kills the worker process, it will often segfault during the exit. Relaunching process is fine, so the main issue is the file system filling up with core files.

@gromsterus
Copy link

I reproduced the bug.
It's similar to unbit/uwsgi#1969.

I see two possible fast solutions:

  1. Mv from dict state to object (code). Now segfaults gone:
    ...The work of process 32 is done. Seeya!
    worker 2 killed successfully (pid: 32)
    Respawned uWSGI worker 2 (new pid: 38)
    
  2. Mv from uwsgi to gunicorn (code, conf). Gunicorn is fast and simple server. In this case, it gives more control over the life cycle of workers with hooks.

@charlieevett
Copy link
Author

Thank you! We'll look into the workarounds, both seem plausible.

@ryanbagwell
Copy link

@gromsterus Many thanks for the investigation and suggested workarounds. Just wanted to mention the following in case it's helpful:

  • this appears to be an issue starting in Python 3.9
  • here's a piece from one of the uwsgi core dump files that might be helpful:
Thread 2 (Thread 0x7f479172e7c0 (LWP 55)):
#0  0x00007f4793a040c9 in _int_free (av=0x7f4793b40c60 <main_arena>, p=<optimized out>, have_lock=<optimized out>, have_lock@entry=0) at ./malloc/malloc.c:4698
#1  0x00007f4793a06d9f in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3385
#2  0x00007f477dd7d869 in ?? () from /lib/x86_64-linux-gnu/libgdal.so.32
#3  0x00007f477dd816ab in std::map<char, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<char>, std::allocator<std::pair<char const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::~map() () from /lib/x86_64-linux-gnu/libgdal.so.32
#4  0x00007f47939ac4dd in __run_exit_handlers (status=1, listp=0x7f4793b40820 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:116
#5  0x00007f47939ac61a in __GI_exit (status=<optimized out>) at ./stdlib/exit.c:146
#6  0x0000561a24866f6f in uwsgi_exit ()
#7  0x0000561a248b10b2 in uwsgi_segfault ()
#8  <signal handler called>
#9  0x0000561a289e5730 in ?? ()
#10 0x00007f478ed8c87d in pulsar::ProducerImpl::~ProducerImpl() () from /usr/local/lib/python3.11/site-packages/pulsar_client.libs/libpulsar-2dce0d9f.so
#11 0x00007f478f625752 in pybind11::class_<pulsar::Producer>::dealloc(pybind11::detail::value_and_holder&) () from /usr/local/lib/python3.11/site-packages/_pulsar.cpython-311-x86_64-linux-gnu.so
#12 0x00007f478f5c999e in pybind11::detail::clear_instance(_object*) () from /usr/local/lib/python3.11/site-packages/_pulsar.cpython-311-x86_64-linux-gnu.so
#13 0x00007f478f5ca5ef in pybind11_object_dealloc () from /usr/local/lib/python3.11/site-packages/_pulsar.cpython-311-x86_64-linux-gnu.so
#14 0x00007f4793d77f8a in _PyType_SUPPORTS_WEAKREFS (type=0x561a280e3bc0) at ./Include/internal/pycore_object.h:242
#15 PyObject_ClearWeakRefs (object=0x561a28244760) at Objects/weakrefobject.c:948
#16 0x00007f479400ad00 in _Py_TrueStruct () from /usr/local/lib/libpython3.11.so.1.0
#17 0x0000000000000000 in ?? ()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants