Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scrape ps output to try and kill any grandchild processes when stopping a process #1502

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

michaelhammann
Copy link

@michaelhammann michaelhammann commented Apr 19, 2022

We also had problems with orphaned grandchild processes and that processes are still running even after stopping supervisor child processes. Here is a possible solution I want to share and I am happy for any feedback.

Use Case/ Context
Stopping a process via CLI or supervisor UI should stop the process and all it's child processes reliable. If the processes refuse to stop (e.g. not reacting to signals), they need to be forcefully stopped after some timeout. Additionally only direct child's of supervisor are properly stopped. Grandchild processes are not properly stopped and net be signaled as well.

Current Behavior
Supervisor handles different signals as stated here: http://supervisord.org/running.html#signals When a TERM, INT or QUIT signal is received, supervisor will trigger the shutdown behavior. This signals can also be sent to the process group, but not all processes started from a program are necessarily in the same process group, as mentioned here: #199 (comment). Supervisor only knows about the direct child processes that it started but not any processes which any of the child processes spawned.

Possible Solution:
This stopping behavior still needs some improvements.
Stopping a process and all it's child processes reliable also needs a check if the processes are actually stopped and if the processes refuse to stop (e.g. not reacting to signals), they need to be forcefully stopped after some timeout via SIGKILL signal. There are a couple of things to consider:

think about a good timeout. Systemd uses about 2-3 mins, for my use-cases 60 seconds should be sufficient.
SIGKILL can be very intrusive. For example when processes are killed which are doing some operations on a database, this can cause loss of data.
There should be an additional config parameter such as disable_force_shutdown_behavior which would disable the mechanism of sending SIGKILL to the process and its childs.
We should define our usual behavior and allow users to alter this behavior. What's sensible is a default behavior -- I would say a round of SIGTERM to children, and after some time sending a SIGKILL to those who still alive is the ideal solution.

Note, however, that sometimes process can "hang" in the kernel (or as they call it D-state, see ps man page). It's rare in a healthy system, but still possible. For those situation, I would say we should report it to the user somehow.

Related to #1101

@michaelhammann michaelhammann changed the title Feature improve shutdown behavior. Closes https://github.com/Supervisor/supervisor/issues/1101 Feature improve shutdown behavior. Apr 19, 2022
@mnaberez mnaberez changed the title Feature improve shutdown behavior. Scrape ps output to try and kill any grandchild processes when stopping a process Apr 19, 2022
this is needed when trying to kill child processes which run in a docker container
@mrakh
Copy link

mrakh commented Jul 19, 2023

The ideal way to do this would be to designate supervisord as a "subreaper" so that it can inherit any orphaned descendant processes, then kill them. This functionality is supported by both Linux (with prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0)) and FreeBSD (with procctl(P_PID, 0, PROC_REAP_ACQUIRE, NULL)). You should be able to call those functions using ctypes.

Unfortunately, I can't find any kind of subreaper functionality for Solaris. But rather than executing ps and scraping its output, a more proper approach would be to walk the procfs to get all the descendant processes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants