Timing out when a cancelled process takes too long to die #757

Open
1 task done
jwodder opened this issue Jul 10, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@jwodder
Contributor

jwodder commented Jul 10, 2024

Things to check first

  • I have searched the existing issues and didn't find my feature already requested there

Feature description

Currently, when a running Process is cancelled, anyio simply does:

self.kill()
with CancelScope(shield=True):
    await self.wait()

However, in pathological cases, the killed process may take arbitrarily long to actually exit, resulting in the program hanging indefinitely. I therefore request the ability to specify a timeout for the "wait" above; if the process doesn't exit in time, a dedicated error is raised so that the program can continue cleaning up and the programmer can know what went wrong.
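
As a rough illustration of the requested behaviour, here is a minimal sketch that uses the standard library's asyncio rather than anyio's internals. The names `ProcessKillTimeout`, `kill_and_wait`, and `grace_period` are hypothetical and not part of any existing API. The sketch also does not reproduce the shielded cancel scope from the snippet above; in anyio, a shielded `move_on_after` scope might play a similar role.

```python
import asyncio


class ProcessKillTimeout(Exception):
    """Hypothetical error: a killed process did not exit within the grace period."""


async def kill_and_wait(process, grace_period: float):
    """Kill `process`, then wait at most `grace_period` seconds for it to exit.

    `process` is assumed to be any object with a `kill()` method and an
    awaitable `wait()`, such as asyncio.subprocess.Process.
    """
    process.kill()
    try:
        # Bound the wait so that a process stuck in uninterruptible sleep
        # cannot hang the caller forever.
        return await asyncio.wait_for(process.wait(), timeout=grace_period)
    except asyncio.TimeoutError:
        # Surface a dedicated error so the caller can keep cleaning up
        # and report what went wrong.
        raise ProcessKillTimeout(
            f"process did not exit within {grace_period} s of being killed"
        ) from None
```

A caller could catch `ProcessKillTimeout`, log it, and exit with a non-zero status instead of stalling. Note that the child is then left unreaped.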

Use case

We recently ran into a situation where some child processes got stuck in "uninterruptible sleep" (as reported by ps). As a result, when the timeouts we had wrapped them in expired, our program ended up hanging waiting for the subprocesses to acknowledge their deaths. We would prefer it if our program were to exit with an informative error message when this happened rather than just stalling forever.

@agronholm
Owner

If we don't reap the child process, it becomes a zombie. Trio also waits indefinitely: https://github.com/python-trio/trio/blob/main/src/trio/_subprocess.py#L754-L764

@yarikoptic

Depending on the setup and system, zombies are "nothing new" and are usually reaped by the root (init) process. That often means a container running such services needs to have such a root process. There is even a dedicated option for that in docker run:

❯ docker run --help | grep -A1 -e --init
      --init                           Run an init inside the container that forwards signals and reaps
                                       processes

NB: more on zombies and containers at https://stackoverflow.com/questions/49162358/docker-init-zombies-why-does-it-matter and the links therein.

But without such functionality, anyio cannot be used in scenarios where processes cannot be killed gracefully for some reason (e.g., a filesystem might stall, as in our case): the process would keep running indefinitely, until some other actor or operator detects the stall and reacts to it. So in our particular case we do not really want to abandon or breed zombies; we want to react and exit with an error when one is created.

Altogether, I do feel that waiting should perhaps remain the default behavior, but I would appreciate it if there were at least an option to kill more aggressively and eventually abandon the underlying process whenever the setup requires avoiding an overall stall of the application. Maybe the code block cited above could be made "pluggable", so that applications have the flexibility to alter the handling of process interruption to their liking, with a few default handlers provided for common situations?
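
To make the "pluggable" idea a bit more concrete, here is one possible shape for such handlers, sketched with the standard library's asyncio. Everything here is an assumption for illustration only: `CancelHandler`, `ProcessAbandoned`, `kill_and_wait_forever`, and `escalate_then_abandon` are hypothetical names, and anyio has no such hook today.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical hook type: receives the process and decides how to shut it down.
CancelHandler = Callable[[object], Awaitable[None]]


class ProcessAbandoned(Exception):
    """Hypothetical error: the process was left behind after ignoring kill()."""


async def kill_and_wait_forever(process) -> None:
    """Default handler mirroring the current behaviour: kill, then wait with no limit."""
    process.kill()
    await process.wait()


def escalate_then_abandon(term_grace: float, kill_grace: float) -> CancelHandler:
    """Build a handler that terminates, then kills, then gives up with an error."""

    async def handler(process) -> None:
        # Step 1: ask politely and give the process `term_grace` seconds.
        process.terminate()
        try:
            await asyncio.wait_for(process.wait(), timeout=term_grace)
            return
        except asyncio.TimeoutError:
            pass
        # Step 2: kill forcefully and wait `kill_grace` seconds more.
        process.kill()
        try:
            await asyncio.wait_for(process.wait(), timeout=kill_grace)
        except asyncio.TimeoutError:
            # Step 3: stop waiting; the process stays unreaped.
            raise ProcessAbandoned("process still alive after kill()") from None

    return handler
```

An application that must never stall could pass something like `escalate_then_abandon(5.0, 5.0)`, while everyone else keeps the wait-forever default.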
