Skip to content

feat(cram): cleanup subprocesses on exit #11827

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Alizter
Copy link
Collaborator

@Alizter Alizter commented May 21, 2025

This PR adds a reproduction case for #11820 where a subprocess is started in a cram test but becomes orphaned once the cram test terminates.

We then introduce the following fix. We use a more sophisticated trap at the start of the underlying shell script in order to terminate all processes in the process group. In order to avoid the main shell being terminated, we use a second trap to exit gracefully.

Fix #11820

This also allows us to re-enable some previously flakey tests in the CI.

A future improvement would be to run cram tests entirely as dune actions themselves, that way we can delegate process handling to the engine and have better cross platform support. That will require a lot more work however, so in the meantime we offer this improvement.

@Alizter Alizter force-pushed the cram-terminate-children branch from 4cabe9e to 9573983 Compare May 21, 2025 08:52
@Alizter Alizter changed the title test: crams not terminating subprocesses feat(cram): cleanup subprocesses on exit May 21, 2025
@Alizter Alizter marked this pull request as draft May 21, 2025 09:13
@Alizter Alizter force-pushed the cram-terminate-children branch from 9573983 to eff4020 Compare May 21, 2025 09:18
@Alizter Alizter force-pushed the cram-terminate-children branch 2 times, most recently from adbbb35 to 49cc1b4 Compare May 21, 2025 10:53
@Alizter Alizter marked this pull request as ready for review May 21, 2025 10:53
This should fix a common problem where users run subprocesses, for
example using `./script.sh &` inside their cram tests. If these
subprocesses do not terminate by themselves, they become orphaned and
have to be manually killed. A common use case is testing a web-server.

We add a more sophisticated trap for the underlying shell script of a
cram test so that it can successfully terminate all of its children.

Signed-off-by: Ali Caglayan <[email protected]>
@Alizter Alizter force-pushed the cram-terminate-children branch 3 times, most recently from 6595f3f to 5889ad5 Compare May 21, 2025 12:06
@Alizter Alizter force-pushed the cram-terminate-children branch from 5889ad5 to 883b219 Compare May 21, 2025 14:30
subprocess PIDs as not to orphan them. The first "trap" will kill all
processes in the process group and the second "trap" will make sure the main
shell process exits gracefully. *)
fprln oc {|trap "trap 'exit 0' TERM && kill -- -$$" EXIT|};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like a little more detail in the comment here for future readers, specifically to understand the reason for registering a trap handler inside another trap handler. Here's my guess at why it's written this way:

The kill -- -$$ sends TERM to all processes in the process group. Is the reason for the "inner" trap so that when said TERM is received by the main shell process that it exits with 0? I guess there's no easy way to send TERM to all processes in the current process's group except for the current process? And if you were to move the inner trap outside of the EXIT trap handler (and just do kill -- -$$ on EXIT) then the process could exit 0 in response to an unexpected TERM which we don't want.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, this was my solution to avoiding killing the main process. If we end up killing the main process, then Dune will complain that the process is was running received a kill signal, what we want instead is to preserve the old trap behaviour which was to exit 0.

By switching the trapping behaviour just before we kill the process group, we can signal to all the processes in the group that they should exit when encountering TERM, which they all promptly do.

Copy link
Collaborator

@gridbugs gridbugs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very useful! Minor comment to clarify the nested trap but otherwise happy with this.

Copy link
Member

@rgrinberg rgrinberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right fix. We can use the same mechanism for cleanup that we do for regular actions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Background tasks in Cram tests are not terminated
3 participants