-
Notifications
You must be signed in to change notification settings - Fork 437
feat(cram): cleanup subprocesses on exit #11827
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
4cabe9e
to
9573983
Compare
9573983
to
eff4020
Compare
Signed-off-by: Ali Caglayan <[email protected]>
Signed-off-by: Ali Caglayan <[email protected]>
adbbb35
to
49cc1b4
Compare
This should fix a common problem where users run subprocesses, for example using `./script.sh &` inside their cram tests. If these subprocesses do not terminate by themselves, they become orphaned and have to be manually killed. A common use case is testing a web-server. We add a more sophisticated trap for the underlying shell script of a cram test so that it can successfully terminate all of its children. Signed-off-by: Ali Caglayan <[email protected]>
6595f3f
to
5889ad5
Compare
Signed-off-by: Ali Caglayan <[email protected]>
5889ad5
to
883b219
Compare
subprocess PIDs as not to orphan them. The first "trap" will kill all | ||
processes in the process group and the second "trap" will make sure the main | ||
shell process exits gracefully. *) | ||
fprln oc {|trap "trap 'exit 0' TERM && kill -- -$$" EXIT|}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like a little more detail in the comment here for future readers, specifically to understand the reason for registering a trap handler inside another trap handler. Here's my guess at why it's written this way:
The kill -- -$$
sends TERM
to all processes in the process group. Is the reason for the "inner" trap so that when said TERM
is received by the main shell process that it exits with 0? I guess there's no easy way to send TERM
to all processes in the current process's group except for the current process? And if you were to move the inner trap outside of the EXIT
trap handler (and just do kill -- -$$
on EXIT
) then the process could exit 0
in response to an unexpected TERM
which we don't want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, this was my solution to avoiding killing the main process. If we end up killing the main process, then Dune will complain that the process is was running received a kill signal, what we want instead is to preserve the old trap
behaviour which was to exit 0
.
By switching the trapping behaviour just before we kill the process group, we can signal to all the processes in the group that they should exit when encountering TERM
, which they all promptly do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very useful! Minor comment to clarify the nested trap
but otherwise happy with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the right fix. We can use the same mechanism for cleanup that we do for regular actions.
This PR adds a reproduction case for #11820 where a subprocess is started in a cram test but becomes orphaned once the cram test terminates.
We then introduce the following fix. We use a more sophisticated
trap
at the start of the underlying shell script in order to terminate all processes in the process group. In order to avoid the main shell being terminated, we use a second trap to exit gracefully.Fix #11820
This also allows us to re-enable some previously flakey tests in the CI.
A future improvement would be to run cram tests entirely as dune actions themselves, that way we can delegate process handling to the engine and have better cross platform support. That will require a lot more work however, so in the meantime we offer this improvement.