Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: start process subreaper at top level to avoid shutdown hangs #380

Merged
merged 2 commits into from
Mar 12, 2024

Conversation

benhoyt
Copy link
Contributor

@benhoyt benhoyt commented Mar 11, 2024

For example, if an exec command is running when you Ctrl-C the Pebble daemon, it hangs.

It's hanging because the ServiceManager is stopping the reaper early in the process, before TaskRunner has had a chance to abort the exec tasks (aborting an exec task sends SIGKILL to its pid via the tomb and command context).

Then when TaskRunner.Stop is called, it calls tomb.Kill and then tomb.Wait on each task (exec) tomb, and because the reaper's not running, the tomb.Wait hangs.

So the fix is to move the reaper.Start and reaper.Stop to the top level (inside the "run" command, which is also used for "enter"), instead of putting them in servstate.NewManager and ServiceManager.Stop.

After doing this, some of the tests also had to be modified to start and stop the reaper.

Fixes #163 and fixes #284

For example, if an exec command is running when you Ctrl-C the Pebble
daemon, it hangs.

It's hanging because the ServiceManager is stopping the reaper early in
the process, before TaskRunner has had a chance to abort the exec tasks
(aborting an exec task sends SIGKILL to its pid via the tomb and
command context).

Then when TaskRunner.Stop is called, it calls tomb.Kill and then
tomb.Wait on each task (exec) tomb, and because the reaper's not
running, the tomb.Wait hangs.

So the fix is to move the reaper.Start and reaper.Stop to the top
level (inside the "run" command, which is also used for "enter"),
instead of putting them in servstate.NewManager and
ServiceManager.Stop.

After doing this, some of the tests also had to be modified to start
and stop the reaper.

Fixes canonical#163
@benhoyt benhoyt requested a review from flotter March 11, 2024 04:10
Copy link
Contributor

@flotter flotter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benhoyt I've created a issue for us to also start the reaper top level, as we have a different entrypoint, and with reaper init removed at the servstate level, we will have to do this, before we move our Pebble reference forward.

Copy link
Contributor

@flotter flotter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@benhoyt benhoyt merged commit ca137c8 into canonical:master Mar 12, 2024
15 checks passed
@benhoyt benhoyt deleted the better-reaper-start branch March 12, 2024 21:03
benhoyt added a commit that referenced this pull request Mar 21, 2024
PR #380 recently changed the way the reaper is stopped.

If `runDaemon()` returns an error the `defer` method overwrites the
`err` return value in `err = reaper.Stop()`. If `err` was already set to
something else, this will in effect overwrite the non-`nil` `err` with
either `nil` (if `reaper.Stop()` returns `nil`) or a different error (if
`reaper.Stop()` returns a different error).

Because this defer method is the first one in the function, it will get
called last, so it's the one that will have `err` set on entry if the
function returns a non-`nil` error.

---------

Co-authored-by: Ben Hoyt <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Daemon / Reaper challenges Running pebble exec makes both client and server hang on server shutdown
2 participants