
Not able to determine if/when a graceful shutdown_timeout has elapsed #606

Closed
EddieWhi opened this issue Oct 2, 2024 · 1 comment
Labels
wontfix This will not be worked on

Comments

@EddieWhi

EddieWhi commented Oct 2, 2024

Recently I've been investigating some possible shutdown issues with one of our services. I started by suspecting that k8s was killing the service before it had time to shut down cleanly, but after setting up a simple test application I can see that it could be the shutdown_timeout expiring.

Example application here: https://gist.github.com/EddieWhi/9de3486bced26c818f2be97c20a8fa9f

After running the application, curling / and then curling /shutdown a few seconds later, I get the following logs:

12:38:01 [INFO] Starting server...
12:38:01 [INFO] starting 10 workers
12:38:01 [INFO] Actix runtime found; starting in Actix runtime
12:38:01 [INFO] starting service: "actix-web-service-127.0.0.1:8080", workers: 10, listening on: 127.0.0.1:8080
12:38:05 [INFO] Sleeping 0
12:38:06 [INFO] Sleeping 1
12:38:07 [INFO] Sleeping 2
12:38:08 [INFO] accept thread stopped
12:38:08 [INFO] graceful worker shutdown; finishing 1 connections
12:38:08 [INFO] graceful worker shutdown; finishing 1 connections
12:38:08 [INFO] shutting down idle worker
12:38:08 [INFO] shutting down idle worker
12:38:08 [INFO] shutting down idle worker
12:38:08 [INFO] shutting down idle worker
12:38:08 [INFO] shutting down idle worker
12:38:08 [INFO] shutting down idle worker
12:38:08 [INFO] shutting down idle worker
12:38:08 [INFO] shutting down idle worker
12:38:08 [INFO] Sleeping 3
12:38:09 [INFO] Sleeping 4
12:38:10 [INFO] Sleeping 5

This is similar to what we're seeing in production: the logs just stop, with nothing to indicate that the service was forcefully killed.

It would be very useful to know when a service is being forcefully terminated. It would certainly help me right now.

Naively, I was thinking of making a PR along the lines of this: master...EddieWhi:actix-net:add-log-on-graceful-shutdown-timeout, but first I wonder whether I'm missing something obvious.
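A minimal, std-only Rust sketch of the general idea (not the linked branch's code): a coordinator waits for worker completion messages against a deadline and logs a warning if the shutdown timeout elapses first, so a forced stop leaves a trace in the logs. `wait_for_workers` and `ShutdownOutcome` are hypothetical names for illustration; they are not part of actix-server's API, and the real server tracks its workers differently.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical outcome of a graceful shutdown (not an actix-server type).
#[derive(Debug, PartialEq)]
enum ShutdownOutcome {
    /// Every worker reported completion before the deadline.
    Graceful,
    /// The deadline passed while `remaining` workers were still busy.
    TimedOut { remaining: usize },
    /// All senders were dropped before `remaining` workers reported.
    Abandoned { remaining: usize },
}

/// Waits for `workers` completion messages on `done`, giving up once
/// `shutdown_timeout` has elapsed. Logs a warning when the timeout is hit
/// so a forced stop is visible instead of the logs just stopping.
fn wait_for_workers(done: &Receiver<()>, workers: usize, shutdown_timeout: Duration) -> ShutdownOutcome {
    let deadline = Instant::now() + shutdown_timeout;
    let mut remaining = workers;
    while remaining > 0 {
        // Time left until the deadline; zero once it has passed.
        let left = deadline.saturating_duration_since(Instant::now());
        match done.recv_timeout(left) {
            Ok(()) => remaining -= 1,
            Err(RecvTimeoutError::Timeout) => {
                eprintln!(
                    "[WARN] graceful shutdown timeout ({:?}) elapsed; forcing stop with {} worker(s) still busy",
                    shutdown_timeout, remaining
                );
                return ShutdownOutcome::TimedOut { remaining };
            }
            Err(RecvTimeoutError::Disconnected) => {
                // No sender is left, so no further reports can arrive.
                return ShutdownOutcome::Abandoned { remaining };
            }
        }
    }
    ShutdownOutcome::Graceful
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // One worker finishes quickly; the other simulates a connection that outlives the timeout.
    for work in vec![Duration::from_millis(50), Duration::from_millis(500)] {
        let tx = tx.clone();
        thread::spawn(move || {
            thread::sleep(work);
            // The receiver may already be gone; ignore the send error.
            let _ = tx.send(());
        });
    }
    // Drop the original sender so only the workers hold one.
    drop(tx);
    let outcome = wait_for_workers(&rx, 2, Duration::from_millis(200));
    println!("{:?}", outcome);
}
```

With these timings the sketch should log the warning and report `TimedOut { remaining: 1 }`. Returning an outcome as well as logging lets a caller also decide, for example, what exit code to use after a forced stop.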

Any help much appreciated.

@robjtede
Member

robjtede commented Dec 29, 2024

It's likely that this is caused by keep-alive connections, which aren't handled well at the moment. Fixing that is high on my priority list. If your terminationGracePeriodSeconds (default 30 seconds) is equal to or less than your HTTP server shutdown timeout (also 30 seconds by default), then there's probably a race starting from the moment SIGTERM is sent, and k8s probably wins it, so the logs just stop because the process is killed with SIGKILL.

If you increase terminationGracePeriodSeconds a bit, the app will be able to terminate somewhat more gracefully, though keep-alive connections will still be killed until the aforementioned bug is fixed.
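The timing argument above can be expressed as a simple check: the pod's grace period should exceed the server's graceful shutdown timeout (in actix-web, the value passed to `HttpServer::shutdown_timeout`, in seconds) by some headroom, so the server's own timeout fires before SIGKILL arrives. A minimal sketch; the helper name `shutdown_fits_grace_period` and the 5-second margin are assumptions for illustration, not values recommended in the thread.

```rust
use std::time::Duration;

/// Hypothetical helper: true if the Kubernetes termination grace period
/// exceeds the server's graceful shutdown timeout by at least `margin`,
/// so the server can finish (or time out and log) before SIGKILL.
fn shutdown_fits_grace_period(
    termination_grace_period: Duration,
    shutdown_timeout: Duration,
    margin: Duration,
) -> bool {
    // checked_add avoids an overflow panic on absurdly large inputs.
    match shutdown_timeout.checked_add(margin) {
        Some(needed) => termination_grace_period >= needed,
        None => false,
    }
}

fn main() {
    // Assumed headroom for illustration only.
    let margin = Duration::from_secs(5);
    // Defaults mentioned in the thread: 30 s grace period, 30 s shutdown timeout.
    let defaults = shutdown_fits_grace_period(Duration::from_secs(30), Duration::from_secs(30), margin);
    // Raising terminationGracePeriodSeconds to 40 s leaves 10 s of headroom.
    let raised = shutdown_fits_grace_period(Duration::from_secs(40), Duration::from_secs(30), margin);
    println!("defaults fit: {}, raised fits: {}", defaults, raised);
}
```

This prints `defaults fit: false, raised fits: true`: with both values at their defaults there is no headroom, which matches the race described above.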

Given this, I'm going to close the issue and point you to actix/actix-web#3169 to track the progress of this fix.

@robjtede closed this as not planned on Dec 29, 2024
@robjtede added the wontfix (This will not be worked on) label on Dec 29, 2024