Operator doesn't respond to Evicted runner #232
Comments
We see the same problem.
Thanks for reporting. Can you check whether the runners are still registered at GitHub when the pod has been evicted? (This needs to be checked quite soon after it becomes evicted, as GitHub will eventually remove dead runners.) Some logic may be needed to get them unregistered, as well as to cater for the pod status.
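For context, here is a minimal sketch of what that logic might look like in a controller-runtime based reconciler. The `RunnerReconciler` type and the `removeRunnerFromGitHub` helper are hypothetical stand-ins, not the operator's actual code:

```go
// Hypothetical sketch: unregister and replace an evicted runner pod.
// RunnerReconciler and removeRunnerFromGitHub are assumptions made for
// illustration, not this operator's real types.
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

type RunnerReconciler struct {
	client.Client
}

// removeRunnerFromGitHub stands in for a call to the GitHub API that
// deletes the self-hosted runner registration matching the pod name.
func (r *RunnerReconciler) removeRunnerFromGitHub(ctx context.Context, name string) error {
	return nil // a real implementation would call the GitHub runners API
}

func (r *RunnerReconciler) handleEvicted(ctx context.Context, pod *corev1.Pod) error {
	// Kubernetes reports an evicted pod as Failed with Reason "Evicted".
	if pod.Status.Phase != corev1.PodFailed || pod.Status.Reason != "Evicted" {
		return nil
	}
	// Unregister the dead runner promptly, rather than waiting for
	// GitHub to garbage-collect it on its own schedule.
	if err := r.removeRunnerFromGitHub(ctx, pod.Name); err != nil {
		return err
	}
	// Delete the evicted pod; the operator's normal scale-up logic then
	// replaces it to restore the minimum runner count.
	return client.IgnoreNotFound(r.Delete(ctx, pod))
}
```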
I have the same problem.
@davidkarlsen OK Bro.
@davidkarlsen I have to manually delete the evicted pod for the new pods to come up and make everything operational. FYI, I've installed the operator via the helm chart mentioned in the Readme and I'm running with GHE.
@davidkarlsen I see that the runner is getting unregistered when a pod is evicted, but the pod still isn't deleted and a new runner isn't spun up by the operator. Also, I don't see any finalizers attached to the evicted pods. Does that mean the finalizer was also removed successfully?
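One rough way to verify that is to inspect the pod's metadata directly; a sketch using controller-runtime's helper, with a made-up finalizer name:

```go
// Sketch: check whether the operator's finalizer is still attached to
// an evicted pod. The finalizer name below is a hypothetical example.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

const runnerFinalizer = "example.com/runner" // hypothetical name

func main() {
	pod := &corev1.Pod{} // in practice, fetched from the cluster
	// An empty Finalizers list means every finalizer was removed, i.e.
	// the operator's cleanup hook already ran for this pod.
	fmt.Println(controllerutil.ContainsFinalizer(pod, runnerFinalizer))
}
```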
We frequently experience runners getting evicted because they use too much ephemeral storage, probably because the builds don't clean up after themselves.
However, when a runner is evicted for using too much memory or ephemeral storage, nothing happens. I can see a use for keeping the Evicted pod around for debugging purposes, but the operator should notice that a runner was evicted and spin up a new one, to ensure that the minimum number of healthy runners is maintained. If the Evicted pod is deleted manually, the operator responds immediately by spinning up a new runner.
Would it be possible for the operator to regard an Evicted runner just as if it doesn't exist?
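To illustrate the request, a minimal sketch of how the operator's pool-sizing logic could skip evicted pods when counting live runners; the function and its surroundings are assumptions, not the operator's real code:

```go
// Sketch: ignore evicted pods when counting runners, so the operator
// scales up as if the evicted runner did not exist. Hypothetical code
// for illustration only.
package controllers

import corev1 "k8s.io/api/core/v1"

func liveRunnerCount(pods []corev1.Pod) int {
	n := 0
	for _, p := range pods {
		// Evicted pods surface as Failed with Reason "Evicted"; treat
		// them as nonexistent for scaling purposes.
		if p.Status.Phase == corev1.PodFailed && p.Status.Reason == "Evicted" {
			continue
		}
		n++
	}
	return n
}
```

Pods filtered out this way would be left in place rather than deleted, so the Evicted pod remains available for debugging while a replacement runner comes up.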