Operator doesn't respond to Evicted runner #232

Closed
FrederikNJS opened this issue Apr 20, 2021 · 8 comments · Fixed by #262
Labels
bug (Something isn't working), enhancement (New feature or request)

Comments

@FrederikNJS

FrederikNJS commented Apr 20, 2021

We frequently see runners getting evicted for using too much ephemeral storage, probably because the builds don't clean up after themselves.

However, when a runner is evicted for using too much memory or ephemeral storage, nothing happens. I can see a use for keeping the Evicted pod around for debugging purposes, but the operator should notice that a runner was evicted and spin up a new one, to ensure that the minimum number of healthy runners is maintained. If the Evicted pod is deleted, the operator responds immediately by spinning up a new runner.

Would it be possible for the operator to treat an Evicted runner as if it didn't exist?
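
To make the request concrete, here is a minimal sketch of the counting logic being asked for. It is not the operator's actual code: `RunnerPod`, `is_evicted`, and `runners_to_create` are hypothetical names, and the pod is modelled as a plain dataclass. The one Kubernetes detail it relies on is that an evicted pod is reported with `status.phase` of `Failed` and `status.reason` of `Evicted`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunnerPod:
    """Simplified stand-in for a Kubernetes pod (hypothetical, not the operator's type)."""
    name: str
    phase: str        # mirrors status.phase, e.g. "Running", "Pending", "Failed"
    reason: str = ""  # mirrors status.reason; "Evicted" after a kubelet eviction


def is_evicted(pod: RunnerPod) -> bool:
    # An evicted pod shows up as phase "Failed" with reason "Evicted".
    return pod.phase == "Failed" and pod.reason == "Evicted"


def runners_to_create(pods: List[RunnerPod], min_runners: int) -> int:
    """Return how many new runners are needed when evicted pods are not counted."""
    live = [p for p in pods if not is_evicted(p)]
    return max(0, min_runners - len(live))


pods = [
    RunnerPod("runner-a", "Running"),
    RunnerPod("runner-b", "Failed", "Evicted"),
    RunnerPod("runner-c", "Running"),
]
print(runners_to_create(pods, min_runners=3))  # prints 1
```

With this approach the Evicted pod can stay around for debugging while a replacement is still scheduled.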

davidkarlsen added the enhancement label Apr 20, 2021
@yfried

yfried commented Apr 25, 2021

We see the same problem.

@davidkarlsen
Collaborator

Thanks for reporting. Can you check whether the runners are still registered at GitHub after the pod has been evicted? (This needs to be checked soon after the eviction, as GitHub will eventually remove dead runners.) Some logic may be needed to get them unregistered, as well as to handle the pod status.

@yfried

yfried commented Apr 25, 2021

> Thanks for reporting. Can you check whether the runners are still registered at GitHub after the pod has been evicted? (This needs to be checked soon after the eviction, as GitHub will eventually remove dead runners.) Some logic may be needed to get them unregistered, as well as to handle the pod status.

@davidkarlsen

  1. Yes, Evicted pods are still registered at GitHub (Org Settings -> Actions -> Self-hosted runners -> Runner groups).
  2. They are left there for a while. We had one stuck in Offline for the whole weekend.
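
The same check can be scripted instead of done through the settings page. The sketch below assumes the GitHub REST endpoint for listing an organization's self-hosted runners (`GET /orgs/{org}/actions/runners`), whose response contains a `runners` array with `name` and `status` fields. The helper names, the sample payload, and the organization name are illustrative only; on GitHub Enterprise the API base URL differs from `https://api.github.com`, and the endpoint is paginated, which this sketch ignores.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_org_runners(api_base: str, org: str, token: str) -> Dict[str, Any]:
    """Call GET {api_base}/orgs/{org}/actions/runners and return the decoded JSON."""
    req = urllib.request.Request(
        f"{api_base}/orgs/{org}/actions/runners",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def offline_runner_names(payload: Dict[str, Any]) -> List[str]:
    """Return the names of runners that GitHub reports as offline."""
    return [
        r["name"]
        for r in payload.get("runners", [])
        if r.get("status") == "offline"
    ]


# Made-up payload in the shape of the API response, for illustration only.
sample = {
    "total_count": 2,
    "runners": [
        {"id": 1, "name": "runner-a", "status": "online", "busy": False},
        {"id": 2, "name": "runner-b", "status": "offline", "busy": False},
    ],
}
print(offline_runner_names(sample))  # prints ['runner-b']
```

Against a live organization, one would pass the result of `fetch_org_runners("https://api.github.com", "my-org", token)` to `offline_runner_names` and compare the names with the Evicted pods.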

@duyhenryer

I have the same problem.

@davidkarlsen
Collaborator

  1. What version are you running?
  2. What do the operator logs say? (It should attempt to deregister them with GitHub and then remove the finalizer, allowing them to be deleted.)
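
The deregister-then-remove-finalizer sequence described above can be sketched roughly as follows. This is an illustration under assumptions, not the operator's implementation: the pod is a plain dict shaped like a Kubernetes object, `RUNNER_FINALIZER` is a hypothetical finalizer name, and `deregister` is a stand-in callback for the call to GitHub. It relies on the general Kubernetes rule that an object with a `deletionTimestamp` stays in Terminating until its `finalizers` list is empty.

```python
from typing import Any, Callable, Dict

# Hypothetical finalizer name, used only for this sketch.
RUNNER_FINALIZER = "example.com/runner-deregistration"


def handle_deletion(pod: Dict[str, Any], deregister: Callable[[str], None]) -> bool:
    """Return True once nothing blocks Kubernetes from deleting the pod."""
    meta = pod["metadata"]
    if "deletionTimestamp" not in meta:
        # The pod has not been marked for deletion, so the finalizer path
        # never runs. A pod that was only evicted ends up here.
        return False
    finalizers = meta.get("finalizers", [])
    if RUNNER_FINALIZER in finalizers:
        deregister(meta["name"])             # step 1: unregister the runner at GitHub
        finalizers.remove(RUNNER_FINALIZER)  # step 2: drop our finalizer
        meta["finalizers"] = finalizers
    return not meta.get("finalizers")


calls = []
deleting = {
    "metadata": {
        "name": "runner-b",
        "deletionTimestamp": "2021-04-25T00:00:00Z",
        "finalizers": [RUNNER_FINALIZER],
    }
}
print(handle_deletion(deleting, calls.append), calls)  # prints True ['runner-b']
```

Note that in this sketch a pod that was evicted but never deleted is left untouched, because eviction by the kubelet leaves the pod object in place without a deletion timestamp. Whether that is what happens in the operator would need to be confirmed from its logs.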

@duyhenryer

@davidkarlsen OK,
I was stuck on the same problem as in issue 212. I installed the latest version in another namespace, and I can see that Evicted pods are deleted there.
But the old namespace is still stuck in Terminating status.
Thanks!

@Yagyansh
Contributor

Yagyansh commented Jul 28, 2022

@davidkarlsen
I am still facing this issue in v0.10.0. New runner pods don't come up when an older pod gets evicted; the runner pool then goes out of sync and all the jobs get queued.

I have to manually delete the evicted pod for the new pods to come up and make everything operational.

FYI, I've installed the operator via the helm chart mentioned in the Readme and I'm running with GHE.
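
Until the operator handles this itself, the manual cleanup can be semi-automated. As a rough sketch, the helper below (a hypothetical name, `evicted_pod_names`) takes the text produced by `kubectl get pods -n <namespace> -o json` and returns the names of pods in phase `Failed` with reason `Evicted`. The sample JSON is made up for illustration.

```python
import json
from typing import List


def evicted_pod_names(kubectl_json: str) -> List[str]:
    """Pick evicted pods out of the output of `kubectl get pods -o json`."""
    doc = json.loads(kubectl_json)
    names = []
    for item in doc.get("items", []):
        status = item.get("status", {})
        if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
            names.append(item["metadata"]["name"])
    return names


# Made-up sample in the shape of kubectl's JSON list output.
sample_output = json.dumps({
    "kind": "List",
    "items": [
        {"metadata": {"name": "runner-a"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "runner-b"},
         "status": {"phase": "Failed", "reason": "Evicted"}},
    ],
})
print(evicted_pod_names(sample_output))  # prints ['runner-b']
```

Each returned name can then be removed with `kubectl delete pod <name> -n <namespace>`, which is the manual step described above.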

@Yagyansh
Contributor

Yagyansh commented Aug 1, 2022

@davidkarlsen I see that the runner is getting unregistered when a pod is evicted, but the pod still isn't deleted and the operator doesn't spin up a new runner.

Also, I don't see finalizers attached to the evicted pods. Does that mean the finalizer was also removed successfully?

5 participants