"Cannot connect to the Docker daemon" errors appear more frequently as more runners we deploy #3828
Open
4 tasks done
Labels
bug
Something isn't working
gha-runner-scale-set
Related to the gha-runner-scale-set mode
needs triage
Requires review from the maintainers
Checks
Controller Version
0.9.3
Deployment Method
ArgoCD
Checks
To Reproduce
3.1. If less than around 200 runners, only 1 or 2 runners fail through the week because of docker.
3.2. If more than around 200 runners, around a third or the runners fail and jobs have to be re-runned.
Describe the bug
Pipeline step complains that docker is not running:
![timestamped_error](https://private-user-images.githubusercontent.com/86961807/391079562-baebf8f1-5fcb-4620-93fb-8e31791ed127.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzNjM5MDEsIm5iZiI6MTczOTM2MzYwMSwicGF0aCI6Ii84Njk2MTgwNy8zOTEwNzk1NjItYmFlYmY4ZjEtNWZjYi00NjIwLTkzZmItOGUzMTc5MWVkMTI3LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEyVDEyMzMyMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWIzZjJjYzZiM2I0ZGM2NGMyYTQ0MDQyOWZmODNmYTc3NDVhYjJiNWFlN2ZiNDVjMTBjYzllNWM1YTBlZmM3ODcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.rDfmr40rmoQCx3uw6sKxfBfypEsw9vXkS3CiZIXPQG8)
Describe the expected behavior
It should do any of one of these two options:
Additional Context
Controller Logs
Runner name to copy&search on logs: build-s-99pst-runner-bnhsl https://gist.github.com/snavarro-factorial/ee965f37114d0ac4589169012cc098a6
Runner Pod Logs
Export format in csv: https://gist.github.com/snavarro-factorial/796fba24ba5c7f854d3b95f04b636021
The text was updated successfully, but these errors were encountered: