Caching of TLS Certificates #856
This assumes that nothing went wrong, which is exactly what Gatus is supposed to help protect against. What if somebody accidentally updated the certificate with a certificate that's 3 years old? The fact that the certificate is cached would hide this issue.
@TwiN we have hit this issue as well. We have auto-renewing certificates via Smallstep; the certificates were updated and the endpoints are serving the updated certificates, but Gatus continues to report the expiry of the certificate from the previous renewal cycle. The only way to clear the alert is to restart the Gatus instance. To be clear, it seems that Gatus is caching certificates, which causes short-lived TLS certificates to trigger alerts that don't clear without a restart.

Digging a bit more on our side, the specific service with the "stuck" (or "cached") certificate might be affected by our long server-side timeouts, which we use to support server-sent events. It is possible that the HTTP client in Gatus stays connected for long periods of time instead of making a new connection for every request. A potential fix would be to make sure that Gatus's HTTP client closes the connection after each request, rather than relying on the standard library to close idle connections, which may never happen when health checks run frequently.
I don't want to impose something on you that you haven't planned for, but I wanted to ask what you mean when you say that.
What could have gone wrong? How is Gatus protecting me when it uses a cached version of a certificate that is no longer applied on the ingress it is health-checking?
Ugh, it's been a long few weeks and I didn't read the issue properly. I had no idea certificates were being cached, and I completely agree with you. I think they shouldn't be cached, or at the very least, they shouldn't be cached for more than 24h. I'm sorry about the misunderstanding.
It's OK, I like your tool. I'll see if I can find the code that does this and try to help you.
FYI @renevo: I wasn't able to pinpoint the code that causes the caching, but I was able to test whether Gatus reuses connections. I ran an Nginx service with logging enabled so that I could see connection IDs. The format I used was as follows:
I saw that Gatus created a new connection ID for each check. See the logs below:
So the problem isn't there.
BTW, I was running Gatus from the main branch when I did these checks.
Describe the bug
If you have a certificate expiry check configured and the certificate was renewed a few days ago, Gatus keeps the previous certificate cached and continues to run checks against it, even though the endpoint it is checking is already serving a newer certificate.
What do you see?
Gatus continues to use the previously cached certificate for its checks, which results in failed checks and false alerts.
What do you expect to see?
Gatus should retrieve the certificate on every request and always evaluate its checks against that certificate, instead of using a cached version.
List the steps that must be taken to reproduce this issue
For example, we are using cert-manager, so we set the ingress labels like this:
Even though the certificate was renewed 5 days ago, Gatus still has the previous certificate cached, keeps running checks against it, and never fetches the latest certificate from the endpoint.
After a restart, the cache is cleared, so Gatus picks up the latest certificate and works as expected.
Version
5.11.0
Additional information
No response