Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cert-checker: only log database errors #7077

Merged
merged 2 commits into from
Sep 18, 2023
Merged

cert-checker: only log database errors #7077

merged 2 commits into from
Sep 18, 2023

Conversation

aarongable
Copy link
Contributor

Fixes #7040

@aarongable aarongable requested a review from a team as a code owner September 12, 2023 20:17
@aarongable aarongable requested a review from jsha September 12, 2023 20:17
Copy link
Contributor

@jsha jsha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good. The original issue said the expected result was "Don't increment the bad-certs, instead put the errors into their own counters." @jcjones how do you feel about only logging the database errors, and not maintaining a separate counter of database errors?

@jcjones
Copy link
Contributor

jcjones commented Sep 14, 2023

Code looks good. The original issue said the expected result was "Don't increment the bad-certs, instead put the errors into their own counters." @jcjones how do you feel about only logging the database errors, and not maintaining a separate counter of database errors?

It looks like when there's a database error, the checker moves forward and ultimately returns a smaller count of good-certs, which I'm realizing this morning I need to have monitoring for (IN-9580). When that fires, we'd notice the database error logs.

Or we could write an mtail alert for the database error logs, but that's not my favorite choice; mtail isn't currently running against these logs.

Anyway, I'd still prefer that the database error count be a metric that we send to the prometheus pushgateway API, but I'm fine with that being a backlogged issue.

@aarongable
Copy link
Contributor Author

Unfortunately cert-checker is not a long-lived process, so I don't think that exporting metrics from it directly works very well, since it can't be reliably scraped. I now have it output the number of database errors as a separate field in the "report" it logs at the end of its run, but I think that's the best we can do at the moment.

@aarongable aarongable requested review from jsha and a team September 16, 2023 00:13
@aarongable aarongable merged commit cad7266 into main Sep 18, 2023
12 checks passed
@aarongable aarongable deleted the crt-checker-dberr branch September 18, 2023 22:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cert-checker logs database errors as bad certs
4 participants