Health probe/bad state detection #57
Comments
@szalai1 would any of your prometheus work be relevant here? I'm thinking we could reuse the |
the options I see to kill the pod in such a case:
tl;dr I think we need a |
I would agree with the /ready endpoint. Metrics endpoints are sometimes "heavy" and make poor targets for probes as a result; that's my primary concern with using a metrics endpoint.
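To make the lightweight /ready idea concrete, here is a minimal sketch using only the Python standard library. It is not taken from this project: `ReadyState`, `make_handler`, and `start_ready_server` are hypothetical names, and it assumes the event loop flips the flag whenever its connection state changes.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ReadyState:
    """Thread-safe flag (hypothetical) that the event loop sets or clears."""

    def __init__(self) -> None:
        self._ready = threading.Event()

    def set_ready(self, ok: bool) -> None:
        if ok:
            self._ready.set()
        else:
            self._ready.clear()

    def is_ready(self) -> bool:
        return self._ready.is_set()


def make_handler(state: ReadyState):
    """Build a handler class bound to the given state object."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/ready":
                self.send_error(404)
                return
            # Only read a flag here; no metrics collection, so the probe stays cheap.
            ok = state.is_ready()
            body = b"ok" if ok else b"not ready"
            self.send_response(200 if ok else 503)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args) -> None:
            # Silence per-request logging so probes do not flood the logs.
            pass

    return Handler


def start_ready_server(state: ReadyState, host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Serve /ready on a daemon thread; port=0 lets the OS pick a free port."""
    server = HTTPServer((host, port), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A Kubernetes readiness or liveness probe could then point at this path. The handler does no work beyond reading a flag, which is the property that makes it a better probe target than a metrics endpoint.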
Do we need readiness probes not just for connectivity to Kafka, but also for each individual action? E.g. if connectivity to DataHub Kafka looks good but the Slack action is unhealthy because its credentials don't work, what would the expected behavior be?
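One possible shape for per-action readiness, sketched under assumptions: each component (the Kafka source, each action) registers a zero-argument check, and the overall status is ready only if every check passes. `evaluate_readiness` and the check names are hypothetical and not part of the DataHub Actions API. Whether a failing Slack action should fail the whole pod is exactly the policy question raised above; this sketch takes the strict option.

```python
from typing import Callable, Dict, Tuple

# A health check is any zero-argument callable returning True when healthy.
HealthCheck = Callable[[], bool]


def evaluate_readiness(checks: Dict[str, HealthCheck]) -> Tuple[bool, Dict[str, bool]]:
    """Run every check and return (overall_ready, per_check_results).

    A check that raises is treated as unhealthy rather than crashing the probe.
    """
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    # Strict policy: one unhealthy component makes the whole process not ready.
    return all(results.values()), results


if __name__ == "__main__":
    # Stand-in checks for illustration; real ones would test live connections.
    ready, detail = evaluate_readiness({
        "kafka": lambda: True,
        "slack_action": lambda: False,
    })
    print(ready, detail)
```

Returning the per-check results alongside the overall flag lets the probe response say which component failed, which helps when deciding whether to restart the pod or only alert.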
That's a good question. A couple of brief thoughts:
This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io |
Not stale |
Hey all - Bumping this because we just had an actions container enter an unhealthy state where it appeared to run out of file descriptors ( |
Hey all! Wondering if y'all have thought about adding a health probe endpoint (or a status checker of some sort) to this project, one that does things like assert whether healthy connections have been opened to the sources it's supposed to be listening to.
I'm asking because I'm running into a situation where the container is essentially just looping and printing a message like this:
I'm hoping to find a way to have the container exit in situations like this, whether by pointing an external health probe at an HTTP endpoint or by some internal status checking that exits or throws an exception. Any thoughts?
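As a rough illustration of the "internal status checking that exits" option, here is a small sketch of a watchdog that counts consecutive failures and terminates the process once a threshold is reached, so the container orchestrator can restart it. `ConsecutiveFailureWatchdog` and its default threshold are hypothetical; it assumes the polling loop can tell a failed connection attempt from a successful one.

```python
class ConsecutiveFailureWatchdog:
    """Exit the process after too many failures in a row (hypothetical helper)."""

    def __init__(self, max_failures: int = 5, exit_code: int = 1) -> None:
        if max_failures < 1:
            raise ValueError("max_failures must be at least 1")
        self.max_failures = max_failures
        self.exit_code = exit_code
        self.failures = 0

    def record_success(self) -> None:
        # Any successful poll resets the streak.
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            # SystemExit unwinds the loop and ends the process with a non-zero
            # code when raised in the main thread and not caught.
            raise SystemExit(self.exit_code)
```

The loop would call `record_failure()` wherever it currently only logs the repeated error message, and `record_success()` after each successful poll. Counting consecutive failures rather than total failures keeps a brief broker hiccup from killing an otherwise healthy container.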