-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add health check API endpoints #595
base: main
Are you sure you want to change the base?
Conversation
ede6246
to
819c956
Compare
Might need to bind |
Have not really looked at the PR but I think we should agree on same names for those probes in Trino Gateway and Trino . What do you think are suitable URLs .. I think it could literally just be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK Trino doesn't yet have endpoints dedicated for probes. The z
suffix is a convention started at Google, widely adopted in Kubernetes apps. I'm ok with that, OTOH there's little chance we'll have conflicts with other endpoints.
gateway-ha/src/main/java/io/trino/gateway/ha/resource/GatewayHealthCheckResource.java
Show resolved
Hide resolved
gateway-ha/src/main/java/io/trino/gateway/ha/resource/GatewayHealthCheckResource.java
Outdated
Show resolved
Hide resolved
gateway-ha/src/test/java/io/trino/gateway/ha/TestGatewayHaMultipleBackend.java
Outdated
Show resolved
Hide resolved
Fair enough .. i saw the healthz in others but the two you created @oneonestar are better .. so lets run with that then. |
819c956
to
d20f52e
Compare
72becaa
to
034b5f6
Compare
gateway-ha/src/test/java/io/trino/gateway/ha/TestGatewayHaMultipleBackend.java
Show resolved
Hide resolved
gateway-ha/src/main/java/io/trino/gateway/ha/resource/GatewayHealthCheckResource.java
Show resolved
Hide resolved
docs/operation.md
Outdated
## API health endpoints | ||
|
||
Trino Gateway provides two API endpoints to indicate the current status of the server. | ||
`/trino-gateway/livez` returns status code 200, indicating the server is alive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`/trino-gateway/livez` returns status code 200, indicating the server is alive. | |
* `/trino-gateway/livez` returns status code 200, indicating the server is alive. |
we should detail what "alive" means specifically for Trino Gateway ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also .. can it return something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This endpoint always sends back a 200 status code. However, it might not respond if the gateway is too busy, stuck, or taking a long time for garbage collection. This is what "alive" means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is k8s basics but I think we should let our users know anyway .. not all of them will have k8s background or even run it on k8s
docs/operation.md
Outdated
|
||
Trino Gateway provides two API endpoints to indicate the current status of the server. | ||
`/trino-gateway/livez` returns status code 200, indicating the server is alive. | ||
`/trino-gateway/readyz` returns status code 200, indicating the server has |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`/trino-gateway/readyz` returns status code 200, indicating the server has | |
* `/trino-gateway/readyz` returns status code 200, indicating the server has |
also need to detail what that means .. such as .. the backend database connection is working, the connection to the external routing server (if any) is alive, and so on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In current implementation readyz
means the initial connection to backend database and the first round of health check on Trino clusters were completed. I think this might expose too much detail to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No .. I think our users are very technical and we should not hide that information from in the docs .. its not in the UI or so .. but it needs to be in the docs
Description
Add two new health check API endpoints:
/trino-gateway/livez
and/trino-gateway/readyz
.Additional context and related issues
/trino-gateway/livez
always returns 200./trino-gateway/readyz
returns 200 after the initial fetch from database and the first round of Trino cluster health check is completed. Otherwise, 503 will be returned.Release notes
(x) Release notes are required, with the following suggested text: