Remove loopback when monitor fails #25
Comments
Do you have an appropriate cleanup timer set for the app? When a monitor for an app fails, it starts the cleanup timer. Only when the cleanup timer expires are the app and its corresponding loopback removed. See Line 343 in 6f26e86
Line 200 in 6f26e86
Are you also seeing any errors related to removing the loopback in the gocast logs?
I definitely don’t have an appropriate cleanup timer set. I see the default is 15m; I had assumed it was much shorter. I’ll adjust that and give it a try!
To remove the vip addr from the host. This should reduce the time same-host apps appear broken when there is a non-working addr on the localhost ref: mayuresh82/gocast#25 (comment)
This appears to work with the lower timeout! Do you know of any practical minimum? I'm trying 15s now but need to experiment more. Would 1s or 5s cause any problems? Is there a reason why the addr shouldn't be removed along with the announcement instead of waiting for the cleanup timer? I was caught off guard that anything else on the same host would be effectively broken until the cleanup timer completed, even though the route was gone.
The original idea was that transient issues can cause apps to appear and disappear in Consul, so this is just a prevention mechanism against flapping, i.e. adding and removing loopbacks too often. I do agree, though, that this can be changed so that a timer of 0 means remove immediately. I can give it a go, or feel free to send a PR for this!
When a monitor succeeds, the loopback, NAT, and announcement are created. When a monitor fails, the announcement stops but the loopback addr remains until the monitor is removed. This causes unexpected behavior: a replacement Nomad job cannot reach resources on another host because the addr is still assigned on the local host.
For my setup I have multiple Traefik instances running with the same VIP. During a deploy the instances will be replaced but need to pull a new Docker image. Traefik is used as the LB to the Docker registry and the not-running Traefik instance cannot respond. The VIP is still assigned (but not announced) on the host and requests to the VIP fail (because Traefik isn't running).
Is this expected behavior? Should the loopback be removed when the monitor/consul-check fails?