Backoff retry delay in status check is increasing to hours rendering services offline #2897

eugene-sadovsky · 2023-11-14T11:12:09Z

Spring Boot Admin Server information

Version: 3.1.4
Spring Boot version: 3.1.0

Client information

Used discovery mechanism: Consul

Description

The exponential back-off delay in de.codecentric.boot.admin.server.services.IntervalCheck grows to hours. I noticed that after SBA has been running for 2+ weeks, previously registered services go offline for hours and then become available again. Restarting SBA fixes it right away. This is always accompanied by the error message:

Unexpected error in status-check: reactor.core.Exceptions$OverflowException: Could not emit tick NN due to lack of requests (interval doesn't support small downstream requests that replenish slower than the ticks)

After some investigation, it looks like this happens when the checkAllInstances method times out (takes longer to complete than the check interval), which triggers a retry. The back-off interval keeps increasing with each failure over the lifetime of SBA and eventually grows to hours; it takes about 12+ retries to get there. The situation improved after lowering spring.boot.admin.timeout.health to 3 seconds. By default, the health endpoint timeout is equal to spring.boot.admin.status-interval (10s).
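To see why roughly a dozen failures are enough to reach hours, here is a minimal plain-Java sketch of uncapped exponential backoff arithmetic. It is neither the reporter's snippet nor SBA's code; it assumes a 1 s initial delay that doubles on every retry with no jitter, and the class and method names are hypothetical.

```java
import java.time.Duration;

public class BackoffGrowth {

    // Hypothetical helper: delay before the retry with 0-based index `retry`,
    // computed as minBackoff * 2^retry (no cap, no jitter).
    static Duration delayForRetry(Duration minBackoff, int retry) {
        if (retry < 0 || retry > 62) {
            // 1L << 63 would overflow to a negative number
            throw new IllegalArgumentException("retry must be in [0, 62]");
        }
        return minBackoff.multipliedBy(1L << retry);
    }

    public static void main(String[] args) {
        Duration minBackoff = Duration.ofSeconds(1); // ASSUMPTION: not taken from SBA
        for (int retry = 0; retry <= 13; retry++) {
            long seconds = delayForRetry(minBackoff, retry).getSeconds();
            System.out.printf("retry %2d -> %5d s (~%.1f min)%n", retry, seconds, seconds / 60.0);
        }
    }
}
```

Under these assumptions the 13th consecutive failure (index 12) waits 4096 s, about 68 minutes, and every further failure doubles that again.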

Here's the code snippet that reproduces this behavior. It slows down with each retry.

erikpetzold · 2023-11-17T10:24:20Z

Hi @eugene-sadovsky,

That the retry time increases is intended behaviour, but you are right that the waiting time might get too high.
We introduced a new property for maxBackoff, so you can configure it yourself. The default maxBackoff for the status check is now 60 seconds.
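A cap changes the picture: once the doubled delay would exceed maxBackoff, the cap is used instead. The sketch below is plain Java, not SBA's API; it assumes a 1 s initial delay, takes the 60 s default mentioned above as the cap, ignores jitter, and uses hypothetical names.

```java
import java.time.Duration;

public class CappedBackoff {

    // Hypothetical helper: min(minBackoff * 2^retry, maxBackoff), no jitter.
    static Duration delayForRetry(Duration minBackoff, Duration maxBackoff, int retry) {
        if (retry >= 62) {
            return maxBackoff; // the shift below would overflow; the cap applies anyway
        }
        try {
            Duration next = minBackoff.multipliedBy(1L << retry);
            return next.compareTo(maxBackoff) > 0 ? maxBackoff : next;
        } catch (ArithmeticException overflow) {
            return maxBackoff; // uncapped value no longer fits in a Duration
        }
    }

    public static void main(String[] args) {
        Duration min = Duration.ofSeconds(1);  // ASSUMPTION: initial delay
        Duration max = Duration.ofSeconds(60); // 60 s cap from the comment above
        for (int retry = 0; retry <= 9; retry++) {
            System.out.printf("retry %d -> %d s%n", retry, delayForRetry(min, max, retry).getSeconds());
        }
    }
}
```

With these numbers the delay goes 1, 2, 4, 8, 16, 32 s and then stays at 60 s for every further retry.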

eugene-sadovsky · 2023-11-17T10:40:13Z

Thank you for the quick response 🙇🏼

eugene-sadovsky · 2023-11-17T11:07:53Z

I think the main issue is that the back-off time is never reset after a successful retry. It just saturates at maxBackoff and stays there for the lifetime of the process. This still solves my issue, thank you 👍🏼

* #2897: WIP Fix exponential backoff
* reduce number of places where defaults can be defined
* use configured backoff in retry
* #2897: javaformat
* add Test
* add docs
* reduce number of places with defaults

Co-authored-by: ulrichschulte

erikpetzold · 2023-11-17T11:19:39Z

If this is really true, that would be a bug in Project Reactor, I think.

eugene-sadovsky · 2023-11-17T11:33:51Z

Yeah, this is the behavior I observed. You can reproduce it by running my gist, which closely resembles the code in IntervalCheck. It randomly simulates a timeout, followed by a few successful checks, then another timeout. With each retry the delay becomes longer and never goes back to zero.
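Whether the delay saturates or recovers depends on which failure count drives the exponent. In Project Reactor, `RetryBackoffSpec` uses `RetrySignal.totalRetries()` by default and `RetrySignal.totalRetriesInARow()` when `transientErrors(true)` is set; this thread does not say which one SBA's fix relies on. The plain-Java simulation below is not the reporter's gist: it uses hypothetical names, an assumed 1 s initial delay and 60 s cap, no jitter, and a fixed outcome sequence instead of random timeouts, to contrast both counters.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class BackoffResetDemo {

    // TOTAL: every failure since start (never reset)
    // IN_A_ROW: consecutive failures, reset by a successful check
    enum Counter { TOTAL, IN_A_ROW }

    static final Duration MIN = Duration.ofSeconds(1);  // ASSUMPTION
    static final Duration MAX = Duration.ofSeconds(60); // ASSUMPTION

    static Duration backoff(long index) {
        if (index >= 62) {
            return MAX; // avoid shift overflow; the cap applies anyway
        }
        Duration next = MIN.multipliedBy(1L << index);
        return next.compareTo(MAX) > 0 ? MAX : next;
    }

    // Returns the delay in seconds applied after each failed check.
    static List<Long> delays(boolean[] outcomes, Counter counter) {
        List<Long> result = new ArrayList<>();
        long total = 0;  // failures since start
        long inARow = 0; // failures since the last success
        for (boolean success : outcomes) {
            if (success) {
                inARow = 0; // only the in-a-row counter is reset
                continue;
            }
            long index = (counter == Counter.TOTAL) ? total : inARow;
            result.add(backoff(index).getSeconds());
            total++;
            inARow++;
        }
        return result;
    }

    public static void main(String[] args) {
        // true = successful check, false = timed-out check
        boolean[] outcomes = {false, true, false, true, false, true, false, true,
                              false, true, false, true, false, true, false, false};
        System.out.println("TOTAL:    " + delays(outcomes, Counter.TOTAL));
        System.out.println("IN_A_ROW: " + delays(outcomes, Counter.IN_A_ROW));
    }
}
```

With this sequence the TOTAL counter climbs to the 60 s cap and stays there even though every timeout but the last is followed by a successful check, while IN_A_ROW drops back to 1 s after each success.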

eugene-sadovsky added bug waiting-for-triage labels Nov 14, 2023

ulischulte added a commit that referenced this issue Nov 17, 2023

#2897: WIP Fix exponential backoff

9cc7702

ulischulte added a commit that referenced this issue Nov 17, 2023

#2897: javaformat

75a2365

erikpetzold mentioned this issue Nov 17, 2023

Bugfix/2897 fix exponential backoff #2903

Merged

erikpetzold added enhancement and removed bug waiting-for-triage labels Nov 17, 2023
