Retry connections to PostgreSQL before starting martin #1545
base: main
Conversation
Thanks, looks good! We should discuss the configuration for this too as part of this PR. Jitter is a good idea, thx. At some point we should also test/evaluate what happens when PG restarts (drops connections, switches to another primary, etc.) while Martin is running - in theory the PG pool should simply reconnect, but I have never tested that. Probably good for another PR.
let pg = PgBuilder::new(self, id_resolver).await?;
// Retry strategy: fixed 5-second interval backoff with jitter (random variation)
let retry_strategy = FixedInterval::from_millis(5000)
    .map(jitter) // Add random jitter to avoid the "thundering herd" problem
I think this might be a bit unexpected for users: this strategy might not wait at all.
=> all 3 retries may wait anywhere from a total of ~15s down to ~0s.
https://docs.rs/tokio-retry/latest/src/tokio_retry/strategy/jitter.rs.html#3-5
The amount by which jitter affects the FixedInterval should likely not be 0%-100%.
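For illustration only (a standalone sketch, not part of this PR): summing three jittered 5s intervals shows how the total wait can land anywhere between roughly 0s and 15s.

use std::time::Duration;
use tokio_retry::strategy::{jitter, FixedInterval};

fn main() {
    // Each 5s interval is scaled by a random factor in [0, 1), so three
    // retries can wait anywhere from ~0s up to ~15s in total.
    let total: Duration = FixedInterval::from_millis(5000)
        .map(jitter)
        .take(3)
        .sum();
    println!("total wait across 3 retries: {total:?}");
}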
Good catch, thx, I agree - we should probably do something like +/- 20% - does jitter support that?
We can just have our own function which does what we need. The one-liner is not that complicated 😁
Also, we might want to lower the retry interval and increase the retry count. As far as I remember (I might be misremembering), Postgres takes a good few seconds (somewhere between 10 and 20) to start up.
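A minimal sketch of such a one-liner, assuming the rand crate is available (the helper name jitter_pm20 is hypothetical):

use std::time::Duration;

// Hypothetical +/- 20% jitter: scale the delay by a random factor in [0.8, 1.2)
// instead of tokio-retry's 0%-100% scaling.
fn jitter_pm20(duration: Duration) -> Duration {
    duration.mul_f64(0.8 + rand::random::<f64>() * 0.4)
}

It would slot into the existing strategy as FixedInterval::from_millis(5000).map(jitter_pm20).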
Sorry for the late reply, suddenly got a lot on my plate. I'll continue working on this around Tuesday or Wednesday.
There are multiple parameters that we can configure:
- Retry strategy. tokio_retry has 3 built-in strategies; do we want it to be configurable, or do we just choose one? If we choose one, exponential or Fibonacci backoff might be a better choice than a fixed interval.
- The interval (and factor, if we use exponential/Fibonacci). Choosing the default value for this one is a bit tricky. In my limited experience, CMIIW, cold starting a PostgreSQL server does not take very long, <1s. It's another story if you also start a server or a container. I think a good option right now is 5 retries, with the wait durations across those 5 retries summing to about 60 seconds.
- Jitter. Yeah, I didn't realize jitter just multiplies the duration by 0-100%, my bad (first time working with Rust, I'm guilty of skipping some parts that I don't understand lol). We can do jitter at 80%-120%. Do we want jitter to be configurable (enable/disable and jitter range)? A rough sketch of how these options could fit together follows this list.
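As a sketch only (not Martin's actual connection code; connect_to_postgres and jitter_pm20 are placeholders), exponential backoff with 5 retries summing to roughly 60 seconds plus +/- 20% jitter could look like this with tokio-retry:

use std::time::Duration;
use tokio_retry::strategy::ExponentialBackoff;
use tokio_retry::Retry;

// Placeholder for whatever actually opens the PostgreSQL pool in Martin
// (e.g. PgBuilder::new); only here so the sketch is self-contained.
async fn connect_to_postgres() -> Result<(), std::io::Error> {
    Ok(())
}

// Hypothetical +/- 20% jitter helper (see the one-liner above).
fn jitter_pm20(duration: Duration) -> Duration {
    duration.mul_f64(0.8 + rand::random::<f64>() * 0.4)
}

async fn connect_with_retry() -> Result<(), std::io::Error> {
    // Base delays of 2s, 4s, 8s, 16s, 32s: five retries totalling ~62s before jitter.
    let retry_strategy = ExponentialBackoff::from_millis(2)
        .factor(1000)
        .map(jitter_pm20)
        .take(5);

    Retry::spawn(retry_strategy, connect_to_postgres).await
}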
Trying to solve (#1539)
There are still a lot of improvements that can be done, such as making the retry params configurable and adding a test for an unreachable DB instance. But I would love to get feedback first before I proceed on those.