Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry connections to PostgreSQL before starting martin #1545

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions martin/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -113,6 +113,7 @@ tilejson.workspace = true
tokio = { workspace = true, features = ["io-std"] }
tokio-postgres-rustls = { workspace = true, optional = true }
url.workspace = true
tokio-retry = "0.3.0"

[build-dependencies]
static-files = { workspace = true, optional = true }
Expand Down
17 changes: 15 additions & 2 deletions martin/src/pg/config.rs
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,9 @@ use crate::pg::utils::on_slow;
use crate::pg::PgResult;
use crate::source::TileInfoSources;
use crate::utils::{IdResolver, OptBoolObj, OptOneMany};
use crate::MartinResult;
use crate::{MartinError, MartinResult};
use tokio_retry::strategy::{jitter, FixedInterval};
use tokio_retry::Retry;

pub trait PgInfo {
fn format_id(&self) -> String;
Expand Down Expand Up @@ -114,7 +116,18 @@ impl PgConfig {
}

pub async fn resolve(&mut self, id_resolver: IdResolver) -> MartinResult<TileInfoSources> {
let pg = PgBuilder::new(self, id_resolver).await?;
// Retry strategy: Fixed 5 seconds interval backoff with jitter (random variation)
let retry_strategy = FixedInterval::from_millis(5000)
.map(jitter) // Add random jitter to avoid "thundering herd" problem
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be a bit unexpected for users:

This strategy might not wait at all.
=> all 3 request may be either a total of 15s or 0s.

https://docs.rs/tokio-retry/latest/src/tokio_retry/strategy/jitter.rs.html#3-5

The amount to which jitter affects the FixedInterval should likely not be 0%-100%

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thx, I agree - we should probably do something like +/- 20% - does jitter support that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just have our own function which does what we need..
The one-liner is not that complicated 😁

Also we might want to lower the retry time and increase the retry amount. As far as I remember it (might be misremembering) Postgres takes a hot few seconds (somewhere between 10 and 20) to start up

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply, suddenly got a lot on my plate. I'll continue working on this around Tuesday or Wednesday.

There are multiple parameters that we can configure:

  1. Retry strategy. Tokio_retry has 3 built in strategy, do we want it to be configurable or do we just choose one? If we do choose one, exponential or fibonacci back off might be a better choice than fixed interval.
  2. the interval (and factor if we use exponential/fibonacci). Choosing the default value for this one is a bit tricky. In my limited experience, CMIIW, cold starting a postgresql server does not take very long, <1s. It's another story if you also start a server or a container. I think a good option right now is to have retry amount of 5 and the sum of wait duration across these 5 retry is 60 seconds.
  3. Jitter. Yeah I didn't realize jitter just multiply the duration by 0-100%, my bad (first time working with rust, im guilty of skipping some part that I don't understand lol). We can do jitter by 80%-120%. Do we want jitter to be configurable (enable/disable and jitter range)?

.take(3); // Retry up to 3 times

// Create PgBuilder using retry_strategy
let pg = Retry::spawn(retry_strategy, || async {
PgBuilder::new(self, id_resolver.clone()).await
})
.await
.map_err(MartinError::PostgresError)?;

let inst_tables = on_slow(
pg.instantiate_tables(),
// warn only if default bounds timeout has already passed
Expand Down
Loading