Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(sequencer): implement retry logic for connecting to the oracle sidecar #1788

Merged
merged 117 commits into from
Nov 7, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
117 commits
Select commit Hold shift + click to select a range
7c39641
add slinky proto files and generated rust
noot Jul 2, 2024
a18ad26
add slinky grpc service and genesis state
noot Jul 2, 2024
b2cc653
impl most of slinky grpc methods
noot Jul 2, 2024
ff618db
add oracle option to config and connect on startup
noot Jul 2, 2024
280111d
begin vote extension handler logic
noot Jul 2, 2024
565ef67
remove unused protos
noot Jul 2, 2024
4635f40
begin oracle component
noot Jul 3, 2024
70df249
finish extend vote logic
noot Jul 3, 2024
f3045b3
finish vote extension validation in verify_vote_extension
noot Jul 3, 2024
8276363
add native types for slinky proto types
noot Jul 3, 2024
bdcd41a
merge with main, genesis updates
noot Jul 3, 2024
27267e2
update genesis example to incude market map
noot Jul 3, 2024
536fd2c
implement prepare/process proposal vote extension logic
noot Jul 3, 2024
d3ba1fa
fix sequencer block construction
noot Jul 3, 2024
8d031ca
no more unstable is_sorted
noot Jul 3, 2024
b41a719
implement finalize_block price aggregation and storing logic
noot Jul 3, 2024
f197a17
implement oracle module query service
noot Jul 4, 2024
6b84a8a
fixes based on running w sidecar, update genesis example to have oracle
noot Jul 4, 2024
25e024b
fix sequencer unit tests
noot Jul 5, 2024
bb1a975
clippy
noot Jul 5, 2024
f8dd3d6
fmt
noot Jul 5, 2024
b96ce64
remove unused protos
noot Jul 5, 2024
5f9ea4f
attempt to fix sequencer genesis in chart
noot Jul 5, 2024
078b116
lint chart
noot Jul 5, 2024
0ea34d9
maybe fix genesis in chart
noot Jul 5, 2024
63b6b2c
lint chart
noot Jul 5, 2024
72b9fc4
cleanup
noot Jul 5, 2024
5b6a99c
charts
noot Jul 5, 2024
24e916f
fmt
noot Jul 5, 2024
074e7a3
Merge branch 'main' of github.com:astriaorg/astria into noot/slinky
noot Jul 5, 2024
0b6fd9b
add oracle config values to chart
noot Jul 6, 2024
a152b6b
Merge branch 'main' of github.com:astriaorg/astria into noot/slinky
noot Jul 6, 2024
4192612
maybe fix chart
noot Jul 6, 2024
8877caa
merge w main
noot Jul 9, 2024
9d4f8dc
Merge branch 'main' of github.com:astriaorg/astria into noot/slinky
noot Jul 9, 2024
936bf00
update protos
noot Jul 9, 2024
9b28345
Merge branch 'main' of github.com:astriaorg/astria into noot/slinky
noot Jul 12, 2024
dd2b936
address comments
noot Jul 12, 2024
1fdda18
merge w main
noot Aug 19, 2024
3092971
bump chart version
noot Aug 19, 2024
573993c
merge w main
noot Aug 20, 2024
fc6d945
Merge branch 'main' of github.com:astriaorg/astria into noot/slinky
noot Aug 20, 2024
eafff75
address comments
noot Aug 21, 2024
18af189
address comments
noot Aug 22, 2024
c87d28d
address chart comments
noot Aug 22, 2024
cd87fd4
remove clippy too many lines
noot Aug 22, 2024
08c51e4
merge w main, update genesis proto for slinky
noot Aug 22, 2024
b64fc41
update genesis to have SlinkyGenesis field
noot Aug 23, 2024
64ad809
address comments, var name cleanup
noot Aug 23, 2024
c8437f1
fix(core): restructure modules to match proto (#1405)
SuperFluffy Aug 23, 2024
71e3b78
merge w main
noot Aug 26, 2024
65fdc28
fmt protos
noot Aug 26, 2024
8c8d0eb
update buf cfg to ignore slinky lint warnings
noot Aug 26, 2024
69718e9
Merge branch 'main' of github.com:astriaorg/astria into noot/slinky
noot Aug 29, 2024
fb1dbf7
address comments
noot Aug 29, 2024
7fd61bf
merge w main
noot Sep 3, 2024
9d71977
merge w main
noot Sep 3, 2024
ee820d9
recompile protos
noot Sep 3, 2024
c18201c
fix compiled protos
noot Sep 3, 2024
7a1838e
address comments
noot Sep 3, 2024
faa117c
use indexmap everywhere
noot Sep 3, 2024
2928513
fix chart genesis
noot Sep 4, 2024
6c8f81d
Merge branch 'main' of github.com:astriaorg/astria into noot/slinky
noot Sep 4, 2024
6170dbb
fix merge issue
noot Sep 4, 2024
c2c7912
maybe fix chart genesis
noot Sep 4, 2024
3faeb74
maybe fix chart genesis
noot Sep 4, 2024
17e91aa
fix genesis log
noot Sep 4, 2024
3207ff9
maybe fix chart genesis
noot Sep 4, 2024
6b1f665
maybe fix charts
noot Sep 4, 2024
0db673a
maybe fix chart genesis
noot Sep 4, 2024
938df30
fix merge
noot Sep 5, 2024
f8db715
add unit test
noot Sep 5, 2024
dd817a2
merge w main
noot Sep 5, 2024
f6d43dc
cleanup
noot Sep 5, 2024
1ecfc14
cleanup
noot Sep 5, 2024
0506bba
clippy
noot Sep 5, 2024
1b8df49
refactor(sequencer): use streams for fetching slinky data from state …
SuperFluffy Sep 9, 2024
e6f34c3
Merge branch 'main' of github.com:astriaorg/astria into noot/slinky
noot Sep 9, 2024
dca2e52
rework oracle types (#1484)
SuperFluffy Sep 19, 2024
7347bf3
merge wip
noot Sep 19, 2024
7d58ebb
fix merge
noot Sep 19, 2024
21e775a
fix tests
noot Sep 19, 2024
d0bfdf9
Merge branch 'main' of github.com:astriaorg/astria into noot/slinky
noot Sep 24, 2024
a2c7677
bump chart
noot Sep 24, 2024
68bd014
merge w main
noot Sep 27, 2024
72f0e33
Merge branch 'main' of github.com:astriaorg/astria into noot/slinky
noot Sep 30, 2024
6306abe
address some comments
noot Sep 30, 2024
0f1d47f
address remaining comments
noot Oct 1, 2024
42a653b
merge w main
noot Oct 1, 2024
884bfe4
merge w main
noot Oct 3, 2024
168a05a
merge w main
noot Oct 28, 2024
c877303
fix and cleanup unit tests
noot Oct 28, 2024
3edcdf0
chart and clippy
noot Oct 28, 2024
2f76b78
fix smoke tests
noot Oct 28, 2024
2bce117
fmt
noot Oct 28, 2024
bf67de6
update dev values
noot Oct 29, 2024
d2f8718
protos
noot Oct 29, 2024
477d0ec
verify vote extensions in validate_proposal
noot Oct 29, 2024
43be7e8
update verify_vote_extension to not read from state
noot Oct 29, 2024
31bf59a
merge with main
noot Oct 29, 2024
3510b0b
modify consensus_params in justfile, not genesis parser
noot Oct 29, 2024
5d99f4d
bump slinky protos to v2 and rename to connect
noot Oct 29, 2024
17cbd08
fmt proto
noot Oct 29, 2024
e2238b9
move connect protos to vendored, not astria_vendored
noot Oct 30, 2024
2e5677c
Merge branch 'main' of github.com:astriaorg/astria into noot/slinky
noot Oct 30, 2024
33d8dd9
revert buf.lock
noot Oct 30, 2024
e5bc4ce
buf.yaml cleanup
noot Oct 30, 2024
fd69353
make connect optional in genesis
noot Oct 30, 2024
fcf8f49
handle vote_extensions_enable_height and updates to it to handle numb…
noot Nov 1, 2024
8baa0c3
clippy
noot Nov 1, 2024
fbdb015
rename ASTRIA_SEQUENCER_CONNECT_GRPC_ADDR to ASTRIA_SEQUENCER_ORACLE_…
noot Nov 4, 2024
d291a0b
only init MarketMap and Oracle components if vote extensions are enabled
noot Nov 4, 2024
9e07cb0
Merge branch 'main' of github.com:astriaorg/astria into noot/slinky
noot Nov 4, 2024
19c81f1
update CurrencyPairId::increment
noot Nov 4, 2024
8b86f9d
implement retry logic for connecting to the oracle sidecar
Fraser999 Nov 5, 2024
704eec2
merge feat/oracle
noot Nov 6, 2024
925c296
Merge branch 'feat/oracle' of github.com:astriaorg/astria into fraser…
noot Nov 6, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions crates/astria-sequencer/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@ bytes = { workspace = true }
divan = { workspace = true, optional = true }
futures = { workspace = true }
hex = { workspace = true, features = ["serde"] }
humantime = { workspace = true }
ibc-types = { workspace = true, features = ["with_serde"] }
indexmap = { workspace = true }
itertools = { workspace = true }
Expand All @@ -71,6 +72,7 @@ thiserror = { workspace = true }
tokio = { workspace = true, features = ["rt", "tracing"] }
tonic = { workspace = true }
tracing = { workspace = true }
tryhard = { workspace = true }

[dev-dependencies]
astria-core = { path = "../astria-core", features = [
Expand Down
136 changes: 110 additions & 26 deletions crates/astria-sequencer/src/sequencer.rs
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
use std::time::Duration;

use astria_core::generated::{
connect::{
marketmap::v2::query_server::QueryServer as MarketMapQueryServer,
Expand All @@ -12,6 +14,7 @@ use astria_core::generated::{
use astria_eyre::{
anyhow_to_eyre,
eyre::{
self,
eyre,
OptionExt as _,
Result,
Expand All @@ -37,6 +40,7 @@ use tokio::{
task::JoinHandle,
};
use tonic::transport::{
Channel,
Endpoint,
Uri,
};
Expand All @@ -61,6 +65,8 @@ use crate::{
service,
};

const MAX_RETRIES_TO_CONNECT_TO_ORACLE_SIDECAR: u32 = 36;

pub struct Sequencer;

impl Sequencer {
Expand Down Expand Up @@ -117,32 +123,9 @@ impl Sequencer {
.context("failed to query state for base prefix")?;
}

let oracle_client = if config.no_oracle {
None
} else {
let uri: Uri = config
.oracle_grpc_addr
.parse()
.context("failed parsing oracle grpc address as Uri")?;
let endpoint = Endpoint::from(uri.clone()).timeout(std::time::Duration::from_millis(
config.oracle_client_timeout_milliseconds,
));
let mut oracle_client = OracleClient::new(
endpoint
.connect()
.await
.wrap_err("failed to connect to oracle sidecar")?,
);

// ensure the oracle sidecar is reachable
// TODO: allow this to retry in case the oracle sidecar is not ready yet
if let Err(e) = oracle_client.prices(QueryPricesRequest::default()).await {
warn!(uri = %uri, error = %e, "oracle sidecar is unreachable");
} else {
debug!(uri = %uri, "oracle sidecar is reachable");
};
Some(oracle_client)
};
let oracle_client = new_oracle_client(&config)
.await
.wrap_err("failed to create connected oracle client")?;

let mempool = Mempool::new(metrics, config.mempool_parked_max_tx_count);
let app = App::new(
Expand Down Expand Up @@ -303,3 +286,104 @@ fn spawn_signal_handler() -> SignalReceiver {
stop_rx,
}
}

/// Returns a new Connect oracle client or `Ok(None)` if `config.no_oracle` is true.
///
/// If `config.no_oracle` is false, returns `Ok(Some(...))` as soon as a successful response is
/// received from the oracle sidecar, or returns `Err` after a fixed number of failed re-attempts
/// (roughly equivalent to 5 minutes total).
#[instrument(skip_all, err)]
async fn new_oracle_client(config: &Config) -> Result<Option<OracleClient<Channel>>> {
if config.no_oracle {
return Ok(None);
}
let uri: Uri = config
.oracle_grpc_addr
.parse()
.context("failed parsing oracle grpc address as Uri")?;
let endpoint = Endpoint::from(uri.clone()).timeout(Duration::from_millis(
config.oracle_client_timeout_milliseconds,
));

let retry_config = tryhard::RetryFutureConfig::new(MAX_RETRIES_TO_CONNECT_TO_ORACLE_SIDECAR)
.exponential_backoff(Duration::from_millis(100))
.max_delay(Duration::from_secs(10))
.on_retry(
|attempt, next_delay: Option<Duration>, error: &eyre::Report| {
let wait_duration = next_delay
.map(humantime::format_duration)
.map(tracing::field::display);
warn!(
error = error.as_ref() as &dyn std::error::Error,
attempt,
wait_duration,
"failed to query oracle sidecar; retrying after backoff",
);
async {}
},
);

let client = tryhard::retry_fn(|| connect_to_oracle(&endpoint, &uri))
.with_config(retry_config)
.await
.wrap_err_with(|| {
format!(
"failed to query oracle sidecar after {MAX_RETRIES_TO_CONNECT_TO_ORACLE_SIDECAR} \
retries; giving up"
)
})?;
Ok(Some(client))
}

#[instrument(skip_all, err(level = tracing::Level::WARN))]
async fn connect_to_oracle(endpoint: &Endpoint, uri: &Uri) -> Result<OracleClient<Channel>> {
let mut oracle_client = OracleClient::new(
endpoint
.connect()
.await
.wrap_err("failed to connect to oracle sidecar")?,
);
let _ = oracle_client
.prices(QueryPricesRequest::default())
.await
.wrap_err("oracle sidecar responded with error to query for prices");
Comment on lines +346 to +349
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we should move this out of the retried function and call it after we successfully connect to the client, as if the grpc call for prices fails, it indicates a more serious error that probably isn't fixable upon retry. thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is maybe a general problem with most or even all of our retry loops like this. We probably have some error cases which are permanent/unrecoverable, and others which are transient. However we always treat them as transient and keep retrying.

In this specific case, I guess we could also get a transient failure of the gRPC (e.g. timeout or dropped connection)? We should probably look to abandon the retry loop if the error indicates a permanent failure, but keep trying if not. However, that's another can of worms, since matching on error variants isn't a lot of fun when using eyre/anyhow.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, maybe we can do a larger revisit of the retry functions. this is fine for now, but something to keep in mind

debug!(uri = %uri, "oracle sidecar is reachable");
Ok(oracle_client)
}

#[cfg(test)]
mod tests {
use super::*;

#[tokio::test(start_paused = true)]
async fn should_wait_while_unable_to_connect() {
// We only care about `no_oracle` and `oracle_grpc_addr` values - the others can be
// meaningless.
let config = Config {
listen_addr: String::new(),
db_filepath: "".into(),
log: String::new(),
grpc_addr: String::new(),
force_stdout: true,
no_otel: true,
no_metrics: true,
metrics_http_listener_addr: String::new(),
pretty_print: true,
no_oracle: false,
oracle_grpc_addr: "http://127.0.0.1:8081".to_string(),
oracle_client_timeout_milliseconds: 1,
mempool_parked_max_tx_count: 1,
};

let start = tokio::time::Instant::now();
let error = new_oracle_client(&config).await.unwrap_err();
assert!(start.elapsed() > Duration::from_secs(300));
assert_eq!(
error.to_string(),
format!(
"failed to query oracle sidecar after {MAX_RETRIES_TO_CONNECT_TO_ORACLE_SIDECAR} \
retries; giving up"
)
);
}
}
Loading