feat(zksync_cli): Health checkpoint improvements #3193

Merged on Nov 26, 2024 (70 commits)
Changes from 66 commits
Commits
8dd58dc
feat: add dummy node version heathcheck
manuelmauro Oct 29, 2024
98167a1
feat: add version information to healthcheck
manuelmauro Oct 30, 2024
d08ecae
refactor: simplify static health check
manuelmauro Oct 30, 2024
a058388
feat: add last migration to system_dal
manuelmauro Oct 30, 2024
a5f571b
feat: add database healthcheck
manuelmauro Oct 31, 2024
a5db49a
feat: add more information to database heathcheck
manuelmauro Oct 31, 2024
b57027c
style: format code
manuelmauro Oct 31, 2024
5d23bd6
chore: prepare sqlx queries
manuelmauro Oct 31, 2024
959542b
fix: remove outdated query file
manuelmauro Oct 31, 2024
28ecc96
feat: improve bytes encoding in healthcheck
manuelmauro Oct 31, 2024
3d7e82c
Merge branch 'main' into manuel-add-more-components-to-healthcheck
manuelmauro Oct 31, 2024
de3a5e1
feat: add dummy health check tasks for state keeper and eth sender
manuelmauro Oct 31, 2024
582754b
fix: do not unwrap
manuelmauro Oct 31, 2024
1b8e50e
feat: retrieve failed L1 transactions and next operator nonce
manuelmauro Oct 31, 2024
68210aa
feat: add information on last saved/mined batches to healthcheck
manuelmauro Nov 4, 2024
2fc3da9
feat: get last miniblock number from DB
manuelmauro Nov 4, 2024
098e54f
feat: add protocol version information to healthcheck
manuelmauro Nov 4, 2024
4cbb890
feat: add last processed L1 batch to health check
manuelmauro Nov 4, 2024
f60e3a9
refactor: use SELECT MAX instead of ORDER BY
manuelmauro Nov 4, 2024
927cddb
refactor: rename LastBatchIndex to BatchNumbers
manuelmauro Nov 4, 2024
80db2ce
fix: revert code committed by mistake
manuelmauro Nov 6, 2024
2c9ca8d
feat: add config parameters for healthcheck polling intervals
manuelmauro Nov 6, 2024
98f8d72
fix: fix house keeper config from env test
manuelmauro Nov 6, 2024
988b957
fix: fix house keeper config parameters naming in unit test
manuelmauro Nov 6, 2024
293484d
fix: use u64 for failed_l1_txns
manuelmauro Nov 7, 2024
3ad3fd2
fix: return u64 in get_number_of_failed_transactions
manuelmauro Nov 7, 2024
b31233b
feat: use connection_tagged for better code instumentation
manuelmauro Nov 7, 2024
5396e5a
feat: add reactive health check to state keeper
manuelmauro Nov 7, 2024
f51785c
fix: update state keeper health at the right moment
manuelmauro Nov 7, 2024
bb5acfc
refactor: use ORDER BY to query last database migration
manuelmauro Nov 7, 2024
fd35543
feat: add reactive health check to eth sender
manuelmauro Nov 8, 2024
416ea50
style: clippy
manuelmauro Nov 8, 2024
d0ac511
Merge branch 'main' into manuel-add-more-components-to-healthcheck
manuelmauro Nov 8, 2024
8d4c176
style: move field before reserved ones
manuelmauro Nov 13, 2024
eb2fb68
refactor: rename PostgresMetricsLayer to PostgresLayer
manuelmauro Nov 13, 2024
80986e4
refactor: rename postgres_metrics_layer to postgres_layer
manuelmauro Nov 13, 2024
9bffd99
refactor: rename module postgres_layer to postgres
manuelmauro Nov 13, 2024
5e235b1
refactor: move database health check task to postgres layer
manuelmauro Nov 13, 2024
884b864
feat: implement Serialize and Deserialize directly on AggregatedActio…
manuelmauro Nov 13, 2024
c701399
refactor: make DatabaseHealthTask fields private
manuelmauro Nov 13, 2024
53cca2f
refactor: remove redundant health status updates
manuelmauro Nov 13, 2024
fe339c2
Merge branch 'main' into manuel-add-more-components-to-healthcheck
manuelmauro Nov 13, 2024
9fe98d2
feat: make health mod private
manuelmauro Nov 13, 2024
36ad7c5
refactor: remove StateKeeperTask constructor
manuelmauro Nov 13, 2024
1585d5d
refactor: use getter for health updater
manuelmauro Nov 13, 2024
f9e2ebf
refactor: clippy
manuelmauro Nov 14, 2024
26aa824
feat: add git information to RustcMetadata
manuelmauro Nov 14, 2024
f06a415
refactor: rename rustc module to binary
manuelmauro Nov 14, 2024
bdc50f4
Merge branch 'main' into manuel-add-more-components-to-healthcheck
manuelmauro Nov 14, 2024
39f3864
Merge branch 'main' into manuel-add-more-components-to-healthcheck
manuelmauro Nov 15, 2024
195aee4
refactor: remove redundant health status update
manuelmauro Nov 15, 2024
4e14447
refactor: remove redundant health check update
manuelmauro Nov 15, 2024
2acadca
fix: remove unused dependencies
manuelmauro Nov 15, 2024
d5d78fc
revert: revert formatting changes
manuelmauro Nov 15, 2024
f1d618c
Merge branch 'main' into manuel-add-more-components-to-healthcheck
Deniallugo Nov 15, 2024
ec9ae20
Merge branch 'main' into manuel-add-more-components-to-healthcheck
manuelmauro Nov 15, 2024
416c187
Merge branch 'main' into manuel-add-more-components-to-healthcheck
manuelmauro Nov 18, 2024
9964ee7
feat: use Duration for migrations' execution_time
manuelmauro Nov 18, 2024
19275b6
refactor: use same interval for Postgres metrics exporter and healthc…
manuelmauro Nov 18, 2024
8911cc2
refactor: do not split use and mod declarations
manuelmauro Nov 18, 2024
6bc90e2
refactor: use Option type instead of "unknown"
manuelmauro Nov 18, 2024
d2c983c
feat: update state keeper health from cursor right away
manuelmauro Nov 18, 2024
25d8b26
refactor: merge tx status into tx details
manuelmauro Nov 18, 2024
d7d62ec
refactor: clippy
manuelmauro Nov 18, 2024
a994cc6
Merge branch 'main' into manuel-add-more-components-to-healthcheck
manuelmauro Nov 18, 2024
93b3153
Merge branch 'main' into manuel-add-more-components-to-healthcheck
manuelmauro Nov 19, 2024
16d1358
feat: integrate binary metadata into AppHealth
manuelmauro Nov 19, 2024
b6bdda4
feat: split git metrics from rust metrics
manuelmauro Nov 21, 2024
785fe5b
style: format code
manuelmauro Nov 21, 2024
e30ebac
refactor: nit BinMetadata creation
manuelmauro Nov 21, 2024
9 changes: 9 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions core/bin/external_node/src/metrics/framework.rs
@@ -5,7 +5,7 @@ use zksync_node_framework::{
     implementations::resources::pools::{MasterPool, PoolResource},
     FromContext, IntoContext, StopReceiver, Task, TaskId, WiringError, WiringLayer,
 };
-use zksync_shared_metrics::rustc::RUST_METRICS;
+use zksync_shared_metrics::binary::BIN_METRICS;
 use zksync_types::{L1ChainId, L2ChainId, SLChainId};

 use super::EN_METRICS;
@@ -39,7 +39,7 @@ impl WiringLayer for ExternalNodeMetricsLayer {
     }

     async fn wire(self, input: Self::Input) -> Result<Self::Output, WiringError> {
-        RUST_METRICS.initialize();
+        BIN_METRICS.initialize();
         EN_METRICS.observe_config(
             self.l1_chain_id,
             self.sl_chain_id,
8 changes: 4 additions & 4 deletions core/bin/external_node/src/node_builder.rs
@@ -33,7 +33,7 @@ use zksync_node_framework::{
             NodeStorageInitializerLayer,
         },
         pools_layer::PoolsLayerBuilder,
-        postgres_metrics::PostgresMetricsLayer,
+        postgres::PostgresLayer,
         prometheus_exporter::PrometheusExporterLayer,
         pruning::PruningLayer,
         query_eth_client::QueryEthClientLayer,
@@ -125,8 +125,8 @@ impl ExternalNodeBuilder {
         Ok(self)
     }

-    fn add_postgres_metrics_layer(mut self) -> anyhow::Result<Self> {
-        self.node.add_layer(PostgresMetricsLayer);
+    fn add_postgres_layer(mut self) -> anyhow::Result<Self> {
+        self.node.add_layer(PostgresLayer);
         Ok(self)
     }

@@ -582,7 +582,7 @@ impl ExternalNodeBuilder {
                     // so until we have a dedicated component for "auxiliary" tasks,
                     // it's responsible for things like metrics.
                     self = self
-                        .add_postgres_metrics_layer()?
+                        .add_postgres_layer()?
                         .add_external_node_metrics_layer()?;
                     // We assign the storage initialization to the core, as it's considered to be
                     // the "main" component.
10 changes: 4 additions & 6 deletions core/bin/zksync_server/src/node_builder.rs
@@ -48,7 +48,7 @@ use zksync_node_framework::{
         object_store::ObjectStoreLayer,
         pk_signing_eth_client::PKSigningEthClientLayer,
         pools_layer::PoolsLayerBuilder,
-        postgres_metrics::PostgresMetricsLayer,
+        postgres::PostgresLayer,
         prometheus_exporter::PrometheusExporterLayer,
         proof_data_handler::ProofDataHandlerLayer,
         query_eth_client::QueryEthClientLayer,
@@ -138,8 +138,8 @@ impl MainNodeBuilder {
         Ok(self)
     }

-    fn add_postgres_metrics_layer(mut self) -> anyhow::Result<Self> {
-        self.node.add_layer(PostgresMetricsLayer);
+    fn add_postgres_layer(mut self) -> anyhow::Result<Self> {
+        self.node.add_layer(PostgresLayer);
         Ok(self)
     }

@@ -760,9 +760,7 @@ impl MainNodeBuilder {
                     self = self.add_eth_tx_manager_layer()?;
                 }
                 Component::Housekeeper => {
-                    self = self
-                        .add_house_keeper_layer()?
-                        .add_postgres_metrics_layer()?;
+                    self = self.add_house_keeper_layer()?.add_postgres_layer()?;
                 }
                 Component::ProofDataHandler => {
                     self = self.add_proof_data_handler_layer()?;

3 changes: 2 additions & 1 deletion core/lib/dal/src/eth_sender_dal.rs
@@ -669,7 +669,7 @@ impl EthSenderDal<'_, '_> {
         Ok(())
     }

-    pub async fn get_number_of_failed_transactions(&mut self) -> anyhow::Result<i64> {
+    pub async fn get_number_of_failed_transactions(&mut self) -> anyhow::Result<u64> {
         sqlx::query!(
             r#"
             SELECT
@@ -683,6 +683,7 @@ impl EthSenderDal<'_, '_> {
         .fetch_one(self.storage.conn())
         .await?
         .count
+        .map(|c| c as u64)
         .context("count field is missing")
     }

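The `.map(|c| c as u64)` cast in this hunk is harmless in practice because a SQL count is never negative, but `as` would silently wrap a negative `i64` into a very large `u64`. A minimal sketch of a checked alternative follows. The helper name `count_to_u64` and the `String` error type are hypothetical, chosen only to keep the example free of external crates; they are not part of the PR.

```rust
/// Hypothetical helper (not part of the PR): converts an optional SQL count
/// into `u64`, rejecting both a missing value and a negative one.
fn count_to_u64(count: Option<i64>) -> Result<u64, String> {
    // Mirror the diff's "count field is missing" error for a NULL column.
    let raw = count.ok_or_else(|| "count field is missing".to_string())?;
    // `u64::try_from` fails for negative inputs instead of wrapping like `as`.
    u64::try_from(raw).map_err(|_| format!("count is negative: {raw}"))
}
```

With this shape, `count_to_u64(Some(3))` yields `Ok(3)`, while `Some(-1)` and `None` both surface as errors rather than as a misleading number in the health report.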
35 changes: 35 additions & 0 deletions core/lib/dal/src/system_dal.rs
@@ -1,5 +1,7 @@
 use std::{collections::HashMap, time::Duration};

+use chrono::DateTime;
+use serde::{Deserialize, Serialize};
 use zksync_db_connection::{connection::Connection, error::DalResult, instrument::InstrumentExt};

 use crate::Core;
@@ -12,6 +14,16 @@ pub(crate) struct TableSize {
     pub total_size: u64,
 }

+#[derive(Debug, Serialize, Deserialize)]
+pub struct DatabaseMigration {
+    pub version: i64,
+    pub description: String,
+    pub installed_on: DateTime<chrono::Utc>,
+    pub success: bool,
+    pub checksum: String,
+    pub execution_time: Duration,
+}
+
 #[derive(Debug)]
 pub struct SystemDal<'a, 'c> {
     pub(crate) storage: &'a mut Connection<'c, Core>,
@@ -86,4 +98,27 @@ impl SystemDal<'_, '_> {
         });
         Ok(table_sizes.collect())
     }
+
+    pub async fn get_last_migration(&mut self) -> DalResult<DatabaseMigration> {
+        let row = sqlx::query!(
+            r#"
+            SELECT *
+            FROM _sqlx_migrations
+            ORDER BY _sqlx_migrations.version DESC
+            LIMIT 1
+            "#
+        )
+        .instrument("get_last_migration")
+        .fetch_one(self.storage)
+        .await?;
+
+        Ok(DatabaseMigration {
+            version: row.version,
+            description: row.description,
+            installed_on: row.installed_on,
+            success: row.success,
+            checksum: hex::encode(row.checksum),
+            execution_time: Duration::from_millis(u64::try_from(row.execution_time).unwrap_or(0)),
+        })
+    }
 }
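The `execution_time` conversion in `get_last_migration` can be read in isolation. The sketch below reproduces it as a standalone function, assuming a hypothetical name `execution_time_from_raw` that does not appear in the PR. It shows the one behavior that is easy to miss: a negative raw value falls back to a zero duration instead of returning an error. The diff interprets the stored value as milliseconds; whether that matches the unit sqlx writes to `_sqlx_migrations.execution_time` is worth confirming against the sqlx migrator.

```rust
use std::time::Duration;

/// Hypothetical standalone version of the conversion used in the diff:
/// a signed raw column value becomes a `Duration`, read as milliseconds.
fn execution_time_from_raw(raw: i64) -> Duration {
    // `u64::try_from` fails only for negative values; those collapse to 0.
    Duration::from_millis(u64::try_from(raw).unwrap_or(0))
}
```

Falling back to zero keeps the health check from failing on an unexpected row, at the cost of hiding the bad value from whoever reads the report.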
4 changes: 3 additions & 1 deletion core/lib/types/src/aggregated_operations.rs
@@ -1,6 +1,8 @@
 use std::{fmt, str::FromStr};

-#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
+use serde::{Deserialize, Serialize};
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
 pub enum AggregatedActionType {
     Commit,
     PublishProofOnchain,
6 changes: 2 additions & 4 deletions core/node/consensus/src/testonly.rs
@@ -584,15 +584,14 @@ impl StateKeeperRunner {
                 let stop_recv = stop_recv.clone();
                 async {
                     ZkSyncStateKeeper::new(
-                        stop_recv,
                         Box::new(io),
                         Box::new(executor_factory),
                         OutputHandler::new(Box::new(persistence.with_tx_insertion()))
                             .with_handler(Box::new(self.sync_state.clone())),
                         Arc::new(NoopSealer),
                         Arc::new(async_cache),
                     )
-                    .run()
+                    .run(stop_recv)
                     .await
                     .context("ZkSyncStateKeeper::run()")?;
                     Ok(())
@@ -665,7 +664,6 @@ impl StateKeeperRunner {
                 let stop_recv = stop_recv.clone();
                 async {
                     ZkSyncStateKeeper::new(
-                        stop_recv,
                         Box::new(io),
                         Box::new(MockBatchExecutor),
                         OutputHandler::new(Box::new(persistence.with_tx_insertion()))
@@ -674,7 +672,7 @@ impl StateKeeperRunner {
                         Arc::new(NoopSealer),
                         Arc::new(MockReadStorageFactory),
                     )
-                    .run()
+                    .run(stop_recv)
                     .await
                     .context("ZkSyncStateKeeper::run()")?;
                     Ok(())
2 changes: 2 additions & 0 deletions core/node/eth_sender/Cargo.toml
@@ -11,12 +11,14 @@ keywords.workspace = true
 categories.workspace = true

 [dependencies]
+serde.workspace = true
 vise.workspace = true
 zksync_types.workspace = true
 zksync_dal.workspace = true
 zksync_config.workspace = true
 zksync_contracts.workspace = true
 zksync_eth_client.workspace = true
+zksync_health_check.workspace = true
 zksync_l1_contract_interface.workspace = true
 zksync_object_store.workspace = true
 zksync_prover_interface.workspace = true
19 changes: 19 additions & 0 deletions core/node/eth_sender/src/eth_tx_aggregator.rs
@@ -3,6 +3,7 @@ use zksync_config::configs::eth_sender::SenderConfig;
 use zksync_contracts::BaseSystemContractsHashes;
 use zksync_dal::{Connection, ConnectionPool, Core, CoreDal};
 use zksync_eth_client::{BoundEthInterface, CallFunctionArgs};
+use zksync_health_check::{Health, HealthStatus, HealthUpdater, ReactiveHealthCheck};
 use zksync_l1_contract_interface::{
     i_executor::{
         commit::kzg::{KzgInfo, ZK_SYNC_BYTES_PER_BLOB},
@@ -27,6 +28,7 @@ use zksync_types::{

 use super::aggregated_operations::AggregatedOperation;
 use crate::{
+    health::{EthTxAggregatorHealthDetails, EthTxDetails},
     metrics::{PubdataKind, METRICS},
     utils::agg_l1_batch_base_cost,
     zksync_functions::ZkSyncFunctions,
@@ -65,6 +67,7 @@ pub struct EthTxAggregator {
     pool: ConnectionPool<Core>,
     settlement_mode: SettlementMode,
     sl_chain_id: SLChainId,
+    health_updater: HealthUpdater,
 }

 struct TxData {
@@ -119,10 +122,14 @@ impl EthTxAggregator {
             pool,
             settlement_mode,
             sl_chain_id,
+            health_updater: ReactiveHealthCheck::new("eth_tx_aggregator").1,
         }
     }

     pub async fn run(mut self, stop_receiver: watch::Receiver<bool>) -> anyhow::Result<()> {
+        self.health_updater
+            .update(Health::from(HealthStatus::Ready));
+
         let pool = self.pool.clone();
         loop {
             let mut storage = pool.connection_tagged("eth_sender").await.unwrap();
@@ -431,6 +438,13 @@ impl EthTxAggregator {
             )
             .await?;
             Self::report_eth_tx_saving(storage, &agg_op, &tx).await;
+
+            self.health_updater.update(
+                EthTxAggregatorHealthDetails {
+                    last_saved_tx: EthTxDetails::new(&tx, None),
+                }
+                .into(),
+            );
         }
         Ok(())
     }
@@ -670,4 +684,9 @@ impl EthTxAggregator {
                 )
             })
     }
+
+    /// Returns the health check for eth tx aggregator.
+    pub fn health_check(&self) -> ReactiveHealthCheck {
+        self.health_updater.subscribe()
+    }
 }
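The aggregator keeps only the updater half of the health pair and hands out check handles on demand through `health_check()`. The sketch below illustrates that ownership pattern with the standard library only. `Status`, `Updater`, and `Check` are hypothetical stand-ins, not the `zksync_health_check` types, and the shared `Mutex` is an assumption made for the example rather than a description of how that crate is implemented.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a component's health status.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    NotReady,
    Ready,
}

/// Write side: owned by the component that knows its own state.
struct Updater {
    shared: Arc<Mutex<Status>>,
}

/// Read side: handed to whoever aggregates health reports.
struct Check {
    shared: Arc<Mutex<Status>>,
}

impl Updater {
    /// Components start as not ready until they report otherwise.
    fn new() -> Self {
        Updater { shared: Arc::new(Mutex::new(Status::NotReady)) }
    }

    /// Publishes a new status; every existing `Check` observes it.
    fn update(&self, status: Status) {
        *self.shared.lock().unwrap() = status;
    }

    /// Creates a reader bound to this updater, as `health_check()` does above.
    fn subscribe(&self) -> Check {
        Check { shared: Arc::clone(&self.shared) }
    }
}

impl Check {
    /// Returns the most recently published status.
    fn current(&self) -> Status {
        *self.shared.lock().unwrap()
    }
}
```

Because readers are created from the updater, the component never has to store a check handle itself, which is why the constructor in the diff can discard the first element of the pair returned by `ReactiveHealthCheck::new`.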