You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
2024-12-26T12:00:15.068Z INFO [rmb::events] started chain events listener
439x
2024-12-26T12:00:13.594Z ERROR [rmb::twin::substrate] failed to create substrate client with url "ws://tfchain-public-node:9944": Rpc error: RPC error: Networking or low-level protocol error: Error when opening the TCP socket: Connection refused (os error 111)
2024-12-26T12:00:13.516Z ERROR [rmb::twin::substrate] failed to create substrate client with url "ws://tfchain-public-node:9944": Rpc error: RPC error: Networking or low-level protocol error: Connection timeout exceeded: 10s
2024-12-26T12:00:05.394Z ERROR [rmb::events] error listening to events Rpc error: RPC error: The background task been terminated because: Networking or low-level protocol error: WebSocket connection error: connection closed; restart required
2024-12-26T12:00:05.394Z INFO [rmb::events] started chain events listener
8x
2024-12-26T12:00:03.467Z ERROR [rmb::twin::substrate] failed to create substrate client with url "ws://tfchain-public-node:9944": Rpc error: RPC error: Networking or low-level protocol error: Error when opening the TCP socket: Connection refused (os error 111)
failed to connect to substrate using the provided urls. Retrying in 2s...
Caused by:
2024-12-26T12:00:03.393Z WARN [rmb_relay] Listener failed on attempt 2: failed to listen to chain events
2x
2024-12-26T12:00:03.393Z ERROR [rmb::events] failed to create substrate client with url "ws://tfchain-public-node:9944": Rpc error: RPC error: Networking or low-level protocol error: Error when opening the TCP socket: Connection refused (os error 111)
2024-12-26T12:00:03.393Z ERROR [rmb::events] error listening to events Rpc error: RPC error: The background task been terminated because: Networking or low-level protocol error: WebSocket connection error: connection closed; restart required
2024-12-26T12:00:03.393Z INFO [rmb::events] started chain events listener
542x
2024-12-26T12:00:03.233Z ERROR [rmb::twin::substrate] failed to create substrate client with url "ws://tfchain-public-node:9944": Rpc error: RPC error: Networking or low-level protocol error: Error when opening the TCP socket: Connection refused (os error 111)
From the logs, it seems the relay is retrying too quickly about 500 times per second.
More importantly, we need to investigate whether relays can end up in a state where they are not fully started or unable to handle WS messages, yet still accept connections.
tokio::spawn(asyncmove{let max_retries = 9;// max wait is 2^9 = 512 seconds ( 5 minutes )letmut attempt = 0;letmut backoff = Duration::from_secs(1);letmut got_hit = false;loop{match l
.listen(&mut got_hit).await.context("failed to listen to chain events"){Ok(_) => break,Err(e) => {if got_hit {
log::warn!("Listener got a hit, but failed to listen to chain events before no attempts will be reset");
got_hit = false;
attempt = 0;
backoff = Duration::from_secs(1);}
attempt += 1;if attempt > max_retries {
log::error!("Listener failed after {} attempts: {:?}", attempt - 1, e);let _ = tx.send(());break;}
log::warn!("Listener failed on attempt {}: {:?}. Retrying in {:?}...",
attempt,
e,
backoff
);
tokio::time::sleep(backoff).await;
backoff *= 2;}}}});
r.start(&args.listen).await.unwrap();
I believe the issue here is that the relay starts accepting messages even if it can't connect to the RPC node. After 5 minutes of receiving messages without successfully querying twin relays, it exits and is likely restarted by compose restart policy. This cycle continues, trapping messages and confusing the nodes, keeping accepting messages leading them to believe the messages will eventually be delivered.
The text was updated successfully, but these errors were encountered:
From the logs, it seems the relay is retrying too quickly about 500 times per second.
More importantly, we need to investigate whether relays can end up in a state where they are not fully started or unable to handle WS messages, yet still accept connections.
I believe the issue here is that the relay starts accepting messages even if it can't connect to the RPC node. After 5 minutes of receiving messages without successfully querying twin relays, it exits and is likely restarted by compose restart policy. This cycle continues, trapping messages and confusing the nodes, keeping accepting messages leading them to believe the messages will eventually be delivered.
The text was updated successfully, but these errors were encountered: