Make vault fall back to other overlay node if overlay rejects connection #568

Open
ebma opened this issue Nov 7, 2024 · 1 comment

ebma commented Nov 7, 2024

Context

A validator node can reach its limit of open connections. When that happens, the node returns Error{ code:ErrLoad message:peer rejected }. We already fall back to other overlay nodes when the initial connection fails, but in this case the validator only returns the error later in the flow, so the default fallback is never triggered. As the following logs show, the connection is accepted at first, and the peer is then rejected in a later message during authentication.

Nov 07 09:22:22.182  INFO service: Version: 1.0.13
Nov 07 09:22:22.182  INFO service: Vault uses Substrate account with ID: 6jvAnyNWmtZSF5efLSxgx9awf4nn44r2Pbwuu8L3Ru5B77aN
Nov 07 09:22:22.182  INFO runtime::conn: Connecting to the spacewalk-parachain...    
Nov 07 09:22:22.320  INFO runtime::conn: Connected!    
Nov 07 09:22:22.584  INFO runtime::rpc: spec_name="amplitude"    
Nov 07 09:22:22.584  INFO runtime::rpc: spec_version=19    
Nov 07 09:22:22.584  INFO runtime::rpc: transaction_version=13    
Nov 07 09:22:22.587  INFO wallet::cache: Caching stellar transactions at .//GDI3XRT3OXFIUNGKEOC5273DUM6BJUBKBJ4BLTIQU7M3HYGDXTYH5SHY_true/txs
Nov 07 09:22:24.402  INFO vault::system: Got new block at height 4702605
Nov 07 09:22:24.402  INFO vault::system: Starting client service...
Nov 07 09:22:24.458  INFO vault::system: Not registering public key -- already registered
Nov 07 09:22:24.519  INFO vault::system: [6jvAnyNWmtZSF5efLSxgx9awf4nn44r2Pbwuu8L3Ru5B77aN[XCM(0)->{ code: AUDD, issuer: GDC7X2MXTYSAKUUGAIQ7J7RPEIM7GXSAIWFYWWH4GLNFECQVJJLB2EEU }]] Not registering vault -- already registered
Nov 07 09:22:24.716  INFO vault::system: Adding vault with ID: VaultId { account_id: AccountId32([134, 170, 122, 75, 136, 133, 147, 64, 79, 224, 247, 246, 119, 142, 219, 181, 233, 52, 167, 28, 46, 70, 222, 230, 11, 186, 32, 110, 34, 29, 213, 122]), currencies: VaultCurrencyPair { collateral: Static(XCM(0)), wrapped: Static({ code: AUDD, issuer: GDC7X2MXTYSAKUUGAIQ7J7RPEIM7GXSAIWFYWWH4GLNFECQVJJLB2EEU }) } }
Nov 07 09:22:24.719  INFO stellar_relay_lib::config: connection_info(): Connecting to Stellar overlay network using public key: GDI3XRT3OXFIUNGKEOC5273DUM6BJUBKBJ4BLTIQU7M3HYGDXTYH5SHY
Nov 07 09:22:24.723  INFO stellar_relay_lib::overlay: connect(): connecting to ConnectionInfo { address: "85.190.254.217", port: 11625, secret_key: "****", auth_cert_expiration: 0, receive_tx_messages: false, receive_scp_messages: true, remote_called_us: false, timeout_in_seconds: 10 }
Nov 07 09:22:24.837  INFO stellar_relay_lib::connection::connector::message_reader: poll_messages_from_stellar(): started.
Nov 07 09:22:24.945  INFO stellar_relay_lib::connection::connector::message_handler: process_stellar_message(): Hello message processed successfully
Nov 07 09:22:25.048 ERROR stellar_relay_lib::connection::connector::message_handler: process_raw_message(): Received ErrorMsg during authentication: Error{ code:ErrLoad message:peer rejected }
Nov 07 09:22:25.049 ERROR stellar_relay_lib::connection::error: Stellar Node returned error: Error{ code:ErrLoad message:peer rejected }
Nov 07 09:22:25.049 ERROR stellar_relay_lib::connection::connector::message_reader: poll_messages_from_stellar(): Error occurred during processing xdr message: OverlayError(ErrLoad)
Nov 07 09:22:25.049  INFO stellar_relay_lib::connection::connector::message_reader: poll_messages_from_stellar(): stopped.
Nov 07 09:22:35.278  INFO vault::system: Starting all services...
Nov 07 09:22:35.278  INFO vault::requests::execution: execute_open_requests(): started
Nov 07 09:22:35.278  INFO vault::oracle::agent: listen_for_stellar_messages(): started
Nov 07 09:22:35.278  INFO stellar_relay_lib::overlay: stop(): closing connection to overlay network
Nov 07 09:22:35.278 ERROR stellar_relay_lib::overlay: listen(): sender half of overlay has closed.
Nov 07 09:22:35.278 ERROR vault::oracle::agent: listen_for_stellar_messages(): encounter error in overlay: Disconnected
Nov 07 09:22:35.278  INFO vault::oracle::agent: listen_for_stellar_messages(): shutting down overlay connection

The error seems to be thrown here and there is a similar check here

TODO

Make the vault fall back to a different Stellar validator node when this error is encountered. A simple solution to this would be to make the vault connect to a different validator node every time it restarts, regardless of the reason for the restart.
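A rough sketch of what the fallback could look like, not the actual stellar_relay_lib code. All names here (`ConnectError`, `NodeAddress`, `connect_with_fallback`) are hypothetical. It assumes the `try_connect` closure only returns `Ok` once authentication has completed, so a late ErrLoad ("peer rejected") counts as a failed attempt and the next node is tried. Passing a different `start` index on every restart would give the "simple solution" described above.

```rust
/// Hypothetical error type for a single connection attempt.
#[derive(Debug, Clone, PartialEq)]
pub enum ConnectError {
    /// The node accepted the connection but answered with ErrLoad
    /// ("peer rejected") later, during authentication.
    PeerRejected,
    /// Any other failure, e.g. a timeout.
    Other(String),
}

/// Hypothetical address of a Stellar overlay node.
#[derive(Debug, Clone, PartialEq)]
pub struct NodeAddress {
    pub address: String,
    pub port: u16,
}

/// Tries the configured nodes in round-robin order, beginning at `start`.
/// Returns the index of the first node that connects together with the
/// connection, or the last error seen if every node fails.
///
/// Assumption: `try_connect` returns `Ok` only after the handshake and
/// authentication have finished, so late rejections surface as `Err`.
pub fn connect_with_fallback<C, F>(
    nodes: &[NodeAddress],
    start: usize,
    mut try_connect: F,
) -> Result<(usize, C), ConnectError>
where
    F: FnMut(&NodeAddress) -> Result<C, ConnectError>,
{
    // Returned unchanged if `nodes` is empty.
    let mut last_err = ConnectError::Other("no overlay nodes configured".to_string());

    for offset in 0..nodes.len() {
        // Wrap around so that every node is tried exactly once.
        let idx = (start % nodes.len() + offset) % nodes.len();
        match try_connect(&nodes[idx]) {
            Ok(conn) => return Ok((idx, conn)),
            // Remember the error and move on to the next node.
            Err(err) => last_err = err,
        }
    }

    Err(last_err)
}
```

The caller could persist the returned index and pass `index + 1` as `start` after the next restart, so the vault does not keep hitting the same overloaded validator.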


ebma commented Nov 7, 2024

@pendulum-chain/product we recently encountered this issue for the first time. It can be worked around by restarting the validator node, but that is not an ideal solution. This ticket would improve the handling so that vaults can recover gracefully. I would still consider it rather low priority.
