
feat: handle connection to multiple upstream peers #85

Merged 2 commits on Feb 5, 2025

Conversation

abailly
Contributor

@abailly abailly commented Feb 2, 2025

This small PR adds the ability for an Amaru node to connect to multiple upstream peers. This is a temporary solution to test and demo the chain selection process, while we wait for @scarmuega's P2P handling component.

Note: This PR sits on top of #75

Summary by CodeRabbit

  • New Features

    • You can now configure multiple upstream peer addresses, enhancing connectivity and redundancy.
    • Consensus mechanisms have been improved by incorporating peer-specific details during chain selection and fork handling.
    • Updated documentation to clarify the ability to add multiple peers when configuring node connections.
  • Bug Fixes

    • Improved error handling in header validation to provide warnings instead of panicking.
  • Refactor

    • Peer session management has been updated to support multiple sessions concurrently, improving robustness.
    • Logging has been enhanced with additional context, aiding in traceability during operational processes.
    • The Peer struct now exposes the name field publicly, allowing for external access.


coderabbitai bot commented Feb 2, 2025

Walkthrough

The updates modify the handling of peer connections across several modules. The command-line arguments and configuration structures now support multiple peer addresses by changing fields from a single string to a vector of strings. The bootstrapping process and consensus chain functions have been adapted accordingly to handle multiple peer clients and sessions. Instrumentation improvements enhance logging by adding peer context. Additionally, the public API of the peer structure has been updated for easier external access.

Changes

File(s) Change Summary
crates/amaru/src/bin/amaru/cmd/daemon.rs	Updated CLI parsing: changed peer_address from String to Vec<String> with ArgAction::Append and marked it as required; modified the run function to connect to multiple peers and wrap each connection in Arc<Mutex<PeerClient>>.
crates/amaru/src/sync/mod.rs	Updated Config: renamed upstream_peer: String to upstream_peers: Vec<String>; revised bootstrap to accept a vector of tuples for multiple clients; adjusted pull stage initialisation and the chain selector to support multiple sessions.
crates/amaru/src/sync/pull.rs Enhanced logging in find_intersection by adding a peer field alongside the intersection data in the instrumentation attributes.
crates/consensus/src/consensus/chain_selection.rs Modified the SwitchToFork enum variant to include a Peer parameter; updated roll_forward and rollback to pass the peer context when switching to a fork.
crates/consensus/src/consensus/mod.rs Replaced single peer_session with peer_sessions: HashMap<Peer, PeerSession>; altered method signatures (e.g. forward_block, switch_to_fork, etc.) to include a peer parameter, ensuring proper handling of multiple peer sessions.
crates/ouroboros/src/protocol/peer.rs Adjusted the Peer struct by making the name field public, thus allowing external modules to access the peer’s name.
README.md Updated instructions to clarify that users can add multiple peers when running the node.
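The flag handling summarised in the table above can be sketched in isolation. The helper below is hypothetical (the actual daemon.rs uses clap with ArgAction::Append, per the summary); it only illustrates how a repeated --peer-address flag accumulates into a Vec<String>:

```rust
// Hypothetical illustration of a repeated --peer-address flag; the real
// daemon.rs relies on clap's ArgAction::Append rather than manual parsing.
fn collect_peer_addresses(args: &[String]) -> Vec<String> {
    let mut peers = Vec::new();
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        if let Some(value) = arg.strip_prefix("--peer-address=") {
            // --peer-address=host:port form
            peers.push(value.to_string());
        } else if arg.as_str() == "--peer-address" {
            // --peer-address host:port form: the value is the next argument
            if let Some(value) = iter.next() {
                peers.push(value.to_string());
            }
        }
    }
    peers
}

fn main() {
    let args: Vec<String> = [
        "--peer-address=127.0.0.1:3000",
        "--peer-address",
        "preprod-node.world.dev.cardano.org:30000",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect();
    for peer in collect_peer_addresses(&args) {
        println!("{peer}");
    }
}
```

Each occurrence of the flag appends one entry, which is the behaviour ArgAction::Append provides out of the box.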

Sequence Diagram(s)

sequenceDiagram
    participant CLI
    participant Daemon
    participant PeerClientFactory
    participant Bootstrap
    participant PeerSessionManager

    CLI->>Daemon: Provides multiple peer addresses
    Daemon->>PeerClientFactory: Create a PeerClient for each address
    PeerClientFactory-->>Daemon: Return multiple PeerClient instances (wrapped in Arc/Mutex)
    Daemon->>Bootstrap: Pass vector of clients
    Bootstrap->>PeerSessionManager: Create PeerSessions for each client
    PeerSessionManager-->>Bootstrap: Return constructed PeerSessions
sequenceDiagram
    participant Consensus
    participant StageManager
    participant PeerSessionMap

    Consensus->>StageManager: Invoke roll_forward/rollback with peer info
    StageManager->>PeerSessionMap: Retrieve PeerSession using peer key
    PeerSessionMap-->>StageManager: Return matching PeerSession
    StageManager->>Consensus: Process block forwarding or fork switching with peer session

Possibly related PRs

  • chore: extract stores and ledger as separate crates #68: The changes in the main PR are related to the modifications in the Config struct and the bootstrap function in the retrieved PR, as both involve updating the handling of multiple peer addresses and their corresponding connections.
  • Naive block selection #75: The changes in the main PR are related to those in the retrieved PR as both involve modifications to the handling of multiple peer addresses, specifically updating the Args struct and the bootstrap function to accommodate vectors of peer clients.
  • Another stab at observability #84: The changes in the main PR, which involve modifying the Args struct and the run function to handle multiple peer addresses, are related to the changes in the retrieved PR that also update the run function and parse_args function in the same file, focusing on metrics handling. Both PRs modify the same functions, indicating a direct connection at the code level.

Suggested reviewers

  • KtorZ: A knowledgeable reviewer who can provide insights into the changes made.

Poem

In the realm of code so fine,
Multiple peers now intertwine.
From single strings to vectors wide,
Our system’s given a merry ride.
With logging sharp like a well-tuned bell,
Each peer’s tale they now do tell –
Cheers to change, and all is well!


@abailly abailly force-pushed the abailly/multiple-upstream-peers branch from 31d0a9d to ad49d82 Compare February 2, 2025 10:38
@abailly abailly force-pushed the naive_block_selection branch from 495ed4b to 7fabb2b Compare February 3, 2025 16:51
@abailly abailly force-pushed the abailly/multiple-upstream-peers branch from ad49d82 to 2f0e09f Compare February 3, 2025 17:07
Base automatically changed from naive_block_selection to main February 3, 2025 21:10
@abailly
Contributor Author

abailly commented Feb 3, 2025

@coderabbitai review


coderabbitai bot commented Feb 3, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
crates/amaru/src/bin/amaru/cmd/daemon.rs (1)

68-74: Add graceful error handling for peer connections.

While the implementation is solid, it might be a bit of a sticky wicket if one peer connection fails. Consider implementing a more resilient approach that continues with successfully connected peers even if some fail.

 let mut clients: Vec<(String, Arc<Mutex<PeerClient>>)> = vec![];
 for peer in &config.upstream_peers {
-    let client = PeerClient::connect(peer.clone(), config.network_magic as u64)
-        .await
-        .into_diagnostic()?;
-    clients.push((peer.clone(), Arc::new(Mutex::new(client))));
+    match PeerClient::connect(peer.clone(), config.network_magic as u64).await {
+        Ok(client) => {
+            clients.push((peer.clone(), Arc::new(Mutex::new(client))));
+        }
+        Err(e) => {
+            error!("Failed to connect to peer {}: {}", peer, e);
+            continue;
+        }
+    }
 }
+if clients.is_empty() {
+    return Err(miette!("Failed to connect to any peers"));
+}
crates/consensus/src/consensus/chain_selection.rs (2)

345-345: Tests updated correctly, but could use more coverage.

The test updates for SwitchToFork are spot on, but we might want to add a few more test cases to cover different peer scenarios.

Consider adding tests for:

  1. Multiple peers providing the same fork
  2. Multiple peers providing different forks of the same length
  3. A peer disconnecting during fork switch

Also applies to: 449-449


81-81: Consider performance optimizations for peer chain tracking.

The HashMap for peer chains is a good choice, but for systems with many peers, we might want to consider some optimizations.

Suggestions:

  1. Consider using a capacity hint when creating the HashMap if we know the expected number of peers
  2. We could implement a simple LRU cache for frequently accessed peer chains
  3. Consider adding metrics for peer chain lengths to help with monitoring and tuning
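The first suggestion can be sketched with the standard library alone; the Peer type and chain representation here are placeholders for illustration, not the crate's actual types:

```rust
use std::collections::HashMap;

// Placeholder stand-ins for the crate's actual Peer and chain types.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct Peer(String);

// Pre-size the per-peer chain map when the peer count is known up front,
// avoiding incremental rehashing as sessions are registered.
fn new_peer_chains(expected_peers: usize) -> HashMap<Peer, Vec<u64>> {
    HashMap::with_capacity(expected_peers)
}

fn main() {
    let mut chains = new_peer_chains(2);
    chains.insert(Peer("clermont:3001".into()), Vec::new());
    chains.insert(Peer("preprod-node:30000".into()), Vec::new());
    println!("{} peers tracked", chains.len());
}
```

Since the number of upstream peers is fixed at bootstrap time in this PR, the capacity hint is essentially free.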
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 737c86f and 2f0e09f.

📒 Files selected for processing (6)
  • crates/amaru/src/bin/amaru/cmd/daemon.rs (4 hunks)
  • crates/amaru/src/sync/mod.rs (3 hunks)
  • crates/amaru/src/sync/pull.rs (1 hunks)
  • crates/consensus/src/consensus/chain_selection.rs (5 hunks)
  • crates/consensus/src/consensus/mod.rs (7 hunks)
  • crates/ouroboros/src/protocol/peer.rs (1 hunks)
🔇 Additional comments (25)
crates/amaru/src/sync/mod.rs (12)

24-25: Neat usage of funnel_ports
These imports seem fitting for channelling multiple ports into a single flow. No immediate concerns here, mate.


45-45: Smashing approach for multiple upstream peers
Switching from a single string to a Vec<String> paves the way for multi-peer connectivity. Marvellous job.


69-72: Great extension to manage multiple clients
Passing the entire collection is sensible. Just be mindful if the number of peers becomes massive, as that could get resource-heavy—for now, though, this is top-notch.


78-84: Building peer sessions
Perfectly logical construction of sessions. One small heads-up: if peer_name collisions are possible, it might overwrite or cause confusion. You may consider verifying uniqueness.


86-89: Multiple pull stages look right
This code elegantly loops over each session. No major quibbles—looks bonzer.


91-91: Chain Selector usage
Brilliant step passing all peer sessions to make_chain_selector. All seems hunky-dory.


93-93: Consensus stage with all sessions
Nicely done. Providing the entire set of sessions ensures each peer is accounted for.


102-106: Funnelling multiple outputs
This is a tidy approach to gather data from all pulls. Keep an eye on the backlog if you end up with heaps of peers and messages—50 might need revisiting.


112-115: Spawning multiple pulls
A clean approach to launching each stage with the same policy. Just ensure concurrency is well-handled if peer counts balloon.


120-121: Adding extra stages
Neatly popping header_validation and ledger into the pulls vector. Good on you.


123-123: Returning the aggregated stages
Laughably straightforward, but no issues. Nicely ensures a single handle to all threads.


129-129: Pointer to peer sessions
Accepting a reference to multiple PeerSessions is grand. Nothing to fuss over.

crates/consensus/src/consensus/mod.rs (7)

44-44: HashMap for peer sessions
Swapping from a single session to a HashMap is brilliant for multi-peer scenarios. Cracking update.


65-74: Building a HashMap from the vector
This approach is quite dandy. Just be aware that any duplicated peer names might override each other. If that’s not desired, it might need extra logic.


93-93: Expanded forward_block signature
Spot on including the peer as a parameter. This method is neat and tidy—no complaints from me.


114-125: Fork switching logic
Introducing the peer parameter here ensures forward_block gets the correct session. Cracking design; makes switching to a fork easy-peasy.


165-165: Roll forward with consistent usage
Good job using forward_block again. Shows thorough consistency in the codebase.


173-175: SwitchToFork scenario
Top marks for managing the peer-specific fork scenario. Good clear flow without any tangles.


202-204: Rollback SwitchToFork
A symmetrical approach to handle rollbacks with forks. This is tidy as a teapot, no drama spotted.

crates/ouroboros/src/protocol/peer.rs (1)

23-23: Public peer name
Exposing name is a pragmatic step, especially for identification among multiple peers. Just take care if you need to shield internal details in future.

crates/amaru/src/bin/amaru/cmd/daemon.rs (2)

30-35: Brilliant work on the CLI argument handling!

The documentation and configuration for multiple peer addresses is spot-on. Making it required with append action ensures users must specify at least one peer, while allowing multiple peers to be added. Proper tidy work!


116-116: LGTM! Clean and straightforward.

The config update is spot on, mate! Simple and effective.

crates/amaru/src/sync/pull.rs (1)

69-69: Top-notch observability improvement!

Adding the peer context to the logs is a cracking idea! It'll be dead useful when debugging issues with multiple peers.

crates/consensus/src/consensus/chain_selection.rs (2)

97-102: Good catch with the FIXME comment!

The temporary nature of needing the peer in SwitchToFork is well documented. However, let's not let this slip through the cracks.

Would you like me to create an issue to track this technical debt? The issue would focus on refactoring the fork switching mechanism to work with stored blocks, eliminating the peer dependency.


180-182: Lovely logging enhancement!

The addition of peer context to the logs is consistent with the observability improvements throughout the codebase. Good stuff!

crates/consensus/src/consensus/mod.rs (resolved review thread)
@KtorZ KtorZ force-pushed the abailly/multiple-upstream-peers branch from 2f0e09f to ff16cfc Compare February 4, 2025 13:43
@coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
crates/consensus/src/consensus/mod.rs (1)

93-104: ⚠️ Potential issue

Replace expect with proper error handling

G'day! We're still using that risky expect call for peer lookup. Let's handle it more gracefully, eh?

-            let peer_session = self
-                .peer_sessions
-                .get(peer)
-                .expect("Unknown peer, bailing out");
+            let peer_session = self
+                .peer_sessions
+                .get(peer)
+                .ok_or_else(|| WorkerError::from(miette!("Unknown peer: {}", peer)))?;
🧹 Nitpick comments (2)
crates/amaru/src/sync/mod.rs (2)

70-73: Add error handling for client initialization

Crikey! While the function signature change looks good, we might want to handle potential client initialization failures gracefully.

Consider adding error handling:

 pub fn bootstrap(
     config: Config,
     clients: Vec<(String, Arc<Mutex<PeerClient>>)>,
-) -> miette::Result<Vec<Tether>> {
+) -> miette::Result<Vec<Tether>> {
+    if clients.is_empty() {
+        return Err(miette::miette!("No upstream peers provided"));
+    }

79-85: Add logging for peer session initialization

A bit of logging would be a proper treat here, helping us track which peers we're connecting to.

+    tracing::info!("Initializing {} peer sessions", clients.len());
     let peer_sessions: Vec<PeerSession> = clients
         .iter()
         .map(|(peer_name, client)| {
+            tracing::debug!("Creating session for peer: {}", peer_name);
             PeerSession {
                 peer: Peer::new(peer_name),
                 peer_client: client.clone(),
             }
         })
         .collect();
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f0e09f and ff16cfc.

📒 Files selected for processing (6)
  • crates/amaru/src/bin/amaru/cmd/daemon.rs (4 hunks)
  • crates/amaru/src/sync/mod.rs (3 hunks)
  • crates/amaru/src/sync/pull.rs (1 hunks)
  • crates/consensus/src/consensus/chain_selection.rs (5 hunks)
  • crates/consensus/src/consensus/mod.rs (7 hunks)
  • crates/ouroboros/src/protocol/peer.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • crates/ouroboros/src/protocol/peer.rs
  • crates/consensus/src/consensus/chain_selection.rs
  • crates/amaru/src/sync/pull.rs
  • crates/amaru/src/bin/amaru/cmd/daemon.rs
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: Build on ubuntu-latest with target aarch64-unknown-linux-musl
  • GitHub Check: Build on windows-latest with target x86_64-pc-windows-msvc
  • GitHub Check: Build on macos-latest with target aarch64-apple-darwin
  • GitHub Check: Build on ubuntu-latest with target wasm32-unknown-unknown
  • GitHub Check: Build on ubuntu-latest with target x86_64-unknown-linux-gnu
🔇 Additional comments (3)
crates/amaru/src/sync/mod.rs (1)

44-50: Spot on with the Config struct changes!

The field rename from upstream_peer to upstream_peers is as clear as a bell, mate. Good choice using Vec for multiple peer support.

crates/consensus/src/consensus/mod.rs (2)

41-61: Brilliant use of HashMap for peer sessions!

Using a HashMap for peer_sessions is a proper smart choice - gives us O(1) lookup time, which is faster than a rat up a drainpipe!


112-128: Clean as a whistle!

The switch_to_fork implementation is properly sorted with the new peer parameter. Nice and consistent with the multi-peer changes.

@KtorZ
Contributor

KtorZ commented Feb 4, 2025

@abailly I didn't look much further but I am now getting an error when using the command as I used to (single local --peer-address):

Error:   × error connecting bearer
  ╰─▶ Connection refused (os error 61)

I've checked against main, which still works fine.

@abailly
Contributor Author

abailly commented Feb 4, 2025

I find that surprising. I tried (after a rebase) to run:

% AMARU_DEV_LOG=info cargo run --release -- daemon \
  --peer-address=clermont:3001 --peer-address=preprod-node.world.dev.cardano.org:30000 \
  --network=preprod  | grep chain_selection

(without the otlp stack) and it "works": it does not crash and shows progress and headers from both defined peers. clermont:3001 is a local machine

@abailly abailly force-pushed the abailly/multiple-upstream-peers branch from ff16cfc to 633d89b Compare February 4, 2025 14:43
@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
README.md (1)

76-78: Enhanced Multiple Peer Guidance

Crikey, this new line makes it abundantly clear that you can now add multiple peers, which is brilliant for preventing any mix-ups. It might be even more helpful if you tossed in a quick example on how to correctly supply multiple --peer-address flags in a command. This little extra bit could save users from scratching their heads, especially on a foggy morning!
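For what it's worth, such an example would have the shape of the command the author posts later in this conversation (addresses are illustrative):

```shell
cargo run --release -- daemon \
  --peer-address=127.0.0.1:3000 \
  --peer-address=preprod-node.world.dev.cardano.org:30000 \
  --network=preprod
```

Each --peer-address flag adds one upstream peer; at least one is required.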

🧰 Tools
🪛 LanguageTool

[style] ~76-~76: Using many exclamation marks might seem excessive (in this case: 4 exclamation marks for a text that’s 2542 characters long)
Context: ....0.1:3000 \ --network=preprod ``` > [!TIP] > Replace --peer-address with you...

(EN_EXCESSIVE_EXCLAMATION)

crates/consensus/src/consensus/mod.rs (1)

63-87: Add duplicate peer handling, mate!

The constructor's looking sharp, but it might get a bit dodgy if we've got duplicate peers in that Vec. The .collect() will quietly keep the last one, which could lead to some head-scratching moments later.

Consider this safer approach:

-        let peer_sessions = peer_sessions
-            .into_iter()
-            .map(|p| (p.peer.clone(), p))
-            .collect::<HashMap<_, _>>();
+        let mut sessions = HashMap::new();
+        for session in peer_sessions {
+            if sessions.insert(session.peer.clone(), session).is_some() {
+                tracing::warn!("Duplicate peer found, keeping the latest session");
+            }
+        }
+        let peer_sessions = sessions;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff16cfc and 633d89b.

📒 Files selected for processing (7)
  • README.md (1 hunks)
  • crates/amaru/src/bin/amaru/cmd/daemon.rs (4 hunks)
  • crates/amaru/src/sync/mod.rs (3 hunks)
  • crates/amaru/src/sync/pull.rs (1 hunks)
  • crates/consensus/src/consensus/chain_selection.rs (5 hunks)
  • crates/consensus/src/consensus/mod.rs (7 hunks)
  • crates/ouroboros/src/protocol/peer.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • crates/ouroboros/src/protocol/peer.rs
  • crates/amaru/src/bin/amaru/cmd/daemon.rs
  • crates/amaru/src/sync/pull.rs
  • crates/consensus/src/consensus/chain_selection.rs
  • crates/amaru/src/sync/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build on windows-latest with target x86_64-pc-windows-msvc
🔇 Additional comments (3)
crates/consensus/src/consensus/mod.rs (3)

41-61: Brilliant restructuring of the Stage struct!

The change from a single peer_session to a HashMap of peer_sessions is spot on for handling multiple peers. It's like upgrading from a one-on-one chat to a proper group chat - much more sociable! 😄


93-110: LGTM - Keeping it temporary as planned!

The expect() usage has already been flagged in a previous review and acknowledged as temporary. Fair enough for a stopgap solution while waiting for the P2P handling component.


112-208: Clean as a whistle!

The peer parameter flows through the methods like a well-poured pint - smooth and consistent. Good job maintaining the peer context throughout the chain of operations.

@abailly
Contributor Author

abailly commented Feb 5, 2025

@KtorZ Were you able to go further?

@KtorZ
Contributor

KtorZ commented Feb 5, 2025

Not yet, but about to try!

@abailly abailly force-pushed the abailly/multiple-upstream-peers branch from 633d89b to 76b760c Compare February 5, 2025 10:54
@coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
crates/consensus/src/consensus/mod.rs (1)

96-101: ⚠️ Potential issue

That expect() is still lurking around like a spider in the bathtub!

This is still using expect() which could cause a nasty crash. I know this was flagged before and deferred, but it's worth mentioning again as it's particularly important with multiple peers.

-            .expect("Unknown peer, bailing out");
+            .ok_or_else(|| WorkerError::from(
+                miette!("Unknown peer: {:?}", peer)
+            ))?;
🧹 Nitpick comments (4)
crates/amaru/src/sync/mod.rs (2)

66-69: Add error handling for empty clients vector

Crikey! We might want to handle the case where no peers are provided. A proper error message would be better than letting it fail downstream.

 pub fn bootstrap(
     config: Config,
     clients: Vec<(String, Arc<Mutex<PeerClient>>)>,
 ) -> miette::Result<Vec<Tether>> {
+    if clients.is_empty() {
+        return Err(miette::miette!("No upstream peers provided"));
+    }

75-86: Consider more descriptive variable names

The variable names could be a bit more descriptive. For example, peer_name could be peer_address since it represents the network address.

-    .map(|(peer_name, client)| PeerSession {
-        peer: Peer::new(peer_name),
+    .map(|(peer_address, client)| PeerSession {
+        peer: Peer::new(peer_address),
crates/consensus/src/consensus/mod.rs (2)

112-128: Add some logging for better observability

Would be ace to add some logging here to track which peer we're switching forks for. Could be a lifesaver when debugging!

 async fn switch_to_fork(
     &mut self,
     peer: &Peer,
     rollback_point: &Point,
     fork: Vec<ConwayHeader>,
 ) -> Result<(), WorkerError> {
+    tracing::info!(
+        peer = ?peer,
+        rollback = ?rollback_point,
+        fork_length = fork.len(),
+        "Switching to fork"
+    );

181-208: Add error context for better debugging

When things go pear-shaped, it'd be helpful to have more context about which peer caused the rollback to fail.

 async fn handle_roll_back(&mut self, peer: &Peer, rollback: &Point) -> Result<(), WorkerError> {
     let result = self
         .chain_selector
         .lock()
         .await
-        .rollback(peer, point_hash(rollback));
+        .rollback(peer, point_hash(rollback))
+        .map_err(|e| miette!("Failed to rollback for peer {:?}: {}", peer, e))?;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 633d89b and 76b760c.

📒 Files selected for processing (7)
  • README.md (1 hunks)
  • crates/amaru/src/bin/amaru/cmd/daemon.rs (4 hunks)
  • crates/amaru/src/sync/mod.rs (3 hunks)
  • crates/amaru/src/sync/pull.rs (1 hunks)
  • crates/consensus/src/consensus/chain_selection.rs (5 hunks)
  • crates/consensus/src/consensus/mod.rs (7 hunks)
  • crates/ouroboros/src/protocol/peer.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • README.md
  • crates/ouroboros/src/protocol/peer.rs
  • crates/amaru/src/sync/pull.rs
  • crates/consensus/src/consensus/chain_selection.rs
  • crates/amaru/src/bin/amaru/cmd/daemon.rs
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Build on windows-latest with target x86_64-pc-windows-msvc
  • GitHub Check: Build on ubuntu-latest with target x86_64-unknown-linux-gnu
🔇 Additional comments (1)
crates/amaru/src/sync/mod.rs (1)

43-43: Spot on with the Vec for upstream peers!

The change from a single peer to a vector of peers aligns perfectly with the PR's objective.

@abailly abailly force-pushed the abailly/multiple-upstream-peers branch from 76b760c to 2734c7c Compare February 5, 2025 13:56
@coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
crates/consensus/src/consensus/mod.rs (1)

96-100: ⚠️ Potential issue

Still using expect() here, mate! Remember our chat about this?

As discussed in the previous review, using expect() here is about as safe as a game of cricket in a china shop.

The previous suggestion still stands:

-                .expect("Unknown peer, bailing out");
+                .ok_or_else(|| {
+                    WorkerError::from(miette!("Unknown peer: {}", peer))
+                })?;
🧹 Nitpick comments (2)
crates/amaru/src/sync/mod.rs (1)

66-69: Let's make this signature a bit more readable, mate!

While the changes are solid, that tuple type is about as clear as a London fog.

Consider adding a type alias at the top of the file:

type PeerConnection = (String, Arc<Mutex<PeerClient>>);

Then update the signature:

-pub fn bootstrap(
-    config: Config,
-    clients: Vec<(String, Arc<Mutex<PeerClient>>)>,
-) -> miette::Result<Vec<Tether>> {
+pub fn bootstrap(
+    config: Config,
+    clients: Vec<PeerConnection>,
+) -> miette::Result<Vec<Tether>> {
crates/consensus/src/consensus/mod.rs (1)

93-110: Add some proper tracing to this forward_block method.

Let's make debugging easier than finding a four-leaf clover!

Add more detailed tracing:

+    #[instrument(skip(self, header), fields(peer = ?peer, point = ?header.point()))]
     async fn forward_block(&mut self, peer: &Peer, header: &dyn Header) -> Result<(), WorkerError> {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 76b760c and 2734c7c.

📒 Files selected for processing (8)
  • README.md (2 hunks)
  • crates/amaru/src/bin/amaru/cmd/daemon.rs (4 hunks)
  • crates/amaru/src/sync/mod.rs (3 hunks)
  • crates/amaru/src/sync/pull.rs (1 hunks)
  • crates/consensus/src/consensus/chain_selection.rs (5 hunks)
  • crates/consensus/src/consensus/header_validation.rs (2 hunks)
  • crates/consensus/src/consensus/mod.rs (7 hunks)
  • crates/ouroboros/src/protocol/peer.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • crates/amaru/src/sync/pull.rs
  • crates/consensus/src/consensus/chain_selection.rs
  • crates/amaru/src/bin/amaru/cmd/daemon.rs
  • README.md
  • crates/ouroboros/src/protocol/peer.rs
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build on windows-latest with target x86_64-pc-windows-msvc
🔇 Additional comments (4)
crates/consensus/src/consensus/header_validation.rs (1)

24-24: LGTM! Clean import addition.

The addition of the Header type import alongside ConwayHeader is spot on.

crates/amaru/src/sync/mod.rs (2)

43-43: Spot on! Good struct field update.

Converting upstream_peer to upstream_peers as a Vec is a proper job for supporting multiple peers.


123-140: Clean as a whistle! Nice chain selector implementation.

The changes to make_chain_selector and tether handling are well thought out.

crates/consensus/src/consensus/mod.rs (1)

173-175: Brilliant chain selection handling!

The peer-specific chain selection is implemented as neat as a pin.

Comment on lines +41 to +46
block_validator
    .validate()
    .map_err(|e| {
        warn!("fail to validate header {}: {:?}", header.hash(), e);
    })
    .or(Ok(())) // FIXME: Remove this once we have a proper error handling

🛠️ Refactor suggestion

Crikey! Let's make this error handling proper sturdy.

While moving from panic to warning is a step in the right direction (like trading Marmite for Vegemite), we could make it even better. The current FIXME suggests this is temporary, but we should consider a more robust approach.

Consider this more comprehensive error handling:

-        block_validator
-            .validate()
-            .map_err(|e| {
-                warn!("fail to validate header {}: {:?}", header.hash(), e);
-            })
-            .or(Ok(())) // FIXME: Remove this once we have a proper error handling
+        block_validator.validate().map_err(|e| {
+            warn!("Header validation failed for {}: {:?}", header.hash(), e);
+            WorkerError::from(miette!("Header validation error: {}", e))
+        })

This way, we:

  1. Maintain proper error propagation
  2. Keep the warning for observability
  3. Convert the error into a proper WorkerError

Comment on lines +75 to +86
let peer_sessions: Vec<PeerSession> = clients
    .iter()
    .map(|(peer_name, client)| PeerSession {
        peer: Peer::new(peer_name),
        peer_client: client.clone(),
    })
    .collect();

let mut pulls = peer_sessions
    .iter()
    .map(|session| pull::Stage::new(session.clone(), vec![tip.clone()]))
    .collect::<Vec<_>>();

🛠️ Refactor suggestion

Add some error handling to this peer session setup.

At the moment, if anything goes wrong during peer session creation, it'll be about as graceful as a kangaroo in roller skates.

Consider adding error handling:

-    let peer_sessions: Vec<PeerSession> = clients
-        .iter()
-        .map(|(peer_name, client)| PeerSession {
-            peer: Peer::new(peer_name),
-            peer_client: client.clone(),
-        })
-        .collect();
+    let peer_sessions: miette::Result<Vec<PeerSession>> = clients
+        .iter()
+        .map(|(peer_name, client)| {
+            if peer_name.is_empty() {
+                return Err(miette!("Empty peer name not allowed"));
+            }
+            Ok(PeerSession {
+                peer: Peer::new(peer_name),
+                peer_client: client.clone(),
+            })
+        })
+        .collect();
+    let peer_sessions = peer_sessions?;

block_validator
    .validate()
    .map_err(|e| {
        warn!("fail to validate header {}: {:?}", header.hash(), e);
Contributor

@KtorZ KtorZ Feb 5, 2025


Suggested change
warn!("fail to validate header {}: {:?}", header.hash(), e);
warn!(header.hash = ?header.hash(), validation.error = ?e, "failed_to_validate");

A "best-practice" I've noted from using the tracing library (but which applies to telemetry in general) is to leverage fields as much as possible, and avoid embedding fields/payloads in the log message. Said differently: embrace structured logs/traces! This is what allows us to have nice JSON logs out-of-the-box 😛

Also by convention for now, I've always used snake_cased names for all traces & spans; really more out of paranoia than out of any legit reason.
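The distinction can be sketched with a stdlib-only stand-in. `render_structured` below is a hypothetical helper, not part of `tracing`; with the real library the fields come from the macro syntax itself, e.g. `warn!(header.hash = ?hash, validation.error = ?e, "failed_to_validate")`:

```rust
use std::collections::BTreeMap;

// Structured logging: the message is a constant, snake_cased tag, and all
// variable data lives in machine-parseable key=value fields. This helper
// is a stand-in for what `tracing` fields give you for free.
fn render_structured(message: &str, fields: &BTreeMap<&str, String>) -> String {
    let mut line = String::new();
    for (key, value) in fields {
        line.push_str(&format!("{key}={value} "));
    }
    line.push_str(message);
    line
}

fn main() {
    let mut fields = BTreeMap::new();
    fields.insert("header.hash", "abc123".to_string());
    fields.insert("validation.error", "BadVrfProof".to_string());
    let structured = render_structured("failed_to_validate", &fields);
    // Fields can be extracted without parsing the free-form message.
    assert!(structured.contains("header.hash=abc123"));
}
```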

Contributor Author


Totally agreed, that's why I deliberately wrote this temporary kludge as a textual log entry, so that we spot it more easily and change it to something consistent with the rest.

In general, I think we should only ever need instrument!; any other kind of log is probably unneeded, or is begging for a function to instrument :)

Contributor


I think we should only ever need instrument!

That's a good rule of thumb regarding how we slice the code, indeed.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/consensus/src/consensus/mod.rs (1)

67-76: Consider adding validation for duplicate peers.

Crikey! The peer session collection looks good, but we might want to handle the case where the same peer appears multiple times in the input vector.

 let peer_sessions = peer_sessions
     .into_iter()
-    .map(|p| (p.peer.clone(), p))
-    .collect::<HashMap<_, _>>();
+    .map(|p| (p.peer.clone(), p))
+    .fold(HashMap::new(), |mut sessions, (peer, session)| {
+        if sessions.insert(peer.clone(), session).is_some() {
+            tracing::warn!(peer = ?peer, "duplicate_peer_detected");
+        }
+        sessions
+    });
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2734c7c and 10c044a.

📒 Files selected for processing (3)
  • crates/consensus/src/consensus/chain_selection.rs (10 hunks)
  • crates/consensus/src/consensus/header_validation.rs (2 hunks)
  • crates/consensus/src/consensus/mod.rs (9 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/consensus/src/consensus/header_validation.rs
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build on windows-latest with target x86_64-pc-windows-msvc
🔇 Additional comments (5)
crates/consensus/src/consensus/mod.rs (3)

46-46: LGTM! Good choice using HashMap for peer sessions.

The HashMap provides O(1) lookup for peer sessions, which is spot on for managing multiple peers.
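The pattern can be sketched with illustrative stand-ins for the real `Peer` and `PeerSession` types (field names here are assumptions, not the actual Amaru definitions):

```rust
use std::collections::HashMap;

// Illustrative stand-ins for the consensus module's types.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct Peer {
    name: String,
}

#[derive(Clone, Debug)]
struct PeerSession {
    peer: Peer,
}

// Keying sessions by peer gives O(1) lookup when a message arrives
// tagged with the peer it came from.
fn index_sessions(sessions: Vec<PeerSession>) -> HashMap<Peer, PeerSession> {
    sessions.into_iter().map(|s| (s.peer.clone(), s)).collect()
}

fn main() {
    let sessions = vec![
        PeerSession { peer: Peer { name: "relay-1".into() } },
        PeerSession { peer: Peer { name: "relay-2".into() } },
    ];
    let by_peer = index_sessions(sessions);
    assert!(by_peer.contains_key(&Peer { name: "relay-1".into() }));
}
```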


98-102: Previous review comment still applies.

G'day! I see we're still using expect here. As per the previous review discussion, this will be handled later.


114-127: LGTM! Clean peer integration.

The peer parameter addition is well integrated with the existing fork switching logic.

crates/consensus/src/consensus/chain_selection.rs (2)

166-167: Brilliant logging improvements!

The DEBUG level is spot on for these events, and the added context will be a lifesaver during troubleshooting. Good on ya!

Also applies to: 176-181


343-343: Tests properly updated for multi-peer support.

The test cases have been properly updated to handle the new peer parameter in SwitchToFork. Bob's your uncle!

Also applies to: 447-447

Comment on lines 96 to +102
 /// The current best chain has switched to given fork starting at
-/// given hash.
-SwitchToFork(Point, Vec<H>),
+/// given hash, from given Peer.
+///
+/// FIXME: The peer should not be needed here, as the fork should be
+/// comprised of known blocks. It is only needed to download the blocks
+/// we don't currently store.
+SwitchToFork(Peer, Point, Vec<H>),

💡 Verification agent

🧩 Analysis chain

Consider decoupling block storage from peer information.

The FIXME raises a fair dinkum point! We might want to separate the concerns of block storage and peer management. Perhaps introduce a block cache or local storage service?

Let's check if there's any existing block storage implementation:


🏁 Script executed:

#!/bin/bash
# Search for block storage related code
rg -i "block.*store|store.*block" --type rust

Length of output: 172

Decoupling Peer Information from Block Storage

After a friendly little spelunking through our codebase, the search turned up a solitary reference to block storage—specifically in the ledger rewards (see the count_blocks function). In contrast, our consensus module’s SwitchToFork(Peer, Point, Vec<H>) variant still couples peer information right into the chain switching process. This confirms the architectural concern raised by the FIXME comment: the Peer parameter is being used as a crutch for downloading blocks, rather than relying on a dedicated block storage or caching mechanism.

Recommendations:

  • Refactor the SwitchToFork variant: Remove the Peer dependency if possible, and decouple block storage concerns from peer management.
  • Consider a dedicated block storage mechanism: Explore implementing or reusing a block cache or local storage service, in line with the isolated block storage practices found elsewhere (e.g. within ledger rewards).

Cheers, and let’s give this a proper tidy-up!
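The decoupling the FIXME asks for could look roughly like this: chain selection consults a local block store first and only names a peer as a download fallback when a block is missing. Everything below (`BlockStore`, `BlockSource`, the type aliases) is a hypothetical sketch, not the actual Amaru API:

```rust
use std::collections::HashMap;

type Hash = String;
type Block = Vec<u8>;

// Hypothetical local block store; in practice this would be backed by
// the node's persistent storage.
struct BlockStore {
    blocks: HashMap<Hash, Block>,
}

// Where a block can be obtained from: locally, or by fetching from a
// peer named only as a fallback.
enum BlockSource {
    Local(Block),
    FetchFrom(String /* peer name */),
}

impl BlockStore {
    fn resolve(&self, hash: &Hash, fallback_peer: &str) -> BlockSource {
        match self.blocks.get(hash) {
            Some(block) => BlockSource::Local(block.clone()),
            None => BlockSource::FetchFrom(fallback_peer.to_string()),
        }
    }
}

fn main() {
    let mut store = BlockStore { blocks: HashMap::new() };
    store.blocks.insert("h1".into(), vec![1, 2, 3]);
    // Known block: served locally, no peer involved.
    assert!(matches!(store.resolve(&"h1".into(), "relay-1"), BlockSource::Local(_)));
    // Unknown block: fall back to fetching from the given peer.
    assert!(matches!(store.resolve(&"h2".into(), "relay-1"), BlockSource::FetchFrom(_)));
}
```

With such a store in place, `SwitchToFork` could drop the `Peer` field and carry only the points, leaving fetch decisions to the storage layer.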

@KtorZ KtorZ merged commit 400e280 into main Feb 5, 2025
7 checks passed
@KtorZ KtorZ deleted the abailly/multiple-upstream-peers branch February 5, 2025 16:17