Sending to many peers can be tolerant of failures #160

Maelkum · 2024-07-22T13:38:23Z

This PR makes a slight change to how sendToMany() operates.

Previously, we were strict about errors: failing to send the message to even a single peer meant the whole operation was treated as a failure. Moreover, if the send to the n-th peer failed, we did not even attempt the peers after it in the list.

This caused problems with flaky peers that reply to a roll call but drop off afterwards, taking the whole operation down with them. Additionally, messages were sent sequentially.

Now we send in parallel and tolerate partial send failures, similar to how we handle waiting for execution responses, where we might not hear back from everyone. This currently applies to non-consensus executions only. For consensus executions we still require that all peers get the message.

However, if ALL sends fail, we still treat that as an error.

Maelkum added 2 commits July 22, 2024 11:49

Tolerate send errors on roll call (unless consensus is required)

1c0fbba

Add test to sendToMany

e64f98e

Maelkum requested a review from dmikey July 22, 2024 13:38

Maelkum self-assigned this Jul 22, 2024

dmikey approved these changes Aug 30, 2024

dmikey merged commit b1a1ab4 into main Aug 30, 2024
5 checks passed

dmikey deleted the send-to-many-best-effort branch August 30, 2024 16:26
