Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download blobs from the node that gave us the certificate. #3034

Merged

Conversation

afck
Copy link
Contributor

@afck afck commented Dec 16, 2024

Motivation

In receive_certificates_for_blobs we simultaneously ask all validators for a certificate that (re-)certifies a blob. We then take the first response we get and call receive_certificate. That will fail because we don't have the blob yet, which we in turn request from all validators sequentially! Due to #3033, if the first validator we ask doesn't have it, we keep retrying a few times before we move on to the next one, which can take a long time.

Proposal

Download the blob in question from the validator that sent us the certificate and is therefore expected to have it.

Test Plan

For me this reduced the process-inbox in #3029 from 53 to 7 seconds.

Release Plan

  • These changes should be backported to the latest main branch.
  • These changes should be backported to the latest devnet branch, then
    • be released in a new SDK.

Links

@@ -1807,15 +1816,26 @@ where
for blob_id in blob_ids {
let mut certificate_stream = validators
.iter()
.map(|remote_node| remote_node.download_certificate_for_blob(blob_id))
.map(|remote_node| async move {
let cert = remote_node.download_certificate_for_blob(blob_id).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this still keep the old semantics of running all futures in parallel thanks to async move?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it maps each remote_node to a future (the async move block) and then creates the FuturesUnordered from them.

@@ -1207,6 +1208,7 @@ where
&self,
certificate: Certificate,
mode: ReceiveCertificateMode,
nodes: Option<Vec<RemoteNode<P::Node>>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Download the blob in question from the validator that sent us the certificate and is therefore expected to have it.

This code looks like it does the opposite, i.e. downloading the certificate from the validator that has the blob. What am I missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The certificate itself—the one we downloaded because of the blob in question—is passed into this function. But the function also automatically downloads all the ancestors and blobs, if needed.

It currently downloads those from all validators, sequentially. With this change, if a node is provided explicitly, it downloads them only from that node.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might lead to some performance degradation if this one particular node becomes a bottleneck for this function call. Although IDK how to trigger that (to check) or fix the issue differently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly, that's a more general and far-reaching discussion we should have at some point: The right tradeoff between latency (ask all validators everything at the same time, take the first response) and bandwidth (ask them in sequence).

But at least it triggers exactly that bottleneck way less frequently that before: Before, we were simply asking the wrong validator often, and then, due to #3033, waiting a long time for it.

@afck afck merged commit d80e059 into linera-io:testnet_archimedes Dec 16, 2024
21 checks passed
afck added a commit to afck/linera-protocol that referenced this pull request Dec 16, 2024
@afck afck deleted the testnet_archimedes-fast-blob-download branch December 16, 2024 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants