-
Notifications
You must be signed in to change notification settings - Fork 176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Download blobs from the node that gave us the certificate. #3034
Download blobs from the node that gave us the certificate. #3034
Conversation
@@ -1807,15 +1816,26 @@ where | |||
for blob_id in blob_ids { | |||
let mut certificate_stream = validators | |||
.iter() | |||
.map(|remote_node| remote_node.download_certificate_for_blob(blob_id)) | |||
.map(|remote_node| async move { | |||
let cert = remote_node.download_certificate_for_blob(blob_id).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this still keep the old semantics of running all futures in parallel thanks to async move
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it maps each remote_node
to a future (the async move
block) and then creates the FuturesUnordered
from them.
@@ -1207,6 +1208,7 @@ where | |||
&self, | |||
certificate: Certificate, | |||
mode: ReceiveCertificateMode, | |||
nodes: Option<Vec<RemoteNode<P::Node>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Download the blob in question from the validator that sent us the certificate and is therefore expected to have it.
This code looks like it does the opposite, i.e. downloading the certificate from the validator that has the blob. What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The certificate itself—the one we downloaded because of the blob in question—is passed into this function. But the function also automatically downloads all the ancestors and blobs, if needed.
It currently downloads those from all validators, sequentially. With this change, if a node is provided explicitly, it downloads them only from that node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might lead to some performance degradation if this one particular node becomes a bottleneck for this function call. Although IDK how to trigger that (to check) or fix the issue differently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly, that's a more general and far-reaching discussion we should have at some point: The right tradeoff between latency (ask all validators everything at the same time, take the first response) and bandwidth (ask them in sequence).
But at least it triggers exactly that bottleneck way less frequently that before: Before, we were simply asking the wrong validator often, and then, due to #3033, waiting a long time for it.
Motivation
In
receive_certificates_for_blobs
we simultaneously ask all validators for a certificate that (re-)certifies a blob. We then take the first response we get and callreceive_certificate
. That will fail because we don't have the blob yet, which we in turn request from all validators sequentially! Due to #3033, if the first validator we ask doesn't have it, we keep retrying a few times before we move on to the next one, which can take a long time.Proposal
Download the blob in question from the validator that sent us the certificate and is therefore expected to have it.
Test Plan
For me this reduced the
process-inbox
in #3029 from 53 to 7 seconds.Release Plan
main
branch.devnet
branch, thenLinks
Blob not found on storage read
#3029.