DA: Consensus sampling #708
Conversation
```rust
    fn validate_block(
        block: &Block<ClPool::Item, DaPool::Item>,
        sampled_blobs_ids: &BTreeSet<DaPool::Key>,
    ) -> bool {
        let validated_blobs = block
            .blobs()
            .all(|blob| sampled_blobs_ids.contains(&blob.blob_id()));
        validated_blobs
    }
}
```
This is very temporary and will have to include many other things later on, but I think it will do for now.
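For readers following the thread, here is a self-contained sketch of the check above, with simplified stand-in types (`BlobId`, `Block`) in place of the real generic pool items:

```rust
use std::collections::BTreeSet;

// Simplified stand-ins for the real generic pool/blob types in the PR.
type BlobId = [u8; 32];

struct Block {
    blob_ids: Vec<BlobId>,
}

impl Block {
    fn blobs(&self) -> impl Iterator<Item = &BlobId> + '_ {
        self.blob_ids.iter()
    }
}

// Mirrors the check above: a block is only considered valid if every blob it
// references is present in this node's set of successfully sampled blobs.
fn validate_block(block: &Block, sampled_blob_ids: &BTreeSet<BlobId>) -> bool {
    block.blobs().all(|id| sampled_blob_ids.contains(id))
}

fn main() {
    let sampled: BTreeSet<BlobId> = [[1u8; 32], [2u8; 32]].into_iter().collect();
    let block = Block {
        blob_ids: vec![[1u8; 32], [3u8; 32]],
    };
    // [3u8; 32] was never sampled by this node, so the block is rejected.
    assert!(!validate_block(&block, &sampled));
}
```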
Basically looks good to me. I mainly left a couple of questions.
```rust
    .with_blobs_certificates(da_certs)
    .with_blobs_info(
        da_blobs_info.filter(move |info| blobs_ids.contains(&info.blob_id())),
    )
    .build()
else {
    panic!("Proposal block should always succeed to be built")
```
Is this temporary too?
No, if it panics it is because we forgot to add something, which is exactly why we want it to panic. A panic is a coding error that is not caught at compile time.
```rust
match futures::join!(cl_txs, da_certs) {
    (Ok(cl_txs), Ok(da_certs)) => {
        let blobs_ids = get_sampled_blobs(sampling_relay);
```
So this is the proposer who is querying for sampled blobs, not the mempool, right? Does it actually make any difference here?
The mempool will trigger the sampling, but it has no use for the already sampled blobs.
Btw, mempools will change once we actually have TXs, as blobs are part of the TXs themselves. The DA mempool will probably be removed entirely.
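To make the proposer side concrete, here is a hedged sketch of what querying the sampling service for the sampled blob ids could look like. The message name `GetSampledBlobs` and the tokio channel wiring are assumptions for illustration, not the actual service API:

```rust
use std::collections::BTreeSet;
use tokio::sync::{mpsc, oneshot};

type BlobId = [u8; 32];

// Hypothetical message shape; the real sampling service message enum in the
// PR may differ in naming and payload.
enum DaSamplingServiceMsg {
    GetSampledBlobs {
        reply_channel: oneshot::Sender<BTreeSet<BlobId>>,
    },
}

// Proposer-side helper: ask the sampling service which blobs this node has
// successfully sampled; only those are eligible to go into the proposal.
async fn get_sampled_blobs(
    sampling_relay: mpsc::Sender<DaSamplingServiceMsg>,
) -> BTreeSet<BlobId> {
    let (reply_channel, receiver) = oneshot::channel();
    sampling_relay
        .send(DaSamplingServiceMsg::GetSampledBlobs { reply_channel })
        .await
        .expect("sampling service relay should be available");
    receiver.await.expect("sampling service should reply")
}
```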
```rust
    .send(DaSamplingServiceMsg::MarkInBlock { blobs_id })
    .await
{
    error!("Error marking in block for blobs ids: {blobs_id:?}");
```
I am actually seeing a couple of places where errors are just logged.
What would actually be the consequence of an error here, for example? Is logging enough?
This should never happen, as it is internal to the node (a message from service to service). We could actually panic here, or try to recover. There is no clear policy on these cases. Ideas and suggestions are welcome (open an issue for them).
I have learnt in other contexts that when a node encounters an issue it can't actually recover from, it should shut down, but not panic.
The question here is what such an error actually means. Is it fatal, an unrecoverable situation? Not really, but something is still very fishy.
Still, a node shutdown should be preferred to a panic IMO.
At the moment there is not really a difference between shutdown and panic.
- A panic should be used when a path is a bug, meaning the programmed logic is wrong and should be fixed.
- An error that does not matter (the node may continue) should be logged and execution should continue.
- When something happens that neither the node nor we expect, then, as you said, the node should shut down/restart.
In this specific case it is logged so we can see that something went wrong; we should then try to recover the relay later on (as it was broken), and if that is not possible the node could shut down. But we still do not have any recovery measures in place (anywhere), and it is not something we are prioritising now.
But if you are interested, it would be really beneficial to prepare a design for such policies! A starting point is sketched below.
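As a starting point for such a design, here is a minimal sketch of the three cases described above. `ShutdownHandle` and `RelaySendError` are hypothetical stand-ins for illustration, not existing types in the codebase:

```rust
use tracing::error;

// Hypothetical stand-in: whatever mechanism the node uses to stop its
// services gracefully.
struct ShutdownHandle;

impl ShutdownHandle {
    fn request_shutdown(&self) {
        // Signal the service runner to stop gracefully.
    }
}

// Illustrative error type for a service-to-service relay send.
enum RelaySendError {
    // Transient backpressure: the node can keep going.
    Congested,
    // The receiving service dropped its end: internal wiring is broken.
    Disconnected,
}

fn handle_relay_error(err: RelaySendError, shutdown: &ShutdownHandle) {
    match err {
        // Log-and-continue: losing this one notification is tolerable.
        RelaySendError::Congested => {
            error!("sampling relay congested, dropping MarkInBlock notification");
        }
        // Something neither the node nor we expect: prefer a graceful
        // shutdown over a panic, as discussed above.
        RelaySendError::Disconnected => {
            error!("sampling relay disconnected, shutting the node down");
            shutdown.request_shutdown();
        }
    }
}
```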
```diff
@@ -63,7 +65,7 @@ impl FillWithOriginalReplication {
 }

 impl MembershipHandler for FillWithOriginalReplication {
-    type NetworkId = u16;
+    type NetworkId = u32;
```
There's a SubnetworkId type in nomos-da-network-core; maybe we could use it in all network related places?
Yes. I changed it as it was a misaligned type. Also, using SubnetworkId here means introducing a full crate just to use the type, and as it is a type alias atm I do not think it is completely worth it.
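To illustrate the trade-off, here is a small sketch under the assumption stated above that SubnetworkId is currently a plain alias of `u32`; the trait and struct here are simplified stand-ins for the real ones:

```rust
// Per the thread, SubnetworkId in nomos-da-network-core is currently just a
// type alias, roughly:
pub type SubnetworkId = u32;

// Simplified stand-ins for the real trait and handler. The point is only that
// spelling out `u32` in the handler avoids adding a crate dependency just to
// name an alias of `u32`.
trait MembershipHandler {
    type NetworkId;
}

struct FillWithOriginalReplication;

impl MembershipHandler for FillWithOriginalReplication {
    // Matches the alias above without depending on nomos-da-network-core.
    type NetworkId = u32;
}
```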
```rust
        todo!()
    }

    async fn mark_in_block(&mut self, _blobs_id: &[Self::BlobId]) {
```
In the context of sampling, it seems that this could better be a mark_complete or similar.
mark_in_block is actually happening in the consensus; this will just receive a notification about it and act accordingly.
Sure, this makes sense.
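For context, here is a hedged sketch of how the sampling side might react to this notification; the state fields are illustrative only, not the real backend:

```rust
use std::collections::BTreeSet;

type BlobId = [u8; 32];

// Illustrative backend state; the real sampling backend tracks more than this.
struct SamplingBackendState {
    pending: BTreeSet<BlobId>,
    validated: BTreeSet<BlobId>,
}

impl SamplingBackendState {
    // Consensus has already decided these blobs are in a block; the sampling
    // side only reacts to the notification, e.g. by dropping them from its
    // local tracking (hence the suggestion that mark_complete may be a
    // better name).
    fn mark_in_block(&mut self, blob_ids: &[BlobId]) {
        for id in blob_ids {
            self.pending.remove(id);
            self.validated.remove(id);
        }
    }
}
```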
@bacv, is it possible that this now fails because of the tests we just added, since I'm changing consensus? 🤔
DA integration tests are using consensus; this PR changed the generic params that need to be passed to it, and the tests still use the old ones. I'll make a PR that fixes them (to run locally use …).
Ok, I see them now. I couldn't reproduce them before. I'll fix it in this PR, don't worry!
I also see another issue in the nodes test, the …
Ah damn, it's true. You added it recently in a different PR, right? I'll wait for that then.
In that one the verifier and indexer were added, so sampling should be added in a similar fashion.
Cool, so I'll just add it here as well then. Thanks!
Thanks for the effort @bacv! 🎸
Add sampling to consensus.
Added block verification so that, as a non-proposer, you reject a block if you do not see its included blobs in your sample view.
As a proposer, include only the blobs that you see as available in your sample view.
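A minimal sketch of this proposer-side rule, with an illustrative `BlobInfo` type standing in for the real DA blob metadata used by the block builder:

```rust
use std::collections::BTreeSet;

type BlobId = [u8; 32];

// Illustrative blob metadata type, not the real DA blob info.
#[derive(Clone)]
struct BlobInfo {
    id: BlobId,
}

impl BlobInfo {
    fn blob_id(&self) -> BlobId {
        self.id
    }
}

// Proposer-side rule from the description: only blobs present in this node's
// sample view are included in the proposal.
fn select_blobs_for_proposal(
    da_blobs_info: impl Iterator<Item = BlobInfo>,
    sampled_ids: &BTreeSet<BlobId>,
) -> Vec<BlobInfo> {
    da_blobs_info
        .filter(|info| sampled_ids.contains(&info.blob_id()))
        .collect()
}
```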