
DA: Consensus sampling #708

Merged
merged 11 commits into from
Sep 3, 2024
Conversation

danielSanchezQ
Collaborator

Add sampling to consensus.
Added block verification so that, as a non-proposer, a block is not included if you do not see its included blobs in your sample view.
As a proposer, include only the blobs that you see available in your sample view.

Comment on lines +734 to 755
fn validate_block(
block: &Block<ClPool::Item, DaPool::Item>,
sampled_blobs_ids: &BTreeSet<DaPool::Key>,
) -> bool {
let validated_blobs = block
.blobs()
.all(|blob| sampled_blobs_ids.contains(&blob.blob_id()));
validated_blobs
}
}
Collaborator Author


This is very temporary and will have to include many other things later on, but I think it will do for now.
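The check above can be sketched standalone. This is a simplified version that assumes blob ids are plain `u64` values rather than the generic `DaPool::Key`, and a stand-in `Block` type; the real code iterates the block's blob infos the same way:

```rust
use std::collections::BTreeSet;

// Simplified stand-in for the block type: just a list of blob ids.
struct Block {
    blob_ids: Vec<u64>,
}

// A block is valid only if every blob it includes was seen in our sample view.
fn validate_block(block: &Block, sampled_blob_ids: &BTreeSet<u64>) -> bool {
    block.blob_ids.iter().all(|id| sampled_blob_ids.contains(id))
}

fn main() {
    let sampled: BTreeSet<u64> = [1, 2, 3].into_iter().collect();
    // All included blobs were sampled: the block is accepted.
    assert!(validate_block(&Block { blob_ids: vec![1, 2] }, &sampled));
    // Blob 9 was never sampled: the block is rejected.
    assert!(!validate_block(&Block { blob_ids: vec![1, 9] }, &sampled));
    println!("ok");
}
```

Note that `Iterator::all` returns `true` for an empty iterator, so a block with no blobs always validates, which matches the behaviour discussed later in the thread.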

Contributor

@holisticode left a comment


Basically looks good to me. Mainly, I left a couple of questions.

.with_blobs_certificates(da_certs)
.with_blobs_info(
da_blobs_info.filter(move |info| blobs_ids.contains(&info.blob_id())),
)
.build()
else {
panic!("Proposal block should always succeed to be built")
Contributor


Is this temporary too?

Collaborator Author


No, if it panics it is because we forgot to add something, which is exactly why we want it to panic. A panic signals a coding error that was not caught at compile time.
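The `let ... else { panic!(...) }` idiom used above makes that builder failure loud on purpose. A minimal sketch of the same pattern, with a hypothetical `BlockBuilder` (not the real one in the codebase):

```rust
// Hypothetical builder: `build` returns None when a required field was
// never set, which would be a programming bug, not a runtime condition.
struct BlockBuilder {
    txs: Option<Vec<u64>>,
}

impl BlockBuilder {
    fn build(self) -> Option<Vec<u64>> {
        self.txs
    }
}

fn main() {
    // `let ... else` binds on success; the else branch must diverge,
    // so a panic here turns a forgotten field into an immediate, visible crash.
    let Some(block) = (BlockBuilder { txs: Some(vec![1, 2]) }).build() else {
        panic!("Proposal block should always succeed to be built")
    };
    assert_eq!(block, vec![1, 2]);
    println!("ok");
}
```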


match futures::join!(cl_txs, da_certs) {
(Ok(cl_txs), Ok(da_certs)) => {
let blobs_ids = get_sampled_blobs(sampling_relay);
Contributor


So this is the proposer who is querying for sampled blobs, not the mempool, right? Does it actually make any difference here?

Collaborator Author


The mempool will trigger the sampling, but it has no use for the already sampled blobs.
By the way, mempools will change once we actually have TXs, as blobs are part of the TXs themselves. The DA mempool will probably be removed entirely.

.send(DaSamplingServiceMsg::MarkInBlock { blobs_id })
.await
{
error!("Error marking in block for blobs ids: {blobs_id:?}");
Contributor


I am actually seeing a couple of places where errors are just logged.
What would the consequence of an error here actually be, for example? Is logging enough?

Collaborator Author


This should never happen, as it is internal to the node (a message from service to service). We could actually panic here, or try to recover. There is no clear policy on these cases. Ideas and suggestions are welcome (open an issue for them).

Contributor


I have learnt in other contexts that when a node encounters an issue it can't actually recover from, it should shut down, but not panic.

The question here is what such an error actually means here. Is it fatal, an unrecoverable situation? Not really, but something is still very fishy.

Still, a node shutdown should be preferred to a panic, IMO.

Collaborator Author


At the moment there is not really a difference between shutdown and panic.

  • A panic should be used when a code path is a bug, meaning the programmed logic is wrong and should be fixed.
  • An error that does not matter (the node may continue) should be logged, and execution should continue.
  • When something happens that neither the node nor we expect, then, as you said, it should shut down/restart.

In this specific case it was logged so we can see that something went wrong; then we should try to recover the relay later on (as it was broken), and if that is not possible the node could shut down. But we still do not have any recovery measures in place (anywhere); it is not something we are prioritising now.
But if you are interested, it would be really beneficial to prepare a design for such policies!
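The three-way policy described above could be sketched as a small classifier. The `FailurePolicy` enum and `handle` function below are purely illustrative (not part of the codebase), returning a description of the action taken so the decision is testable:

```rust
// Hypothetical classification of internal service errors, mirroring the
// three cases above. Enum and names are illustrative, not real code.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FailurePolicy {
    Panic,    // a code path that is a bug: the programmed logic is wrong
    LogOnly,  // an error that does not matter: log it and keep running
    Shutdown, // something neither the node nor we expect: stop/restart
}

// Returns a label for the action taken; the Panic arm diverges.
fn handle(policy: FailurePolicy, msg: &str) -> &'static str {
    match policy {
        FailurePolicy::Panic => panic!("bug, fix the code: {msg}"),
        FailurePolicy::LogOnly => {
            eprintln!("warning, continuing: {msg}");
            "logged"
        }
        FailurePolicy::Shutdown => {
            eprintln!("fatal, shutting down: {msg}");
            // a real node would trigger a graceful shutdown here
            "shutdown"
        }
    }
}

fn main() {
    assert_eq!(handle(FailurePolicy::LogOnly, "relay send failed"), "logged");
    assert_eq!(handle(FailurePolicy::Shutdown, "service channel closed"), "shutdown");
    println!("ok");
}
```

Under this sketch, the `MarkInBlock` send failure discussed above would map to `LogOnly` today, with `Shutdown` as the fallback once recovery of the relay proves impossible.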

@@ -63,7 +65,7 @@ impl FillWithOriginalReplication {
}

impl MembershipHandler for FillWithOriginalReplication {
-    type NetworkId = u16;
+    type NetworkId = u32;
Member


There's a SubnetworkId type in nomos-da-network-core; maybe we could use it in all network-related places?

Collaborator Author

@danielSanchezQ Aug 29, 2024


Yes. I changed it as it was a misaligned type. Also, using SubnetworkId would mean introducing a full crate just to use the type here, and as it is a type alias atm I do not think it is completely worth it.
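To make the trade-off concrete: since `SubnetworkId` is currently a plain type alias, the shared definition amounts to one line, which is why pulling in the whole crate for it is hard to justify. An approximate sketch (the exact alias definition in nomos-da-network-core is assumed, not quoted):

```rust
// Approximate sketch: SubnetworkId is currently a plain type alias,
// so depending on a full crate buys only this single line.
pub type SubnetworkId = u32;

fn main() {
    let id: SubnetworkId = 7;
    // u32 matches the widened NetworkId in this PR (previously u16).
    assert_eq!(std::mem::size_of::<SubnetworkId>(), 4);
    println!("subnetwork {id}");
}
```

The downside, as noted in the review, is that the two `u32`s can silently drift apart again; unifying them becomes worthwhile once the alias grows into a real type.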

todo!()
}

async fn mark_in_block(&mut self, _blobs_id: &[Self::BlobId]) {
Member


In the context of sampling, it seems this would be better named mark_complete or similar.
mark_in_block actually happens in the consensus; this will just receive a notification about it and act accordingly.

Collaborator Author


Sure, this makes sense.

@danielSanchezQ
Collaborator Author

@bacv, is it possible that this now fails because of the tests we just added? As I'm changing consensus 🤔
How are we checking that dissemination is ok? Consensus-wise it should be fine, as we do not have any blobs in the blocks.

@bacv
Member

bacv commented Aug 30, 2024

> @bacv, is it possible that this now fails because of the tests we just added? As I'm changing consensus 🤔 How are we checking that dissemination is ok? Consensus-wise it should be fine, as we do not have any blobs in the blocks.

DA integration tests use consensus; this PR changed the generic params that need to be passed to it, and the tests still use the old ones. I'll make a PR that fixes them (to run locally, use cargo test -p nomos-da-tests -F libp2p).

@danielSanchezQ
Collaborator Author

> @bacv, is it possible that this now fails because of the tests we just added? As I'm changing consensus 🤔 How are we checking that dissemination is ok? Consensus-wise it should be fine, as we do not have any blobs in the blocks.

> DA integration tests use consensus; this PR changed the generic params that need to be passed to it, and the tests still use the old ones. I'll make a PR that fixes them (to run locally, use cargo test -p nomos-da-tests -F libp2p)

Ok, I see them now; I couldn't reproduce them before. I'll fix it in this PR, don't worry!

@bacv
Member

bacv commented Aug 30, 2024

> @bacv, is it possible that this now fails because of the tests we just added? As I'm changing consensus 🤔 How are we checking that dissemination is ok? Consensus-wise it should be fine, as we do not have any blobs in the blocks.

I also see another issue in the nodes test: the DA-Sampling service is required by consensus, but it's not started in the node.

@danielSanchezQ
Collaborator Author

> @bacv, is it possible that this now fails because of the tests we just added? As I'm changing consensus 🤔 How are we checking that dissemination is ok? Consensus-wise it should be fine, as we do not have any blobs in the blocks.

> I also see another issue in the nodes test: the DA-Sampling service is required by consensus, but it's not started in the node.

Ah damn, it's true. You added it recently in a different PR, right? I'll wait for that then.

@bacv
Member

bacv commented Aug 30, 2024

> @bacv, is it possible that this now fails because of the tests we just added? As I'm changing consensus 🤔 How are we checking that dissemination is ok? Consensus-wise it should be fine, as we do not have any blobs in the blocks.

> I also see another issue in the nodes test: the DA-Sampling service is required by consensus, but it's not started in the node.

> Ah damn, it's true. You added it recently in a different PR, right? I'll wait for that then.

In that one the verifier and indexer were added, so sampling should be added in a similar fashion.

@danielSanchezQ
Collaborator Author

> @bacv, is it possible that this now fails because of the tests we just added? As I'm changing consensus 🤔 How are we checking that dissemination is ok? Consensus-wise it should be fine, as we do not have any blobs in the blocks.

> I also see another issue in the nodes test: the DA-Sampling service is required by consensus, but it's not started in the node.

> Ah damn, it's true. You added it recently in a different PR, right? I'll wait for that then.

> In that one the verifier and indexer were added, so sampling should be added in a similar fashion.

Cool, so I'll just add it here as well then. Thanks!

@danielSanchezQ
Collaborator Author

Thanks for the effort @bacv ! 🎸

@danielSanchezQ merged commit a13f861 into master Sep 3, 2024
12 checks passed
@danielSanchezQ deleted the consensus-sampling branch September 3, 2024 15:02