Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set pending block only after making sure it is valid. #2925

Merged
merged 3 commits into from
Nov 20, 2024

Conversation

afck
Copy link
Contributor

@afck afck commented Nov 20, 2024

Motivation

#2904 introduced block proposal size limits that are only enforced when the proposal was handled in the worker, not when staging block execution. This could result in the pending block getting set to an invalid proposal.

Proposal

Handle the proposal in the local node before setting the pending block.

Test Plan

The explicit clear_pending_block call was removed from the test.

Release Plan

  • Nothing to do / These changes follow the usual release cycle.

Links

Comment on lines +1980 to +1984
if manager
.requested_proposed
.as_ref()
.map(|proposal| &proposal.content)
!= Some(&proposal.content)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fail to see how this addition is related to the main objective of the PR - can you explain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now already call handle_block_proposal earlier, before we set the pending block to that proposal. It would be wasteful to call it here again. So we check if the proposal the local node has already seen is the one we are about to send to the validators; if it is, there is no reason to call handle_block_proposal again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but we extracted the requested_propsed earlier already - in line 1950 - can we use that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now already call handle_block_proposal earlier, before we set the pending block to that proposal.

Earlier than what? I can't find that it changed from what it used to be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how we could merge the two places where we look into requested_proposed; the logic is very different, and in the second case, the condition is also true in the None case.

Earlier than what?

I mean we call it already before we set the pending block, in execute_block.

Copy link
Contributor

@deuszx deuszx Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean we call it already before we set the pending block, in execute_block.

Now I see it - in execute_block we call

  • set_pending_block which calls
    • handle_block_proposal and then
    • self.state_mut().set_pending_block(block).
  • After that we call process_pending_block_without_prepare which in turn calls
    • thispropose_block.

So we don't need to re-handle.

It's all so clear now 🙃

@@ -2108,12 +2123,20 @@ where
#[instrument(level = "trace", skip(incoming_bundles, operations))]
async fn set_pending_block(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can rename this method to new_pending_block? Or maybe you have even better idea.

The reason I'm suggesting is:

  • It's not setting per-se, it's creating and then setting. But it can still fail so it may not even set anything.
  • set_pending_block calls another set_pending_block method and I don't really like having many methods with exact same name

.local_node
.handle_block_proposal(*proposal.clone())
.await?;
if manager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In line 1970 we extract manager.requested_locked but we did that in line 1938 as well. Are these duplications necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not, and also some of the checks we do (already before this PR) are redundant; but I was considering to do those cleanups in a separate PR. I'll look into it, and if the change isn't huge I can do them right away.

Copy link
Contributor

@deuszx deuszx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flow for block proposal is very difficult to reason about (so many branches and edge cases) that I can only approve to the best of my knoweldge. We should put some work into disentangling this.

@afck
Copy link
Contributor Author

afck commented Nov 20, 2024

Thanks!
As discussed, I will try to simplify the code in a separate PR.

@afck afck merged commit 3a66ab0 into linera-io:main Nov 20, 2024
21 checks passed
@afck afck deleted the set-pending-block branch November 20, 2024 18:09
self.stage_block_execution_and_discard_failing_messages(block)
.await?
.0
self.stage_block_execution(block).await?.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing I'm failing to understand: if we always stage the block execution before adding the block as pending, then why do we ever need to stage it here again? Is this where that ExecutedBlock cache that we've talked about will come in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly: It's only because we kind of need the executed block, and didn't cache that earlier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clear pending block if failed locally
3 participants