Make L1 client async again #2240

Merged: 6 commits into main on Nov 4, 2024

Conversation

@jbearer (Member) commented Oct 31, 2024

Make the L1 client update asynchronously in a background task. This allows us to subscribe to updates from the L1 rather than polling every time we have a new HotShot block, which vastly reduces the number of RPC calls we make. This is in response to an increase in the number of L1 RPC calls being made due to the new logic that validates the L1 head and finalized block in each proposal.

This PR:

  • Rewrites the L1 client to work based on an in-memory state that is updated by an async task (a simplified sketch of this pattern follows below)
  • Implements the various wait_for_* functions using an event stream populated by the async task
  • Supports either HTTP or WebSockets as a backend for the L1 client
  • Adds an LRU cache for L1 blocks
  • Adds more configurability for the L1 client

Key places to review:

  • types/src/v0/impls/l1.rs
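
For orientation, here is a minimal sketch of the pattern this PR adopts (illustrative only; the real implementation is in types/src/v0/impls/l1.rs and its names, types, and error handling differ): a background task keeps shared in-memory state current and broadcasts an event for every update, and the wait_for_* helpers subscribe to that event stream instead of issuing their own RPC calls.

```rust
// Minimal sketch of the "background update task + event stream" design,
// built on tokio primitives; not the actual espresso-sequencer code.
use std::sync::Arc;
use tokio::sync::{broadcast, RwLock};

#[derive(Clone, Debug)]
enum L1Event {
    NewHead { head: u64 },
}

#[derive(Default)]
struct L1State {
    head: u64,
}

#[derive(Clone)]
struct L1Client {
    state: Arc<RwLock<L1State>>,
    sender: broadcast::Sender<L1Event>,
}

impl L1Client {
    fn new() -> Self {
        let (sender, _) = broadcast::channel(64);
        Self {
            state: Arc::new(RwLock::new(L1State::default())),
            sender,
        }
    }

    /// Spawn the background task that keeps `state` current and publishes
    /// an event for every new head it sees. In the real client the source
    /// would be an L1 RPC subscription (WebSockets) or a polling loop
    /// (HTTP), not a broadcast receiver.
    fn spawn_update_task(&self, mut heads: broadcast::Receiver<u64>) {
        let state = self.state.clone();
        let sender = self.sender.clone();
        tokio::spawn(async move {
            while let Ok(head) = heads.recv().await {
                state.write().await.head = head;
                // A send error only means nobody is currently waiting.
                let _ = sender.send(L1Event::NewHead { head });
            }
        });
    }

    /// Wait until the L1 head reaches `number` without making any RPC
    /// calls: subscribe first, then check the snapshot, then listen.
    async fn wait_for_block(&self, number: u64) {
        let mut events = self.sender.subscribe();
        if self.state.read().await.head >= number {
            return;
        }
        while let Ok(L1Event::NewHead { head }) = events.recv().await {
            if head >= number {
                return;
            }
        }
    }
}
```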

This allows us to subscribe to updates from the L1, rather than
polling every time we have a new HotShot block. This vastly reduces
the number of RPC calls we make.

Also adds an LRU cache for L1 blocks.
@jbearer force-pushed the jb/async-l1-client branch from c58b6b2 to b712b84 on October 31, 2024 15:48
let state = self.state.clone();
let sender = self.sender.clone();

let span = tracing::warn_span!("L1 client update");

Collaborator:

What does this do? Does it apply to anything at the WARN level or more serious, or only at the WARN level?

jbearer (Member, Author):

It applies to anything, but the span will only show up on WARN or higher. Now that I think about it, since this span is just a tag with no extra data, it's quite cheap to log this all the time, so this could easily be info_span!

jbearer (Member, Author):

Er, wait, I had it backwards. This appears any time the log level is set to WARN or lower, just like a normal tracing::warn!. So we actually want to keep it this way: we want to see this info even if the node is running with RUST_LOG=warn.
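
To illustrate the point, here is a standalone sketch (not project code) of how span levels interact with the subscriber's max level: a warn_span is enabled whenever the filter admits WARN, so events inside it keep the span tag even under RUST_LOG=warn, while an info_span is disabled at that level.

```rust
// Requires the `tracing` and `tracing-subscriber` crates.
use tracing::{info_span, warn_span, Level};

fn main() {
    // Roughly equivalent to running with RUST_LOG=warn.
    tracing_subscriber::fmt().with_max_level(Level::WARN).init();

    // Enabled: a WARN-level span passes a WARN max level.
    let span = warn_span!("L1 client update");
    span.in_scope(|| {
        // Emitted, and tagged with the "L1 client update" span.
        tracing::warn!("stuck waiting for L1 block");
    });

    // Disabled under a WARN max level: the event below is still emitted
    // (it is WARN level itself) but without the span tag.
    let span = info_span!("L1 client update");
    span.in_scope(|| tracing::warn!("stuck waiting for L1 block"));
}
```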

continue;
};
if head >= number {

Collaborator:

I think if we log that we are waiting for the block, it's also nice to log that we found it.

let L1Event::NewFinalized { finalized } = event else {
continue;
};
if finalized.number < number {

Collaborator:

Nit: for the latest block we do the comparison the other way around above (if head >= number {). I think we can do it the same way both times; then we also don't need this continue, I think.
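
A rough sketch of what the suggested shape might look like (purely illustrative; the event and block types below are stand-ins for the ones in the snippets above), combining the >= comparison with the "found it" log suggested earlier:

```rust
use tokio::sync::broadcast;

// Stand-ins so the sketch is self-contained; not the project's types.
#[derive(Clone, Debug)]
struct L1BlockInfo {
    number: u64,
}

#[derive(Clone, Debug)]
enum L1Event {
    NewHead { head: u64 },
    NewFinalized { finalized: L1BlockInfo },
}

/// Wait for a finalized block at height >= `number`, using the same
/// comparison direction as the latest-block wait (head >= number) and
/// logging when the block is found. (Lagged receivers are ignored here
/// for brevity.)
async fn wait_for_finalized(
    mut events: broadcast::Receiver<L1Event>,
    number: u64,
) -> Option<L1BlockInfo> {
    while let Ok(event) = events.recv().await {
        let L1Event::NewFinalized { finalized } = event else {
            continue;
        };
        if finalized.number >= number {
            tracing::info!(number, "got finalized L1 block");
            return Some(finalized);
        }
    }
    // The event stream closed before the block was finalized.
    None
}
```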

tracing::warn!("sleeping for {dur:?}, until {timestamp}");
sleep(Duration::from_secs(dur)).await;
}
// Wait until the finalized block has timestamp >= `timestamp`.

Collaborator:

Wondering if we actually need this function. It's only used on startup if the genesis block is not finalized.

.wait_for_finalized_block_with_timestamp(timestamp.unix_timestamp().into())

I'm wondering if it would be okay to just panic in that case.

jbearer (Member, Author):

No, we can't just panic there. The intended use case is that you set the genesis block to something in the future; then you can start up all the nodes ahead of time, and they will just wait until the genesis block is ready and then start running.

assert!(
state.snapshot.finalized.is_some()
&& number <= state.snapshot.finalized.unwrap().number,
"requesting a finalized block {number} that isn't finalized; snapshot: {:?}",

Collaborator:

Wonder why we don't make this a fallible function instead of one that might panic.

jbearer (Member, Author):

The idea is that this can't panic based solely on the internal behavior of this module; i.e., we only ever call this private helper function with a finalized block number. There is no input from callers in other modules that can cause this function to panic, so we shouldn't bubble the error up to the caller.

And because the eventual interface is infallible (e.g. validate_apply_header calls wait_for_finalized_block and blocks until we either get it or time out), any errors we return would have to be handled with retry loops, which IMO makes the calling code more complicated for little reason when it's an error that should never happen to begin with.
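
A sketch of the design being described (the shapes below are illustrative, not the actual module): the lookup is a private helper whose precondition is enforced with an assert, because only code inside the module can violate it.

```rust
use std::collections::HashMap;

// Stand-ins so the sketch is self-contained; not the project's types.
#[derive(Clone, Copy, Debug)]
struct L1BlockInfo {
    number: u64,
}

#[derive(Default, Debug)]
struct L1Snapshot {
    finalized: Option<L1BlockInfo>,
}

#[derive(Default)]
struct L1State {
    snapshot: L1Snapshot,
    finalized: HashMap<u64, L1BlockInfo>, // an LRU cache in the real code
}

impl L1State {
    /// Private helper: callers inside this module only ever pass a block
    /// number that is already known to be finalized, so a violation is a
    /// bug in this module, not a recoverable error to surface to callers.
    fn cached_finalized_block(&self, number: u64) -> Option<L1BlockInfo> {
        assert!(
            self.snapshot.finalized.is_some()
                && number <= self.snapshot.finalized.unwrap().number,
            "requesting a finalized block {number} that isn't finalized; snapshot: {:?}",
            self.snapshot,
        );
        self.finalized.get(&number).copied()
    }
}
```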

pub(crate) struct L1State {
pub(crate) snapshot: L1Snapshot,
pub(crate) finalized: LruCache<u64, L1BlockInfo>,
pub(crate) finalized_by_timestamp: BTreeMap<U256, u64>,

Collaborator:

Similar to my last comment: IIUC, since this is only used on startup and is in-memory only, I think it might be simpler if we didn't have it at all, or at least didn't keep a cache for it.

jbearer (Member, Author):

Good call: removing this cache makes things considerably simpler at very little cost.
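
For reference, a sketch of roughly what the state looks like after dropping the timestamp index (field names follow the snippet above; the surrounding types are stand-ins, and LruCache is from the lru crate):

```rust
use lru::LruCache;

// Stand-ins so the sketch is self-contained; not the project's
// actual definitions.
pub(crate) struct L1BlockInfo {
    pub number: u64,
    pub timestamp: u64,
}

pub(crate) struct L1Snapshot {
    pub finalized: Option<L1BlockInfo>,
}

pub(crate) struct L1State {
    pub(crate) snapshot: L1Snapshot,
    pub(crate) finalized: LruCache<u64, L1BlockInfo>,
    // `finalized_by_timestamp: BTreeMap<U256, u64>` is gone: it was only
    // used at startup, and the backwards walk that finds the earliest
    // block with a given timestamp already goes through the LRU cache.
}
```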

@sveitser self-requested a review on November 4, 2024 15:34

@sveitser (Collaborator) left a comment:

Overall LGTM, just non-urgent suggestions.

jbearer and others added 3 commits on November 4, 2024 09:06
This cache is used only at startup, and it barely even saves us
any RPC calls, since the loop where we work backwards looking for
the earliest block with a given timestamp already uses the other
cache. Removing this simplifies things considerably.
@jbearer merged commit 5bc096e into main on Nov 4, 2024
18 checks passed
@jbearer deleted the jb/async-l1-client branch on November 4, 2024 18:22