Make L1 client async again #2240
Conversation
This allows us to subscribe to updates from the L1, rather than polling every time we have a new HotShot block. This vastly reduces the number of RPC calls we make. Also adds an LRU cache for L1 blocks.
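For illustration, here is a minimal sketch of the kind of cache the description mentions, using the `lru` crate; the capacity and value type are assumptions, not the PR's actual choices.

```rust
use lru::LruCache;
use std::num::NonZeroUsize;

fn main() {
    // Keep recently fetched finalized block info keyed by block number so
    // repeated lookups don't hit the L1 RPC again.
    let mut finalized: LruCache<u64, u64> = LruCache::new(NonZeroUsize::new(100).unwrap());
    finalized.put(1_000, 1_700_000_000); // block number -> block timestamp
    if let Some(timestamp) = finalized.get(&1_000) {
        println!("cached block 1000 has timestamp {timestamp}");
    }
}
```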
let state = self.state.clone();
let sender = self.sender.clone();

let span = tracing::warn_span!("L1 client update");
What does this do? Does it apply to anything at the WARN level and more serious, or only at the WARN level?
It applies to anything, but the span will only show up at WARN or higher. But now that I think about it, since this span is just a tag with no extra data, it's quite cheap to log this all the time, so this could easily be info_span!
Er wait, I had it backwards. This appears any time the log level is set to WARN or lower, just like a normal tracing::warn!. So we actually want to keep it this way: we want to see this info even if the node is running with RUST_LOG=warn.
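For context, here is a minimal sketch (not the PR's actual code) of how such a span behaves: the WARN-level span is recorded whenever the subscriber's filter allows WARN (e.g. RUST_LOG=warn), it tags every event emitted inside it, and each event still obeys its own level.

```rust
use tracing::Instrument;

// Hypothetical update loop; only the span usage mirrors the code under review.
async fn run_update_loop() {
    let span = tracing::warn_span!("L1 client update");
    async {
        // Both events carry the "L1 client update" span as context; whether
        // each is printed depends on the subscriber's filter level.
        tracing::info!("subscribed to new L1 heads");
        tracing::warn!("L1 RPC request failed, retrying");
    }
    .instrument(span)
    .await;
}
```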
continue;
};
if head >= number {
I think if we log that we are waiting for the block, it's also nice to log that we found it.
types/src/v0/impls/l1.rs (outdated)
let L1Event::NewFinalized { finalized } = event else {
    continue;
};
if finalized.number < number {
Nit: for the latest block we do the comparison the other way around above: if head >= number {. I think we can do it the same way both times; then we also don't need this continue, I think.
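A rough sketch of the suggested shape, with assumed stand-ins for the event and block types, might look like this:

```rust
use futures::StreamExt;

// Assumed stand-ins for the real types in types/src/v0/impls/l1.rs.
#[derive(Clone, Debug)]
struct L1BlockInfo {
    number: u64,
}

#[derive(Clone, Debug)]
enum L1Event {
    NewHead { head: u64 },
    NewFinalized { finalized: L1BlockInfo },
}

// Use `>=` for the finalized wait just like the head wait, so both loops read
// the same way and no `continue` is needed; also log when the block is found.
async fn wait_for_finalized_block(
    mut events: impl futures::Stream<Item = L1Event> + Unpin,
    number: u64,
) -> Option<L1BlockInfo> {
    while let Some(event) = events.next().await {
        if let L1Event::NewFinalized { finalized } = event {
            if finalized.number >= number {
                tracing::info!(number, "got finalized L1 block");
                return Some(finalized);
            }
        }
    }
    None
}
```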
tracing::warn!("sleeping for {dur:?}, until {timestamp}");
sleep(Duration::from_secs(dur)).await;
}
// Wait until the finalized block has timestamp >= `timestamp`.
Wondering if we actually need this function. It's only used on startup if the genesis block is not finalized.
espresso-sequencer/sequencer/src/lib.rs, line 483 (in b712b84):
.wait_for_finalized_block_with_timestamp(timestamp.unix_timestamp().into())
I'm wondering if it would be okay to just panic in that case.
No, we can't just panic there. The intended use case is that you set the genesis block to something in the future; then you can start up all the nodes ahead of time, and they will just wait until the genesis block is ready and then start running.
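For illustration, a rough sketch of that startup behavior (with an assumed placeholder for fetching the latest finalized block) could look like this:

```rust
use std::time::Duration;
use tokio::time::sleep;

// Assumed stand-in for the real finalized-block info.
#[derive(Clone, Copy, Debug)]
struct L1BlockInfo {
    timestamp: u64,
}

// If the genesis timestamp is still in the future, keep waiting for a
// finalized block that reaches it instead of panicking, so nodes can be
// started ahead of time. `fetch_finalized` is a placeholder, not the real API.
async fn wait_for_finalized_block_with_timestamp(
    fetch_finalized: impl Fn() -> Option<L1BlockInfo>,
    timestamp: u64,
) -> L1BlockInfo {
    loop {
        if let Some(finalized) = fetch_finalized() {
            if finalized.timestamp >= timestamp {
                return finalized;
            }
            // Sleep roughly until the chain should have reached `timestamp`.
            let dur = timestamp - finalized.timestamp;
            tracing::warn!("sleeping for {dur}s, until timestamp {timestamp}");
            sleep(Duration::from_secs(dur)).await;
        } else {
            // No finalized block known yet; check again shortly.
            sleep(Duration::from_secs(1)).await;
        }
    }
}
```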
assert!(
    state.snapshot.finalized.is_some()
        && number <= state.snapshot.finalized.unwrap().number,
    "requesting a finalized block {number} that isn't finalized; snapshot: {:?}",
Wonder why we don't make this a fallible function instead of one that might panic.
The idea is this can't panic based solely on the internal behavior of this module, i.e. we only ever call this private helper function with a finalized block number. There is no input from callers in any other modules that can cause this function to panic, so we shouldn't bubble the error up to the caller.
And because the eventual interface is infallible (e.g. validate_apply_header calls wait_for_finalized_block and blocks until we either get it or time out), any errors we return would have to be handled with retry loops, which IMO makes the calling code more complicated for little reason, when it's an error that should never happen to begin with.
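As a minimal illustration of that argument (names here are hypothetical), the assertion guards a module-internal invariant rather than validating caller input:

```rust
// Snapshot of what the client currently knows about the L1; names are made up.
#[derive(Clone, Copy, Debug)]
struct L1Snapshot {
    finalized: Option<u64>,
}

// Private helper: only ever called with a block number already known to be
// finalized, so a violation here is a bug in this module, not a recoverable
// error worth bubbling up to callers in other modules.
fn assert_finalized(snapshot: L1Snapshot, number: u64) {
    assert!(
        snapshot.finalized.is_some_and(|f| number <= f),
        "requesting a finalized block {number} that isn't finalized; snapshot: {snapshot:?}"
    );
}
```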
types/src/v0/v0_1/l1.rs (outdated)
pub(crate) struct L1State {
    pub(crate) snapshot: L1Snapshot,
    pub(crate) finalized: LruCache<u64, L1BlockInfo>,
    pub(crate) finalized_by_timestamp: BTreeMap<U256, u64>,
Similar to my last comment, IIUC since this is only used on startup and is in-memory only, I think it might be simpler if we didn't have it at all, or at least didn't keep a cache for it.
Good call, removing this cache makes things considerably simpler at very little cost.
Overall LGTM, just non-urgent suggestions.
Co-authored-by: Mathis <[email protected]>
This cache is used only at startup, and it barely even saves us any RPC calls, since the loop where we work backwards looking for the earliest block with a given timestamp already uses the other cache. Removing this simplifies things considerably.
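For illustration, the backward walk could be sketched like this (the `get_block` helper and the block type are assumed placeholders standing in for a lookup that reads through the number-keyed LRU cache):

```rust
// Assumed stand-in for the real finalized-block info.
#[derive(Clone, Copy, Debug)]
struct L1BlockInfo {
    number: u64,
    timestamp: u64,
}

// Walk backwards from the finalized head until the parent's timestamp drops
// below the target. Because `get_block` already consults the number-keyed LRU
// cache, a separate timestamp-keyed cache saves almost nothing.
fn earliest_finalized_with_timestamp(
    get_block: impl Fn(u64) -> L1BlockInfo,
    head: L1BlockInfo,
    timestamp: u64,
) -> L1BlockInfo {
    let mut block = head;
    while block.number > 0 {
        let parent = get_block(block.number - 1);
        if parent.timestamp < timestamp {
            break;
        }
        block = parent;
    }
    // `block` is the earliest finalized block with timestamp >= `timestamp`,
    // assuming `head.timestamp >= timestamp` to begin with.
    block
}
```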
Make L1 client update asynchronously in a background task. This allows us to subscribe to updates from the L1, rather than polling every time we have a new HotShot block. This vastly reduces the number of RPC calls we make. This is in response to an increase in the number of L1 RPC calls being made, due to the new logic that validates the L1 head and finalized block in each proposal.
This PR:

- wait_for_* functions using an event stream populated by the async task

Key places to review:

- types/src/v0/impls/l1.rs
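A highly simplified sketch of that architecture (names, types, and the channel choice are assumptions, not the crate's real API) might look like this: one background task watches the L1 and publishes events, and the wait_for_* style methods resolve off that event stream instead of issuing their own RPC calls per HotShot block.

```rust
use std::time::Duration;
use tokio::sync::broadcast;

// Assumed stand-ins for the real types in types/src/v0/impls/l1.rs.
#[derive(Clone, Debug)]
enum L1Event {
    NewHead { head: u64 },
}

struct L1Client {
    sender: broadcast::Sender<L1Event>,
}

impl L1Client {
    // Spawn one background task that watches the L1 and publishes events;
    // consumers subscribe instead of issuing their own RPC calls.
    fn start() -> Self {
        let (sender, _rx) = broadcast::channel(64);
        let task_sender = sender.clone();
        tokio::spawn(async move {
            let mut head = 0u64;
            loop {
                // In the real client this would come from an L1 RPC
                // subscription or poll; here we just fake a new head.
                head += 1;
                let _ = task_sender.send(L1Event::NewHead { head });
                tokio::time::sleep(Duration::from_secs(12)).await;
            }
        });
        Self { sender }
    }

    // A wait_for_*-style method: block until the L1 head reaches `number`,
    // driven entirely by the background task's event stream.
    async fn wait_for_block(&self, number: u64) {
        let mut events = self.sender.subscribe();
        loop {
            match events.recv().await {
                Ok(L1Event::NewHead { head }) if head >= number => return,
                Ok(_) => continue,
                // If we fell behind, keep waiting for newer events.
                Err(broadcast::error::RecvError::Lagged(_)) => continue,
                Err(broadcast::error::RecvError::Closed) => return,
            }
        }
    }
}
```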