
[EPIC] Live replay / catchup #11

Open
jackzampolin opened this issue Aug 16, 2024 · 2 comments

@jackzampolin
Contributor

jackzampolin commented Aug 16, 2024

The Mirror layer, sitting in front of the state machine, is already capable of tracking the network while the state machine is unresponsive or even absent.

Now we need to implement the path where the mirror replays "old" requests (those older than the most recently committed block) to the state machine.

This takes care of two primary use cases:

  • Node is offline for a period of n blocks and needs to sync those blocks to rejoin consensus (live replay)
  • Node is coming online to join a live network and needs to sync all blocks from genesis to chain tip (live catchup)

Live catchup is akin to block sync in Comet terms. Starting from genesis or some previously saved state, the Mirror is many blocks behind the rest of the network, and simply observing the current consensus messages is insufficient to reach the current height.

We do not currently have an RPC or HTTP endpoint to serve old block headers or those blocks' data (i.e. the transactions). There is a design decision to make about serving individual historical blocks versus paging and compressing 100 or 1000 blocks at a time. This could mean, for example, that at height 750, heights 700-749 are retrievable individually, but you can only get the collections for 1-99, 100-199, etc.
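
To make that paging rule concrete, here is a minimal sketch in Go. The names (ClassifyHeight, BatchSize, Availability) are made up for illustration, and it assumes batch boundaries fall on multiples of the batch size; it is not an existing Gordian API.

```go
package catchup

// BatchSize is an assumed fixed page size for historical blocks.
const BatchSize = 100

// Availability says how a historical height can be fetched from a host
// whose latest committed height is `latest`.
type Availability int

const (
	NotYetCommitted Availability = iota // height > latest
	Individually                        // height is in the still-open batch
	InBatch                             // height is only served as a full, compressed batch
)

// ClassifyHeight applies the rule: heights in the batch that contains
// `latest` are retrievable one at a time; older heights are only served
// as complete batches. It also returns the batch index for the height.
func ClassifyHeight(height, latest uint64) (Availability, uint64) {
	if height > latest {
		return NotYetCommitted, 0
	}
	openBatch := latest / BatchSize
	batch := height / BatchSize
	if batch == openBatch {
		return Individually, batch
	}
	return InBatch, batch // caller fetches the whole batch, then extracts the height
}
```

With latest = 750, this classifies height 749 as individually retrievable and height 321 as only available inside batch 3 (heights 300-399), matching the example above.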

Beyond the mechanism of transmitting historic block data, there is a secondary issue of host discovery. The two obvious options are: 1) you have to know the address of a host that supports live catchup, and you provide their address(es) on the command line or in a config file, or 2) use a semi-centralized "rendezvous server" where "historic data hosts" can register as willing to serve historic data to any client who needs it, and any client can perform a host lookup against that rendezvous server.

The first option is effectively a subset of the second, so it is probably the appropriate choice for an MVP.
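
For that MVP, the operator-supplied host list could be as simple as a single flag or config field. A minimal sketch in Go; the flag name and struct are hypothetical, not an existing Gordian option:

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// CatchupConfig is a hypothetical MVP configuration: the operator
// explicitly lists hosts known to serve historical blocks; no discovery.
type CatchupConfig struct {
	Hosts []string // e.g. "10.0.0.5:26680,catchup.example.com:26680"
}

func main() {
	hostsFlag := flag.String("catchup-hosts", "", "comma-separated addresses of hosts serving historical blocks")
	flag.Parse()

	var cfg CatchupConfig
	if *hostsFlag != "" {
		cfg.Hosts = strings.Split(*hostsFlag, ",")
	}
	fmt.Printf("will attempt mirror catchup from %d host(s): %v\n", len(cfg.Hosts), cfg.Hosts)
}
```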

Like live replay, live catchup is probably better implemented before data persistence. It can rely on in-memory storage to host data.

@jackzampolin changed the title from [Epic] Live replay to [Epic] Live replay / catchup on Aug 16, 2024
@mark-rushakoff
Member

I suggested elsewhere that we refer to these as:

  • "state machine catchup": the process is online and the mirror is up to date, but the state machine has fallen behind. In this case, as the state machine begins processing again, the mirror has special handling indicating that it is replaying already committed blocks, and the state machine should not attempt to submit votes or propose blocks (a rough sketch of this handling follows this list).
  • "mirror catchup": the process has been offline for some time, and the mirror needs to discover blocks that are earlier than the currently gossiped messages.
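
Here is a rough sketch in Go of the handling described in the first bullet. The types and method names are hypothetical, not Gordian's actual mirror/state-machine interface; they only illustrate the idea that replayed blocks are applied without proposing or voting.

```go
package driver

import "context"

// CommittedBlock is a hypothetical block delivery from the mirror.
type CommittedBlock struct {
	Height uint64
	Data   []byte
	Replay bool // true when the mirror is replaying an already committed block
}

// StateMachine is a hypothetical application-facing interface.
type StateMachine interface {
	// Apply executes the block's transactions against application state.
	Apply(ctx context.Context, b CommittedBlock) error
	// ProposeAndVote participates in consensus for the given height.
	ProposeAndVote(ctx context.Context, height uint64) error
}

// Drain feeds blocks from the mirror to the state machine. While b.Replay
// is set, the state machine only applies the block; it never proposes or
// votes until it has caught up to the mirror's committed height.
func Drain(ctx context.Context, sm StateMachine, blocks <-chan CommittedBlock) error {
	for b := range blocks {
		if err := sm.Apply(ctx, b); err != nil {
			return err
		}
		if !b.Replay {
			// Only live (non-replayed) heights warrant consensus participation.
			if err := sm.ProposeAndVote(ctx, b.Height+1); err != nil {
				return err
			}
		}
	}
	return nil
}
```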

From recollection, we have a lot of the wiring in place for state machine catchup, and it is more a plumbing/wiring issue than it is feature work.

On the other hand, there is a lot of likely complexity in mirror catchup.

  • How do you know where you can fetch historic blocks? (We discussed that maybe serving historic blocks is something we would enable by default to start with, and then a mirror could just attempt to download blocks from all its peers.)
  • How do you know if you can trust the current gossip messages if they are many blocks ahead of your last received block? (We have had other discussions about some kind of light client style proof to show that the current validator set is trustable, but that also requires data discovery.)
  • If we are going to deliberately serve historic blocks, then it ought to be a proper feature, at a minimum serving batches of blocks by some basic scheme like every 100 or 1000 blocks, using snappy compression at first and later probably Zstandard, likely with a custom dictionary.
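
As one possible shape for the batch-serving idea in the last bullet, here is a minimal sketch in Go using snappy. The Batch type, gob serialization, and function names are assumptions rather than an agreed format; switching to Zstandard later would only change the codec calls.

```go
package history

import (
	"bytes"
	"encoding/gob"

	"github.com/golang/snappy"
)

// Batch is a hypothetical container for a fixed range of historical blocks,
// e.g. heights [Start, Start+99].
type Batch struct {
	Start  uint64
	Blocks [][]byte // raw block bytes, one entry per height
}

// EncodeBatch serializes and snappy-compresses a batch for serving.
func EncodeBatch(b Batch) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(b); err != nil {
		return nil, err
	}
	return snappy.Encode(nil, buf.Bytes()), nil
}

// DecodeBatch reverses EncodeBatch on the client side.
func DecodeBatch(data []byte) (Batch, error) {
	var b Batch
	raw, err := snappy.Decode(nil, data)
	if err != nil {
		return b, err
	}
	err = gob.NewDecoder(bytes.NewReader(raw)).Decode(&b)
	return b, err
}
```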

I like the precision of the terms "state machine catchup" and "mirror catchup". They are clearer than "live replay" and "live catchup", which are not intuitively different. But they are coupled to some implementation details, at least one of which needs an explanation to a newcomer to the system. Maybe we can find a better word than Mirror for that outermost layer.

In any case, I'm going to start on the state machine catchup next week.

@jonathanpberger changed the title from [Epic] Live replay / catchup to [EPIC] Live replay / catchup on Aug 26, 2024
@beckettsean

beckettsean commented Aug 26, 2024

  1. you have to know the address of a host that supports live catchup, and you provide their address(es) on the command line or in a config file, or
  2. use a semi-centralized "rendezvous server" where "historic data hosts" can register as willing to serve historic data to any client who needs it, and any client can perform a host lookup against that rendezvous server.

The first puts the burden of truth on the operator executing the catchup, which is the correct place for it to be in the initial phase. The second means the operator is now blindly trusting an external list, which is subject to hacking/spoofing in a way that manually entered addresses acquired offline are not.

Once there's a tested, validated, standardized, and socialized way to do mirror catchup, the entry points to that catchup can be part of the chain or gossip messages, so that an operator who knows one good live node can bootstrap from that to a live mirror catchup node and trust that discovery process. Updates to the mirror catchup entry points would be made via on-chain transactions or by validator nodes in the live set, so they would presumably be as trustworthy as things get.
