[EPIC] Live replay / catchup #11
Comments
I suggested elsewhere that we refer to these as:

- "state machine catchup" (rather than "live replay")
- "mirror catchup" (rather than "live catchup")

From recollection, we have a lot of the wiring in place for state machine catchup, and it is more a plumbing/wiring issue than it is feature work. On the other hand, there is likely a lot of complexity in mirror catchup.
I like the precision of the terms "state machine catchup" and "mirror catchup". They are clearer than "live replay" and "live catchup", which are not intuitively different. But they are coupled to some implementation details, at least one of which needs an explanation for a newcomer to the system. Maybe we can find a better word than Mirror for that outermost layer. In any case, I'm going to start on the state machine catchup next week.
The first option (operator-supplied addresses) puts the burden of truth on the operator executing the catchup, which is the correct place for it to be in the initial phase. The second (a rendezvous server) means the operator is now blindly trusting an external list, which is subject to hacking/spoofing in a way that manually entered addresses acquired offline are not. Once there is a tested, validated, standardized, and socialized way to do mirror catchup, the entry points to that catchup can be part of the chain or of gossip messages, so that an operator who knows one good live node can bootstrap from it to a live mirror catchup node and trust that discovery process. Updates to the mirror catchup entry points would then be transactions on chain or messages from live-set validator nodes, so presumably as trustworthy as things get.
The Mirror layer, sitting in front of the state machine, is already capable of tracking the network while the state machine is unresponsive or even absent.
Now we need to implement the path by which the Mirror replays "old" requests (those for blocks older than the most recently committed block) to the state machine.
This takes care of two primary use cases:

- The state machine has fallen behind by `n` blocks and needs to sync those blocks to rejoin consensus (live replay).
- Live catchup is akin to block sync in Comet terms. Starting from genesis or some previously saved state, the Mirror is many blocks behind the rest of the network, and simply observing the current consensus messages is insufficient to reach the current height.
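As a rough illustration of the live replay path, here is a minimal sketch in Go. The `Mirror` and `StateMachine` interfaces and the `CommittedBlock` type are assumptions made for illustration only; the real types in the codebase will differ.

```go
package catchup

import "fmt"

// CommittedBlock is a hypothetical stand-in for a block the Mirror has
// already seen committed: header plus transaction data.
type CommittedBlock struct {
	Height uint64
	Header []byte
	Txs    [][]byte
}

// Mirror is the subset of the mirror layer assumed here: it tracks the
// network and can hand back blocks it has already seen committed.
type Mirror interface {
	CommittedHeight() uint64
	CommittedBlock(height uint64) (CommittedBlock, error)
}

// StateMachine is the subset of the state machine assumed here.
type StateMachine interface {
	Height() uint64
	ApplyCommitted(b CommittedBlock) error
}

// Replay drives the state machine from its current height up to the
// Mirror's committed height, one "old" block at a time (live replay).
func Replay(m Mirror, sm StateMachine) error {
	for h := sm.Height() + 1; h <= m.CommittedHeight(); h++ {
		b, err := m.CommittedBlock(h)
		if err != nil {
			return fmt.Errorf("fetch committed block %d: %w", h, err)
		}
		if err := sm.ApplyCommitted(b); err != nil {
			return fmt.Errorf("apply block %d: %w", h, err)
		}
	}
	return nil
}
```

Once the loop exits, the state machine has caught up to the Mirror's committed height and can go back to handling live consensus requests.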
We do not currently have an RPC or HTTP endpoint to serve old block headers or those blocks' data (i.e. the transactions). There is a design decision to make about serving individual historical blocks versus paging and perhaps compressing 100 or 1000 blocks at a time. That could mean, for example, that at height 750, heights 700-749 are retrievable individually, but you can only get the collections for 1-99, 100-199, and so on.
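As one possible shape for such an endpoint (nothing here exists yet; the routes, page size, and `BlockStore` interface are assumptions made for the sketch), a handler could serve recent heights individually and completed pages only as gzip-compressed batches:

```go
package catchup

import (
	"compress/gzip"
	"encoding/json"
	"net/http"
	"strconv"
	"strings"
)

// BlockStore is a hypothetical read-only view of committed block data.
type BlockStore interface {
	Height() uint64                         // latest committed height
	Block(h uint64) (json.RawMessage, bool) // header and transactions for one height
}

// pageSize is a strawman; 100 vs. 1000 blocks per page is the open question above.
const pageSize = 100

// CatchupHandler serves:
//
//	GET /block/{height} - an individual block from the not-yet-completed page
//	GET /blocks/{page}  - a gzip-compressed page of pageSize older blocks
type CatchupHandler struct{ Store BlockStore }

func (h CatchupHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	switch {
	case strings.HasPrefix(r.URL.Path, "/block/"):
		height, err := strconv.ParseUint(strings.TrimPrefix(r.URL.Path, "/block/"), 10, 64)
		if err != nil {
			http.Error(w, "bad height", http.StatusBadRequest)
			return
		}
		// Heights whose page is already complete are only served via /blocks/.
		if height/pageSize < h.Store.Height()/pageSize {
			http.Error(w, "height only available as a page", http.StatusGone)
			return
		}
		b, ok := h.Store.Block(height)
		if !ok {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Write(b)

	case strings.HasPrefix(r.URL.Path, "/blocks/"):
		page, err := strconv.ParseUint(strings.TrimPrefix(r.URL.Path, "/blocks/"), 10, 64)
		if err != nil {
			http.Error(w, "bad page", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Encoding", "gzip")
		w.Header().Set("Content-Type", "application/json")
		gz := gzip.NewWriter(w)
		defer gz.Close()
		enc := json.NewEncoder(gz)
		for height := page * pageSize; height < (page+1)*pageSize; height++ {
			if b, ok := h.Store.Block(height); ok {
				enc.Encode(b) // one JSON document per line within the page
			}
		}

	default:
		http.NotFound(w, r)
	}
}
```

Whether pages are compressed on the fly like this, pre-built and cached, or sized at 1000 blocks instead of 100 is exactly the design decision described above.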
Beyond the mechanism of transmitting historic block data, there is a secondary issue of host discovery. The two obvious options are 1) you have to know the address of a host who supports live catchup, and you provide their address(es) on the command line or in a config file, or 2) use a semi-centralized "rendezvous server", where "historic data hosts" can register as willing to serve historic data to any client that needs it, and any client can perform a host lookup against that rendezvous server.
The first option is essentially a subset of the second, so it is probably the appropriate choice for the MVP.
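To make that "subset" relationship concrete, here is a hedged sketch (all names are hypothetical) in which the MVP's operator-supplied addresses and a later rendezvous lookup satisfy the same small interface, so the catchup code consuming them never changes:

```go
package catchup

import "context"

// HostSource yields addresses of hosts willing to serve historic block data.
type HostSource interface {
	CatchupHosts(ctx context.Context) ([]string, error)
}

// StaticHosts is option 1: the operator supplies known-good addresses on the
// command line or in a config file, keeping the burden of truth with them.
type StaticHosts []string

func (s StaticHosts) CatchupHosts(ctx context.Context) ([]string, error) {
	return s, nil
}

// RendezvousHosts is option 2: look hosts up from a semi-centralized
// rendezvous server. Because it satisfies the same interface, it can be
// added after the MVP without touching the consumers of HostSource.
type RendezvousHosts struct {
	ServerAddr string
	Lookup     func(ctx context.Context, serverAddr string) ([]string, error)
}

func (r RendezvousHosts) CatchupHosts(ctx context.Context) ([]string, error) {
	return r.Lookup(ctx, r.ServerAddr)
}
```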
Like live replay, live catchup is probably better implemented before data persistence. It can rely on in-memory storage to host data.
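A minimal in-memory store along those lines might look like the following (a throwaway sketch, not the eventual persistence layer); it also happens to satisfy the `BlockStore` interface assumed in the HTTP sketch above:

```go
package catchup

import (
	"encoding/json"
	"sync"
)

// MemBlockStore keeps committed block data in memory, keyed by height.
// Once real data persistence lands, a disk-backed store can replace it
// behind the same methods.
type MemBlockStore struct {
	mu     sync.RWMutex
	blocks map[uint64]json.RawMessage
	height uint64
}

func NewMemBlockStore() *MemBlockStore {
	return &MemBlockStore{blocks: make(map[uint64]json.RawMessage)}
}

// SetBlock records the data for a committed height.
func (s *MemBlockStore) SetBlock(h uint64, data json.RawMessage) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.blocks[h] = data
	if h > s.height {
		s.height = h
	}
}

// Height returns the highest committed height stored so far.
func (s *MemBlockStore) Height() uint64 {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.height
}

// Block returns the stored data for a height, if present.
func (s *MemBlockStore) Block(h uint64) (json.RawMessage, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	b, ok := s.blocks[h]
	return b, ok
}
```

Wiring something like `Replay`, `CatchupHandler`, and `MemBlockStore` together would cover live replay and a first pass at live catchup without waiting on persistence.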