Docs for Network migrations and Statesync #102
e6ed3e5 to 6bff7cd
24e82ad to fd912bf
docs/node/network-migration.mdx
Outdated
### Create Snapshot

Stop the network by shutting down all the nodes in the network. Then, use the `kwil-admin snapshot` tool to take a snapshot of the final state of the Kwild database. For more details, refer to the [create database snapshots](/docs//ref/kwil-admin/snapshot/create) documentation. This tool connects directly to the database to capture its state and does not require the `kwild` process to be running.
Do they have to stop nodes? I don't think they have to stop all of them, but do they need to stop the one they are snapshotting?
Yeah, we are doing a network migration here. I asked Jon for a feature in Consensus Upgrades that stops the network from mining blocks. That way, we can do coordinated stops, take a snapshot of the latest state, and start a new network without losing any more transactions.
I see. I think there might be a better way for us to handle this that allows for less downtime. Will share in a bit, but the tl;dr is using oracles to listen to the old network.
I'm curious about the proposed solution. If I were coordinating the death and rebirth of a network, I would expect to have all validators in on it. Therefore, the last validator to hit CTRL+C gets to make the genesis data file. Regardless, a pre-programmed halting height is a minor improvement that I'd support.
However, I still don't like that we have to support this kind of migration. So, I hope that whatever oracle solution we devise isn't complex; otherwise we might as well just not brick networks with incompatible upgrades in the first place. The need to keep the old network running and accessible by clients is undesirable complexity.
Is there precedent for launching a new network that is linked to the old one, but with new rules, where the intent is to migrate all use to the new one?
fd912bf to 0befd0c
-S, --silence Silence logs

### SEE ALSO
For the category "index.md" page, we can use react cards like on https://docs.kwil.com/docs/admin.
I think that should work if the hierarchy is established correctly.
# starting from the height of the snapshot.
enable = true

# Path to the directory where the received snapshot is stored
If relative, it will be under the root directory.
It's relative to the current working directory where kwild is run.
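To make the difference between the two readings in this thread concrete, here is a minimal sketch of how the same relative setting resolves under each one. The function name, the directory value `rcvd_snaps`, and the example root and working directories are hypothetical and not taken from kwild; the sketch does not claim which behaviour kwild actually implements.

```python
from pathlib import PurePosixPath


def resolve_snapshot_dir(configured: str, root_dir: str, cwd: str) -> dict:
    """Return both candidate resolutions of a configured snapshot directory.

    Hypothetical helper for illustration only; it is not part of kwild.
    """
    path = PurePosixPath(configured)
    if path.is_absolute():
        # An absolute path is unambiguous under either interpretation.
        return {"root_relative": path, "cwd_relative": path}
    return {
        # Reading 1: relative paths are placed under the node's root directory.
        "root_relative": PurePosixPath(root_dir) / path,
        # Reading 2: relative paths are taken from where the process was started.
        "cwd_relative": PurePosixPath(cwd) / path,
    }


result = resolve_snapshot_dir("rcvd_snaps", "/home/user/.kwild", "/srv/kwil")
print(result["root_relative"])
print(result["cwd_relative"])
```

Whichever reading is correct, documenting it next to the setting (or recommending an absolute path) would remove the ambiguity for operators.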
84d01ac to 3dda41f
Mostly grammar/typo fixes. A few questions that may inform additional content.
Also, please see the comment on auto-generating docs for the kwil-admin reference.
To support statesync, each network should have at least two trusted snapshot providers that are responsible for creating, distributing, and validating snapshots. These trusted snapshot providers should have [snapshot creation](/docs/daemon/config/settings#appsnapshots) enabled, and their chain P2P and RPC endpoints should be accessible to the other nodes in the network so that they can respond to snapshot discovery and validation requests from new nodes.

Along with the trusted snapshot providers, other nodes in the network can also enable snapshots and distribute them to the joining nodes during the statesync process. However, a joining node only accepts these snapshots after validating them with the trusted snapshot providers.
Is this something that CometBFT does programmatically, or is it a recommended best practice? If programmatically, how do you set the "trusted provider"?
docs/node/statesync.mdx
Outdated
When state sync is enabled, the node first discovers snapshots from all its connected peers. It then selects the latest snapshot from those discovered and validates the integrity of the snapshot with the trusted snapshot provider. Once a valid snapshot is identified, the node fetches the snapshot chunks from the peers and restores the database state using these chunks. The node then begins syncing blocks starting from the snapshot height.

:::note
The node will stay in the discovery phase until a snapshot is discovered and validated. If there are no snapshots in the network, the node will be stuck in the discovery phase. To make progress, restart the node by disabling statesync to switch to using blocksync.
Suggested change:
- The node will stay in the discovery phase until a snapshot is discovered and validated. If there are no snapshots in the network, the node will be stuck in the discovery phase. To make progress, restart the node by disabling statesync to switch to using blocksync.
+ The node will stay in the discovery phase until a snapshot is discovered and validated. If there are no snapshots in the network, the node will be stuck in the discovery phase. To make progress, disable statesync and restart the node. The node will then progress with blocksync.
Are there any docs we can hyperlink to on blocksync?
Umm, I can add one, but there is not much to say about it. I will probably add one at the top of this file.
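For reference, the statesync flow quoted in this thread (discover snapshots from peers, pick the latest, validate it against a trusted provider) can be sketched roughly as below. This is an illustration under stated assumptions, not kwild or CometBFT code: `Snapshot`, `select_snapshot`, and `is_trusted` are hypothetical names, and falling back to an older snapshot when the newest one fails validation is an assumption the docs under review do not state.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot descriptor advertised by a peer."""
    height: int
    hash: str


def select_snapshot(
    discovered: Iterable[Snapshot],
    is_trusted: Callable[[Snapshot], bool],
) -> Optional[Snapshot]:
    """Return the highest snapshot that the trusted provider confirms.

    `is_trusted` stands in for the validation request a joining node
    would send to a trusted snapshot provider.
    """
    # Try the newest snapshot first (assumed fallback: older ones next).
    for snap in sorted(discovered, key=lambda s: s.height, reverse=True):
        if is_trusted(snap):
            return snap
    # Nothing validated: the node would remain in the discovery phase.
    return None


# Example: the trusted provider only knows heights 100 and 200.
trusted_hashes = {100: "aaa", 200: "bbb"}


def check(snap: Snapshot) -> bool:
    return trusted_hashes.get(snap.height) == snap.hash


found = [
    Snapshot(100, "aaa"),
    Snapshot(200, "zzz"),  # same height, hash the provider does not confirm
    Snapshot(200, "bbb"),
    Snapshot(300, "ccc"),  # newest, but unknown to the provider
]
print(select_snapshot(found, check))
```

The `None` return corresponds to the case described in the note: with no validated snapshot, the node makes no progress until statesync is disabled and it falls back to blocksync.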
686886d to 7eef3a0
Minor change above, otherwise LGTM.
@brennanjl, do you want to review once more before approving?
No description provided.