Docs for Network migrations and Statesync #102

Merged
merged 4 commits into from
May 31, 2024

charithabandi
Contributor

No description provided.

@charithabandi force-pushed the docs-snapshots branch 2 times, most recently from e6ed3e5 to 6bff7cd May 3, 2024 17:45
@charithabandi marked this pull request as ready for review May 3, 2024 17:47
@charithabandi force-pushed the docs-snapshots branch 2 times, most recently from 24e82ad to fd912bf May 3, 2024 17:54
docs/node/network-migration.mdx (outdated)

### Create Snapshot

Stop the network by shutting down all of its nodes. Then use the `kwil-admin snapshot` tool to take a snapshot of the final state of the `kwild` database. For more details, refer to the [create database snapshots](/docs//ref/kwil-admin/snapshot/create) documentation. The tool connects directly to the database to capture its state, so the `kwild` process does not need to be running.
Contributor

Do they have to stop nodes? I don't think they have to stop all of them, but do they need to stop the one they are snapshotting?

Contributor Author

Yeah, we are doing a network migration here. I asked Jon for a feature in Consensus Upgrades that will stop the network from mining blocks. That way, we can do coordinated stops, take a snapshot of the latest state, and start a new network without losing any more transactions.

Contributor

I see. I think there might be a better way to handle this that allows for less downtime. I will share it in a bit, but the tl;dr is using oracles to listen to the old network.

Member

I'm curious about the proposed solution. If I were coordinating the death and rebirth of a network, I would expect to have all validators in on it. Therefore, the last validator to hit CTRL+C gets to make the genesis data file. Regardless, a pre-programmed halting height is a minor improvement that I'd support.

However, I still don't like that we have to support this kind of migration. So, I hope that whatever oracle solution we devise isn't complex; otherwise we might as well just not brick networks with incompatible upgrades in the first place. The need to keep the old network running and accessible by clients is undesirable complexity.

Is there precedent for launching a new network that is linked to the old one, but with new rules, where the intent is to migrate all use to the new one?

docs/node/network-migration.mdx (outdated)
docs/node/network-migration.mdx (outdated)
docs/node/network-migration.mdx (outdated)
docs/node/statesync.mdx (outdated)
docs/node/statesync.mdx (outdated)
@jchappelow self-requested a review May 13, 2024 19:50
docs/node/admin/kwil-admin/setup/testnet.mdx (outdated)
docs/node/admin/kwil-admin/snapshot/create.mdx (outdated)
docs/node/admin/kwil-admin/snapshot/create.mdx (outdated)
docs/node/admin/kwil-admin/snapshot/create.mdx (outdated)
-S, --silence Silence logs
```

### SEE ALSO
Member

For the category "index.md" page, we can use react cards like on https://docs.kwil.com/docs/admin.


I think that should work if the hierarchy is established correctly.

docs/node/daemon/config/settings.mdx (outdated)
docs/node/daemon/config/settings.mdx (outdated)
# starting from the height of the snapshot.
enable = true

# Path to the directory where the received snapshot is stored
Member

If relative, it will be under the root directory.

Contributor Author

It's relative to the current working directory from which `kwild` is run.

docs/node/daemon/config/settings.mdx (outdated)
docs/node/daemon/config/settings.mdx (outdated)
Contributor

@KwilLuke left a comment

Mostly grammar/typo fixes. A few questions that may inform additional content.

Also, please see the comment on auto-generating docs for the kwil-admin reference.

docs/node/admin/kwil-admin/snapshot/index.mdx (outdated)
docs/node/network-migration.mdx (outdated)
docs/node/network-migration.mdx (outdated)
docs/node/network-migration.mdx (outdated)
docs/node/network-migration.mdx (outdated)
docs/node/statesync.mdx (outdated)

To support statesync, each network should have at least two trusted snapshot providers that are responsible for creating, distributing, and validating snapshots. These providers should have [snapshot creation](/docs/daemon/config/settings#appsnapshots) enabled, and their chain P2P and RPC endpoints should be accessible to the other nodes in the network so that they can respond to snapshot discovery and validation requests from new nodes.

Along with the trusted snapshot providers, other nodes in the network can also enable snapshots and distribute them to joining nodes during the statesync process. However, a joining node accepts these snapshots only after validating them with the trusted snapshot providers.
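One way to read the acceptance rule above: a snapshot offered by any peer is used only if a trusted provider reports the same hash at the same height. The Go sketch below models that reading under stated assumptions. The `Snapshot`, `TrustedProvider`, `validate`, and `mapProvider` names are hypothetical; treating "at least two providers" as a hard check and accepting on a single matching provider are illustrative choices; and it does not describe how CometBFT or kwild actually performs the check.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// Snapshot is a hypothetical, simplified view of snapshot metadata.
type Snapshot struct {
	Height uint64
	Hash   []byte
}

// TrustedProvider is a hypothetical interface: it reports the hash of the
// snapshot the provider holds at a given height.
type TrustedProvider interface {
	SnapshotHash(height uint64) ([]byte, error)
}

var errUntrusted = errors.New("snapshot not confirmed by any trusted provider")

// validate accepts a snapshot from any peer only if at least one trusted
// provider reports an identical hash at the same height.
func validate(s Snapshot, providers []TrustedProvider) error {
	if len(providers) < 2 {
		return fmt.Errorf("need at least two trusted providers, have %d", len(providers))
	}
	for _, p := range providers {
		h, err := p.SnapshotHash(s.Height)
		if err != nil {
			continue // provider unreachable or has no snapshot at this height
		}
		if bytes.Equal(h, s.Hash) {
			return nil
		}
	}
	return errUntrusted
}

// mapProvider is an in-memory stand-in for a trusted provider.
type mapProvider map[uint64][]byte

func (m mapProvider) SnapshotHash(height uint64) ([]byte, error) {
	h, ok := m[height]
	if !ok {
		return nil, fmt.Errorf("no snapshot at height %d", height)
	}
	return h, nil
}

func main() {
	trusted := []TrustedProvider{
		mapProvider{100: []byte("abc")},
		mapProvider{}, // second provider has no snapshot at height 100
	}
	fmt.Println(validate(Snapshot{Height: 100, Hash: []byte("abc")}, trusted)) // <nil>
	fmt.Println(validate(Snapshot{Height: 100, Hash: []byte("xyz")}, trusted)) // snapshot not confirmed by any trusted provider
}
```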
Contributor

Is this something that CometBFT does programmatically, or is it a recommended best practice? If programmatically, how do you set the "trusted provider"?

docs/node/statesync.mdx (outdated)
docs/node/statesync.mdx (outdated)
When state sync is enabled, the node first discovers snapshots from all its connected peers. It then selects the latest snapshot from those discovered and validates its integrity with the trusted snapshot providers. Once a valid snapshot is identified, the node fetches the snapshot chunks from its peers and restores the database state from these chunks. The node then begins syncing blocks starting from the snapshot height.
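The discovery and selection order described above can be sketched in Go as follows. This is a sketch, not kwild's implementation: `Offer`, `selectSnapshot`, and the validation callback are hypothetical, and falling back to the next-newest snapshot when the newest fails validation is an assumption. Chunk fetching and the database restore are omitted. A real node keeps discovering when nothing validates; this sketch simply returns an error at that point, which corresponds to the case where an operator disables statesync and uses blocksync instead.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Offer is a hypothetical snapshot offer received from a peer during discovery.
type Offer struct {
	Peer   string
	Height uint64
	Hash   string
}

var errNoSnapshot = errors.New("no valid snapshot discovered; disable statesync and restart to use blocksync")

// selectSnapshot orders discovered offers newest first and returns the first
// one that passes validation against the trusted providers.
func selectSnapshot(offers []Offer, trusted func(Offer) bool) (Offer, error) {
	sorted := append([]Offer(nil), offers...) // copy so the caller's slice is untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Height > sorted[j].Height })
	for _, o := range sorted {
		if trusted(o) {
			// Next steps (not shown): fetch chunks from peers, restore the
			// database, then block sync from o.Height onward.
			return o, nil
		}
	}
	return Offer{}, errNoSnapshot
}

func main() {
	offers := []Offer{
		{Peer: "peerA", Height: 900, Hash: "h900"},
		{Peer: "peerB", Height: 1000, Hash: "bad"},
		{Peer: "peerC", Height: 950, Hash: "h950"},
	}
	// Hypothetical check: pretend the trusted providers confirm only these hashes.
	confirmed := map[string]bool{"h900": true, "h950": true}
	o, err := selectSnapshot(offers, func(o Offer) bool { return confirmed[o.Hash] })
	fmt.Println(o.Peer, o.Height, err) // peerC 950 <nil>

	_, err = selectSnapshot(nil, func(Offer) bool { return true })
	fmt.Println(err) // prints errNoSnapshot's message
}
```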

:::note
The node will stay in the discovery phase until a snapshot is discovered and validated. If there are no snapshots in the network, the node will be stuck in the discovery phase. To make progress, restart the node with statesync disabled so that it switches to blocksync.
Contributor

Suggested change
The node will stay in the discovery phase until a snapshot is discovered and validated. If there are no snapshots in the network, the node will be stuck in the discovery phase. To make progress, restart the node by disabling statesync to switch to using blocksync.
The node will stay in the discovery phase until a snapshot is discovered and validated. If there are no snapshots in the network, the node will be stuck in the discovery phase. To make progress, disable statesync and restart the node. The node will then progress with blocksync.

Contributor

Are there any docs we can hyperlink to on blocksync?

Contributor Author

Umm, I can add one, but there is not much to say about it. I will probably add one at the top of this file.

Contributor

@KwilLuke left a comment

Minor change above, otherwise LGTM.

@brennanjl, do you want to review once more before approving?

@KwilLuke merged commit 0857af0 into main May 31, 2024
@KwilLuke deleted the docs-snapshots branch May 31, 2024 22:59

4 participants