docs: chain upgrade procedure, for operators #4097

Merged
merged 1 commit into from
Mar 27, 2024
1 change: 1 addition & 0 deletions docs/guide/src/SUMMARY.md
@@ -12,6 +12,7 @@
- [Installing `pd`](./pd/install.md)
- [Joining a testnet](./pd/join-testnet.md)
- [Becoming a validator](./pd/validator.md)
- [Performing a chain upgrade](./pd/chain-upgrade.md)
- [Debugging](./pd/debugging.md)
- [Local RPC with `pclientd`](./pclientd.md)
- [Configuring `pclientd`](./pclientd/configure.md)
73 changes: 73 additions & 0 deletions docs/guide/src/pd/chain-upgrade.md
@@ -0,0 +1,73 @@
# Performing chain upgrades

When consensus-breaking changes are made to the Penumbra protocol,
node operators must coordinate upgrading to the new version of the software
at the same time. Penumbra uses a governance proposal for scheduling upgrades
at a specific block height.

## Upgrade process abstractly

At a high level, the upgrade process consists of the following steps:

1. Governance proposal submitted, specifying explicit chain height `n` for halt to occur.
2. Governance proposal passes.
3. Chain reaches specified height `n-1`, nodes stop generating blocks.
4. Manual upgrade is performed on each validator and fullnode:
    1. Prepare migration directory via `pd export`.
    2. Install the new version of `pd`.
    3. Apply changes to node state via `pd migrate`.
    4. Copy a few files and directories around, clean up CometBFT state.
    5. Restart node.

After the node is restarted on the new version, it should be able to talk to the network again.
Once enough validators with sufficient stake weight have upgraded, the network
will resume generating blocks.
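
Before starting the manual steps, it can help to confirm that the node really has stopped producing blocks. The following is a minimal sketch of one way to check, assuming CometBFT's RPC is reachable at its default address `http://127.0.0.1:26657` and that `curl` is installed. The function names `latest_height` and `check_halted` and the `RPC_URL` variable are illustrative, not part of `pd` or CometBFT.

```shell
# Extract latest_block_height from a CometBFT /status JSON response on stdin.
# Uses grep/sed rather than jq to avoid an extra dependency.
latest_height() {
    grep -o '"latest_block_height": *"[0-9]*"' | sed 's/[^0-9]//g'
}

# Assumption: CometBFT RPC on the default 127.0.0.1:26657. Adjust if yours differs.
RPC_URL="${RPC_URL:-http://127.0.0.1:26657}"

# Poll the height twice, ten seconds apart. On a halted chain the two
# values should be identical. Returns non-zero if the height changed
# or could not be read.
check_halted() {
    first=$(curl -s "$RPC_URL/status" | latest_height)
    sleep 10
    second=$(curl -s "$RPC_URL/status" | latest_height)
    echo "height at first poll: $first, at second poll: $second"
    [ -n "$first" ] && [ "$first" = "$second" ]
}
```

Running `check_halted && echo "chain appears halted"` on the node would then report whether the height is still advancing.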


## Genesis time

In order for the chain to start again after the upgrade, all nodes must be using the same genesis information,
including the timestamp for the genesis event. While the `pd migrate` command will create a new `genesis.json` file,
it cannot know the correct genesis start time to use without the operator supplying the `--genesis-start` flag.
The community may choose to specify a start time within the upgrade proposal. If so, all operators must use that value
when performing the migration, as described below. Otherwise, validators must coordinate out of band to agree
on a genesis start time.

Leveraging the governance proposal is the recommended way to solve this problem. If the genesis start time is a value
in the future, then after the upgrade is performed, the node will start, but not process blocks. It will wait
until the `--genesis-start` time is reached, at which point it will resume processing blocks. In this way,
the community of validators can coordinate resumption of chain activity, even when operators migrate their nodes
at slightly different times.
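
As an illustration of what a future genesis start value might look like, the sketch below prints a UTC timestamp a given number of seconds from now. It assumes GNU `date` (the `-d` flag behaves differently on BSD and macOS), and it assumes `--genesis-start` accepts an RFC 3339 UTC timestamp; confirm the expected format with `pd migrate --help`. The function name `future_genesis_time` is hypothetical. In a real upgrade every operator must use the single agreed value rather than generating their own.

```shell
# Print an RFC 3339 UTC timestamp N seconds from now (GNU date assumed).
future_genesis_time() {
    offset_seconds="$1"
    date -u -d "@$(( $(date -u +%s) + offset_seconds ))" +"%Y-%m-%dT%H:%M:%SZ"
}

# Example: a candidate genesis time one hour from now.
GENESIS_TIME="$(future_genesis_time 3600)"
echo "candidate genesis time: $GENESIS_TIME"
```

A value like this would be proposed once, for example in the governance proposal text, and then copied verbatim by all operators.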

## Performing a chain upgrade

The following steps assume that `pd` is using the default home directory of `~/.penumbra/testnet_data/node0/pd`.
If your instance is using a different directory, update the paths accordingly.
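
One way to keep the paths consistent when your layout differs is to define them once as shell variables. This is only a sketch: the variable names are hypothetical, and only the default paths themselves come from this guide.

```shell
# Hypothetical variable names; override NODE_DIR if your node lives elsewhere.
NODE_DIR="${NODE_DIR:-$HOME/.penumbra/testnet_data/node0}"
PD_HOME="$NODE_DIR/pd"                       # live pd state
PD_EXPORT_DIR="$NODE_DIR/pd-exported-state"  # output of `pd export`
PD_BACKUP_DIR="$NODE_DIR/pd-state-backup"    # pre-upgrade backup of pd state
COMETBFT_HOME="$NODE_DIR/cometbft"           # CometBFT config and data

# Print the resolved paths so they can be reviewed before running anything.
printf '%s\n' "$PD_HOME" "$PD_EXPORT_DIR" "$PD_BACKUP_DIR" "$COMETBFT_HOME"
```

With these set, the export command in step 2 would read `pd export --home "$PD_HOME" --export-directory "$PD_EXPORT_DIR"`.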

1. Stop both `pd` and `cometbft`. Depending on how you run Penumbra, this could mean `sudo systemctl stop penumbra cometbft`.
2. Using the same version of `pd` that was running when the chain halted, prepare an export directory:
`pd export --home ~/.penumbra/testnet_data/node0/pd --export-directory ~/.penumbra/testnet_data/node0/pd-exported-state`
3. Back up the historical state directory: `mv ~/.penumbra/testnet_data/node0/pd ~/.penumbra/testnet_data/node0/pd-state-backup`
4. Download the latest version of `pd` and install it. Run `pd --version` and confirm you see `{{ #include ../penumbra_version.md }}` before proceeding.

<!--
An example log message emitted by `pd migrate` without providing `--genesis-start`:

pd::upgrade: no genesis time provided, detecting a testing setup now=2023-12-09T00:08:24.225277473Z

The value after `now=` is what should be copied. In practice, for testnets, Penumbra Labs will advise on a genesis time
and provide that value in the documentation. Or should we just pick a genesis start ahead of time, and use that for all?
-->
5. Apply the migration: `pd migrate --genesis-start "GENESIS_TIME" --target-directory ~/.penumbra/testnet_data/node0/pd-exported-state/ --migrate-archive ~/.penumbra/testnet_data/node0/pd-migrated-state-{{ #include ../penumbra_version.md }}.tar.gz`.

Review comment (Member):

I think I would add a section about genesis time and explain that this is decided by validators/node runners, possibly via the governance proposal itself (non-binding, opt-in). And describe how genesis time works: if it is set in the future, the node will start and wait for the time of genesis to be reached before it can start producing blocks (and peering?).

Replace `GENESIS_TIME` with the exact string: `XXXXX`.
6. Move the migrated state into place: `mkdir ~/.penumbra/testnet_data/node0/pd && mv ~/.penumbra/testnet_data/node0/pd-exported-state/rocksdb ~/.penumbra/testnet_data/node0/pd/`
7. Move the upgraded CometBFT state into place: `cp ~/.penumbra/testnet_data/node0/pd-exported-state/genesis.json ~/.penumbra/testnet_data/node0/cometbft/config/genesis.json
&& cp ~/.penumbra/testnet_data/node0/pd-exported-state/priv_validator_state.json ~/.penumbra/testnet_data/node0/cometbft/data/priv_validator_state.json`
8. Clean up the old CometBFT state: `find ~/.penumbra/testnet_data/node0/cometbft/data/ -mindepth 1 -maxdepth 1 -type d -exec rm -r {} +`

Finally, restart the node, e.g. `sudo systemctl restart penumbra cometbft`. Check the logs, and you should see the chain progressing
past the halt height `n`.
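
To see what the cleanup command in step 8 does before running it against real data, the sketch below replays it in a throwaway directory. The subdirectory names (`blockstore.db`, `state.db`) are stand-ins chosen for illustration; only `priv_validator_state.json` comes from the steps above. Because of `-type d`, only subdirectories of `data/` are deleted, and regular files placed directly in `data/` are left alone.

```shell
# Build a scratch directory that imitates a CometBFT data/ layout.
scratch="$(mktemp -d)"
mkdir -p "$scratch/data/blockstore.db" "$scratch/data/state.db"
touch "$scratch/data/blockstore.db/000001.log"
echo '{}' > "$scratch/data/priv_validator_state.json"

# Same find invocation as step 8, pointed at the scratch copy.
find "$scratch/data/" -mindepth 1 -maxdepth 1 -type d -exec rm -r {} +

# Only the top-level file should remain.
ls "$scratch/data/"   # prints: priv_validator_state.json
```

This is why step 7 copies `priv_validator_state.json` into `cometbft/data/` as a plain file: the cleanup in step 8 does not touch it.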

If you want to host a snapshot for this migration, copy the file
`~/.penumbra/testnet_data/node0/pd-migrated-state-{{ #include ../penumbra_version.md }}.tar.gz` to the appropriate hosting environment,
and inform the users of your validator.
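
If you do publish the archive, a common practice (not something this guide requires) is to publish a checksum next to it so that users can verify their download. The sketch below assumes `sha256sum` from GNU coreutils; the function names and the demo file name are hypothetical, and the demo uses a placeholder file rather than a real archive.

```shell
# Write NAME.sha256 next to the archive, using a relative file name so the
# checksum file can be verified from whatever directory it is downloaded to.
publish_checksum() {
    archive="$1"
    ( cd "$(dirname "$archive")" &&
      sha256sum "$(basename "$archive")" > "$(basename "$archive").sha256" )
}

# Verify an archive against the checksum file sitting next to it.
verify_checksum() {
    archive="$1"
    ( cd "$(dirname "$archive")" &&
      sha256sum -c "$(basename "$archive").sha256" )
}

# Demo on a placeholder file standing in for the real archive.
demo_dir="$(mktemp -d)"
echo "placeholder archive contents" > "$demo_dir/pd-migrated-state-example.tar.gz"
publish_checksum "$demo_dir/pd-migrated-state-example.tar.gz"
verify_checksum "$demo_dir/pd-migrated-state-example.tar.gz"
```

Users would download both files and run the same `sha256sum -c` check before unpacking the archive.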
23 changes: 23 additions & 0 deletions docs/guide/src/pd/join-testnet.md
@@ -30,13 +30,36 @@ This will delete the entire testnet data directory.

Next, generate a set of configs for the current testnet:

<!--
### Begin join customization

The following section describes how to join a testnet chain *which has never upgraded*.
Once a chain upgrade occurs, a new-joining node must have access to an archive
of historical, migrated state. When we upgrade the chain, we should update these
docs to switch to the archive-url version:

```shell
pd testnet join --external-address IP_ADDRESS:26656 --moniker MY_NODE_NAME \
--archive-url "https://snapshots.penumbra.zone/testnet/pd-archived-stated-xxxxx.tar.gz"
```

where `IP_ADDRESS` (like `1.2.3.4`) is the public IP address of the node you're running,
and `MY_NODE_NAME` is a moniker identifying your node. Other peers will try to connect
to your node over port `26656/TCP`. Finally, the `--archive-url` flag will fetch
a tarball of historical blocks, so that your newly joining node can understand transactions
that occurred prior to the most recent chain upgrade.
-->

```shell
pd testnet join --external-address IP_ADDRESS:26656 --moniker MY_NODE_NAME
```

where `IP_ADDRESS` (like `1.2.3.4`) is the public IP address of the node you're running,
and `MY_NODE_NAME` is a moniker identifying your node. Other peers will try to connect
to your node over port `26656/TCP`.
<!--
### End join customization
-->

If your node is behind a firewall or not publicly routable for some other reason,
skip the `--external-address` flag, so that other peers won't try to connect to it.