Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: clarify chain upgrade procedure #4245

Merged
merged 1 commit into from
Apr 22, 2024
Merged

docs: clarify chain upgrade procedure #4245

merged 1 commit into from
Apr 22, 2024

Conversation

conorsch
Copy link
Contributor

@conorsch conorsch commented Apr 19, 2024

Describe your changes

Follow-up to #4097. We performed a chain upgrade on Penumbra for v0.71.0. Based on feedback from the community validators, we've refined the procedure somewhat. Substantive changes to the docs are as follows:

  • backup the entire pd & cometbft data dirs, as a precaution
  • upgrade to new pd early, prior to running export, for simplicity's sake
  • only move the exported rocksdb directory, which allows preservation of other dirs in pd state, e.g. acme tls info
  • do not reset priv validator state
  • amend custom find command to preserve tx_index.db

Not yet included in these docs are specifics about the upcoming release; we'll add those soon.

Finishes and therefore closes #1812.

Issue ticket number and link

Checklist before requesting a review

  • If this code contains consensus-breaking changes, I have added the "consensus-breaking" label. Otherwise, I declare my belief that there are not consensus-breaking changes, for the following reason:

    docs-only, no code changes

@conorsch conorsch requested a review from erwanor April 19, 2024 16:01
@cratelyn cratelyn added A-docs Area: Documentation needs for the project A-upgrades Area: Relates to chain upgrades labels Apr 19, 2024
@cratelyn cratelyn added this to the Sprint 4 milestone Apr 19, 2024
Copy link
Contributor

@cratelyn cratelyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌱 as ever, i appreciate this care for such important documentation!

&& cp ~/.penumbra/testnet_data/node0/pd-exported-state/priv_validator_state.json ~/.penumbra/testnet_data/node0/cometbft/data/priv_validator_state.json`
8. Then we clean up the old CometBFT state: `find ~/.penumbra/testnet_data/node0/cometbft/data/ -mindepth 1 -maxdepth 1 -type d -exec rm -r {} +`
6. Move the migrated state into place: `rm -r ~/.penumbra/testnet_data/node0/pd/rocksdb && mv ~/.penumbra/testnet_data/node0/pd-exported-state/rocksdb ~/.penumbra/testnet_data/node0/pd/`
7. Move the new cometbft genesis into place: `cp ~/.penumbra/testnet_data/node0/pd-exported-state/genesis.json ~/.penumbra/testnet_data/node0/cometbft/config/genesis.json`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You must reset the comet WAL, blockstore, and evidence DB

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comet won't start if you don't do the WAL/blockstore, and it will corrupt your state if you don't do the evidence db if i recall correctly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining. Substantively, that means we should recommend either:

  1. find ~/.penumbra/testnet_data/node0/cometbft/data/ -mindepth 1 -maxdepth 1 -type d -exec rm -r {} +, as before; or
  2. cometbft --home ~/.penumbra/testnet_data/node0/cometbft/ reset-state

AIUI, the two are functionally equivalent. If so, I'd prefer to recommend the latter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think reset-state is what we want, I can test it later today

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check that it doesn't delete the indexing KV, will check

Copy link
Contributor Author

@conorsch conorsch Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See testing report below, @erwanor. I believe the revised docs are what we need, but would certainly appreciate a +1 from you!

@conorsch
Copy link
Contributor Author

Following up on review comments in #4245 (comment), I did some local testing.

Findings

  • the manual find invocation is functionally equivalent to the cometbft reset-state task
  • priv_validator_state.json is not touched by either method

Methodology

I grabbed a state dir from a preexisting node, backed it up via tar (so testing was replicable), and re-extracted it to multiple locations: 1) the pristine state, untouched, for a control group; 2) a target for the find/rm method; and 3) a target for the cometbt reset-state method. Then I compare the results via diffoscope and sha256sum.

terminal session showing comparisons
❯ ll
drwxr-xr-x - conor 12 Apr 15:21 node0-cmt-reset
drwxr-xr-x - conor 12 Apr 15:21 node0-find-rm
drwxr-xr-x - conor 12 Apr 15:21 node0-pristine

❯ cometbft --home node0-cmt-reset/cometbft/ reset-state
I[2024-04-19|12:30:08.670] deprecated usage found in configuration file usage="[fastsync] table detected. This section has been renamed to [blocksync]. The values in this d
eprecated section will be disregarded."
I[2024-04-19|12:30:08.670] deprecated usage found in configuration file usage="fast_sync key detected. This key has been renamed to block_sync. The value of this deprecated
 key will be disregarded."
I[2024-04-19|12:30:08.670] deprecated usage found in configuration file usage="unused and deprecated upnp field detected in P2P config."
I[2024-04-19|12:30:08.670] deprecated usage found in configuration file module=main usage="[fastsync] table detected. This section has been renamed to [blocksync]. The valu
es in this deprecated section will be disregarded."
I[2024-04-19|12:30:08.670] deprecated usage found in configuration file module=main usage="fast_sync key detected. This key has been renamed to block_sync. The value of thi
s deprecated key will be disregarded."
I[2024-04-19|12:30:08.670] deprecated usage found in configuration file module=main usage="unused and deprecated upnp field detected in P2P config."
I[2024-04-19|12:30:08.671] Removed all blockstore.db                    module=main dir=node0-cmt-reset/cometbft/data/blockstore.db
I[2024-04-19|12:30:08.671] Removed all state.db                         module=main dir=node0-cmt-reset/cometbft/data/state.db
I[2024-04-19|12:30:08.671] Removed all cs.wal                           module=main dir=node0-cmt-reset/cometbft/data/cs.wal
I[2024-04-19|12:30:08.671] Removed all evidence.db                      module=main dir=node0-cmt-reset/cometbft/data/evidence.db
I[2024-04-19|12:30:08.671] Removed tx_index.db                          module=main dir=node0-cmt-reset/cometbft/data/tx_index.db

❯ find node0-find-rm/cometbft/data/ -mindepth 1 -maxdepth 1 -type d -exec rm -r {} +

❯ diffoscope --exclude-directory-metadata=recursive node0-pristine/ node0-cmt-reset/
 |######################################################################################################################|  100%                             Time:  0:00:00
--- node0-pristine/
+++ node0-cmt-reset/
│   --- node0-pristine/cometbft
├── +++ node0-cmt-reset/cometbft
│ │   --- node0-pristine/cometbft/data
│ ├── +++ node0-cmt-reset/cometbft/data
│ │ ├── file list
│ │ │ @@ -1,6 +1 @@
│ │ │ -blockstore.db
│ │ │ -cs.wal
│ │ │ -evidence.db
│ │ │ -priv_validator_state.json
│ │ │ -state.db
│ │ │ -tx_index.db
│ │ │ +priv_validator_state.json

❯ sha256sum node0-{pristine,cmt-reset}/cometbft/data/priv_validator_state.json
9b1a92054f0af837d86f10f317c0fd002345cdef21a96e176ebe05656f5193e1  node0-pristine/cometbft/data/priv_validator_state.json
9b1a92054f0af837d86f10f317c0fd002345cdef21a96e176ebe05656f5193e1  node0-cmt-reset/cometbft/data/priv_validator_state.json

❯ diffoscope --exclude-directory-metadata=recursive node0-cmt-reset/ node0-find-rm/
 |######################################################################################################################|  100%                             Time:  0:00:00

❯ sha256sum node0-{pristine,cmt-reset,find-rm}/cometbft/data/priv_validator_state.json
9b1a92054f0af837d86f10f317c0fd002345cdef21a96e176ebe05656f5193e1  node0-pristine/cometbft/data/priv_validator_state.json
9b1a92054f0af837d86f10f317c0fd002345cdef21a96e176ebe05656f5193e1  node0-cmt-reset/cometbft/data/priv_validator_state.json
9b1a92054f0af837d86f10f317c0fd002345cdef21a96e176ebe05656f5193e1  node0-find-rm/cometbft/data/priv_validator_state.json

Based on that info, I'm fine recommending the cometbft state-reset method. Another implication here is that we should probably remove priv_validator_state.json from the migration and join methods. I'll do that in a follow-up PR.

@erwanor
Copy link
Member

erwanor commented Apr 22, 2024

@conorsch I[2024-04-19|12:30:08.671] Removed tx_index.db module=main dir=node0-cmt-reset/cometbft/data/tx_index.db it looks like this remove the tx indexer, so maybe to fallback onto the ugly find... expr?

@conorsch
Copy link
Contributor Author

It looks like this remove the tx indexer, so maybe to fallback onto the ugly find... expr?

For clarity's sake, that's not a regression: the original find command was also nuking the tx_index.db directory. I've switched back to using find rather than cometbft reset-state, this time amending the find command to exclude the tx index db dir. Sounds like that's what we want for this upgrade. Thank you for your careful read! 🙏

Follow-up to #4097. We performed a chain upgrade on Penumbra
for v0.71.0. Based on feedback from the community validators, we've
refined the procedure somewhat. Substantive changes to the docs are as
follows:

  * backup the entire pd & cometbft data dirs, as a precaution
  * upgrade to new pd early, prior to running export, for simplicity's
    sake
  * only move the exported rocksdb directory, which allows preservation
    of other dirs in pd state, e.g. acme tls info
  * do not reset priv validator state
  * amend custom find command to preserve `tx_index.db`

Not yet included in these docs are specifics about the upcoming release;
we'll add those soon.

Finishes and therefore closes #1812.
@cratelyn cratelyn modified the milestones: Sprint 4, Sprint 5 Apr 22, 2024
@conorsch conorsch requested a review from erwanor April 22, 2024 20:04
@conorsch conorsch dismissed erwanor’s stale review April 22, 2024 20:11

comments addressed

@conorsch conorsch merged commit ac3217b into main Apr 22, 2024
8 checks passed
@conorsch conorsch deleted the 1812-upgrade-docs branch April 22, 2024 20:11
conorsch added a commit that referenced this pull request Apr 24, 2024
We recently removed mention of zeroing-out `priv_validator_state.json`
for CometBFT as part of the chain upgrade docs. During migration testing
on a private fork today, though, several of us observed that without
this change, the network would not resume. More research required to
understand the root cause here.

Refs #4245, #4239.
conorsch added a commit that referenced this pull request Apr 25, 2024
We recently removed mention of zeroing-out `priv_validator_state.json`
for CometBFT as part of the chain upgrade docs. During migration testing
on a private fork today, though, several of us observed that without
this change, the network would not resume. More research required to
understand the root cause here.

Refs #4245, #4239.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: Documentation needs for the project A-upgrades Area: Relates to chain upgrades
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

upgrades: create a template guide
3 participants