feat(pd): support archives for migrate and join #4055

Merged: conorsch merged 1 commit into main from 3841-support-archive-urls-in-pd on Mar 23, 2024

conorsch
Contributor

@conorsch conorsch commented Mar 20, 2024

Enables opt-in archive generation when performing:

  • pd export
  • pd migrate

The goal is to provide a standardized bottling-up of pd state, specifically the rocksdb directory. In the context of upgrades, the "pd migrate" change is the one that matters: we want the archived dir to contain both the rocksdb data and the modified genesis file.

Accordingly, pd testnet join is modified to support an optional archive URL. If set, the remote tar.gz archive will be downloaded and extracted, clobbering the cometbft config. A remote bootstrap node is still contacted, to learn about peers, otherwise the newly created node wouldn't be able to talk to the network.
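The download-and-extract step described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Rust code in this PR: the function name `install_archive` and the `node_dir` layout are assumptions made for the example. It shows only the "clobbering" behavior, where archive members overwrite files that already exist at the same relative path and everything else is left alone.

```python
import shutil
import tarfile
import tempfile
import urllib.request
from pathlib import Path


def install_archive(archive_url: str, node_dir: Path) -> list[str]:
    """Download a .tar.gz archive and unpack it on top of node_dir.

    Hypothetical helper for illustration. Existing files are overwritten by
    archive members with the same relative path; other files are untouched.
    Returns the member names found in the archive.
    """
    node_dir.mkdir(parents=True, exist_ok=True)
    root = node_dir.resolve()
    with tempfile.TemporaryFile() as tmp:
        # Stream the remote archive to a temporary file, then rewind it.
        with urllib.request.urlopen(archive_url) as response:
            shutil.copyfileobj(response, tmp)
        tmp.seek(0)
        with tarfile.open(fileobj=tmp, mode="r:gz") as tar:
            members = tar.getmembers()
            # Refuse members that would be written outside node_dir.
            for member in members:
                target = (root / member.name).resolve()
                if not target.is_relative_to(root):
                    raise ValueError(f"unsafe path in archive: {member.name}")
            # Use the stricter "data" extraction filter where Python offers it.
            kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            tar.extractall(path=root, **kwargs)
    return [member.name for member in members]
```

In this sketch the peer discovery step is deliberately absent: as noted above, the joining node still contacts a bootstrap node for peers, and the archive only supplies state.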

@conorsch conorsch requested a review from erwanor March 20, 2024 01:56
@conorsch conorsch force-pushed the 3841-support-archive-urls-in-pd branch from 26b40c8 to e74d429 Compare March 20, 2024 02:09
@cratelyn cratelyn added this to the Sprint 2 milestone Mar 20, 2024
@conorsch conorsch force-pushed the 3841-support-archive-urls-in-pd branch from e74d429 to 83ae671 Compare March 20, 2024 18:04
@conorsch conorsch changed the title Support archive URLs for loading pd state feat(pd): support archives for migrate and join Mar 20, 2024
@erwanor
Member

erwanor commented Mar 20, 2024

I think what we want here is to give the ability for pd join to specify an archive url or path, and have the pd cli logic "install" that archive in the correct data directories. We could have pd export generate an archive, but I don't think we need that option for pd migrate?

@conorsch
Contributor Author

> I think what we want here is to give the ability for pd join to specify an archive url or path, and have the pd cli logic "install" that archive in the correct data directories.

This is implemented! (I recently force-pushed, so the code just changed.) What's not yet implemented is clobbering of the genesis file. In my local testing, pd migrate throws an error, so I cannot generate a modified genesis file and test the round-trip of export -> migrate -> host -> join.

> We could have pd export generate an archive

That's implemented and working. I don't think it really helps in the upgrade case, but it's easy enough to add the flag, since it reuses a function. What it does leave the door open to is creating archives of node state for fast-sync, so a node can bootstrap itself from height n if an archive URL is provided.

> I don't think we need that option for pd migrate?

I think we do! It's pd migrate that emits a directory containing 1) modified rocksdb state and 2) modified genesis file. Those two artifacts should be packed up in an archive, and that archive is what should be hosted for new-joining nodes to consume via pd testnet join --archive-url <URL>. Crucially, the new-joining node will still contact a remote node and fetch its peer info, which I think is a sane decision.
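To make the packing step concrete, here is a rough Python sketch of bundling a migrated directory into a single tar.gz. It is not the implementation in this PR: the helper name `archive_node_state` and the example layout (`rocksdb/` next to `genesis.json`) are assumptions for illustration, and the real directory structure emitted by pd migrate may differ.

```python
import tarfile
from pathlib import Path


def archive_node_state(state_dir: Path, archive_path: Path) -> list[str]:
    """Pack every file under state_dir into a gzip-compressed tarball.

    Hypothetical helper for illustration. Member names are stored relative
    to state_dir, so the archive can be unpacked directly on top of another
    node's directory. archive_path should live outside state_dir.
    """
    names = []
    with tarfile.open(str(archive_path), mode="w:gz") as tar:
        # Sort for a stable member order across runs.
        for path in sorted(state_dir.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(state_dir).as_posix()
                tar.add(str(path), arcname=arcname)
                names.append(arcname)
    return names
```

Storing relative member names is the design choice that matters here: it lets the same archive be extracted into whatever home directory the joining node uses.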

@conorsch conorsch added the A-upgrades Area: Relates to chain upgrades label Mar 20, 2024
@conorsch conorsch force-pushed the 3841-support-archive-urls-in-pd branch from 83ae671 to 27a7a42 Compare March 21, 2024 15:44
@conorsch conorsch force-pushed the 3841-support-archive-urls-in-pd branch from 27a7a42 to 44296e5 Compare March 21, 2024 18:51
conorsch added a commit that referenced this pull request Mar 22, 2024
While testing upgrades as part of [0] and updating the docs in [1],
it became clear that during manual maintenance for munging state during
chain upgrade procedures, we want all the nodes to be reset together.
Therefore if maintenanceMode=true, we'll also disable the
readinessProbes; otherwise the rollout will be staged, forcing us to
upgrade each node in turn, rather than being able to parallelize them.
Now, we can `pd export` and back up state, then update the statefulset to bump
the container version (to the post-upgrade version), and restart it in
maintenance mode to continue with the `pd migrate` step and resulting
copying around of emitted files.

[0] #4055
[1] https://github.com/penumbra-zone/penumbra/wiki/Performing-upgrades
@conorsch conorsch force-pushed the 3841-support-archive-urls-in-pd branch from 44296e5 to f67a5da Compare March 22, 2024 01:24
@conorsch conorsch force-pushed the 3841-support-archive-urls-in-pd branch from f67a5da to caf0f9c Compare March 22, 2024 16:37
@conorsch conorsch force-pushed the 3841-support-archive-urls-in-pd branch from caf0f9c to c0dc52a Compare March 22, 2024 20:38
@conorsch conorsch marked this pull request as ready for review March 22, 2024 20:42
Enables opt-in archive generation when performing:

  * pd export
  * pd migrate

The goal is to provide a standardized bottling-up of pd state,
specifically the rocksdb directory. In the context of upgrades,
the `pd migrate` functionality is what's important:
we want the archived dir to contain both rocksdb data and the modified
genesis file and updated validator state.

Accordingly, `pd testnet join` is modified to support an optional
archive URL. If set, the remote tar.gz archive will be downloaded
and extracted, clobbering the cometbft config files that were fetched.
A remote bootstrap node is still contacted, to learn about peers,
otherwise the newly created node wouldn't be able to talk to the network.
@conorsch conorsch force-pushed the 3841-support-archive-urls-in-pd branch from c0dc52a to ea77d32 Compare March 23, 2024 20:46
@conorsch
Contributor Author

Merging optimistically to continue with upgrade testing, ahead of #4087.

@conorsch conorsch merged commit c59ed53 into main Mar 23, 2024
7 checks passed
@conorsch conorsch deleted the 3841-support-archive-urls-in-pd branch March 23, 2024 21:18
conorsch added a commit that referenced this pull request Mar 24, 2024
Follow-up to [0]. We've tested several times the creation and use of
archive urls via cli, but hadn't yet added logic to deploy new nodes
from archive URLs, as well. This change worked for adding a new-joining
node to a bespoke devnet post-upgrade.

[0] #4055
erwanor pushed a commit that referenced this pull request Mar 25, 2024
Follow-up to [0]. We've tested several times the creation and use of
archive urls via cli, but hadn't yet added logic to deploy new nodes
from archive URLs, as well. This change worked for adding a new-joining
node to a bespoke devnet post-upgrade.

[0] #4055

Co-authored-by: Conor Schaefer <[email protected]>