ci: separate deploy logic for networks and nodes #2983

Closed
15 of 17 tasks
conorsch opened this issue Sep 8, 2023 · 1 comment
conorsch commented Sep 8, 2023

Is your feature request related to a problem? Please describe.
Our CI logic currently bundles all deploy-related tasks for network provisioning and node deployment into a single action. There are actually subtle distinctions that we should manage separately:

  • create new network from fresh chain id (happens on preview deploys & testnet deploys)
  • spin up validators based on `pd testnet generate` output (currently ci.sh clobbers this data)
  • join fullnodes to an existing network of validators (should be a repeatable action)
  • handle metrics per deployment

Describe the solution you'd like
Separating out these actions will let us manage longer-lived deployments with more confidence (for example, to support upgrade testing as described in #1804) and will ease the deployment of ad-hoc networks to test-drive new functionality. In the past, we've done this manually, but there's no reason we can't have a point-and-click CI workflow to do it. Handling this problem would also resolve #1783 and, as a side effect, make recovery of a failed deployment possible.

Describe alternatives you've considered
We could treat the existing deployments as "good enough", but that will likely pose problems with upgrade testing.

Additional context
Three logical charts jump out at me:

  • penumbra-network (essentially wrapping `pd testnet generate` and spinning up initial validators)
  • penumbra-node (essentially wrapping `pd testnet join` and spinning up full nodes against an existing network)
  • penumbra-metrics (long-lived deployments to scrape pd & tm metrics endpoints)
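
To make the split concrete, here is a minimal sketch of how a deploy script could pick one chart per action instead of bundling everything. The chart names and the `pd` subcommands come from the list above; the `charts` directory, the release names, and the `helm_command` helper are hypothetical and only illustrate the shape, not the actual ci.sh logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Chart:
    name: str             # chart name as proposed in this issue
    wraps: Optional[str]  # pd subcommand the chart wraps, if any


# One entry per logical chart proposed above.
CHARTS = {
    "network": Chart("penumbra-network", "pd testnet generate"),
    "node": Chart("penumbra-node", "pd testnet join"),
    "metrics": Chart("penumbra-metrics", None),
}


def helm_command(kind: str, release: str, chart_dir: str = "charts") -> List[str]:
    """Build the argv for installing or upgrading a single chart.

    `chart_dir` and `release` are illustrative assumptions; only the
    `helm upgrade --install RELEASE CHART` form is standard Helm usage.
    """
    chart = CHARTS[kind]  # raises KeyError for an unknown action
    return ["helm", "upgrade", "--install", release, f"{chart_dir}/{chart.name}"]


if __name__ == "__main__":
    # Joining more full nodes touches only the node chart, so it can be
    # repeated without regenerating the network.
    print(" ".join(helm_command("node", "devnet-nodes")))
```

Keeping each action behind its own chart means a failed node rollout could be retried on its own, without re-running the network generation step.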

We can ignore the provisioning logic for helper services like the bots and relayer for now.

Relevant tickets
The following should be resolved by the rewrite:

Progress checklist

For tracking follow-up tasks toward completion.

  • rename OG chart to "penumbra-og" for reference
  • paste in external charts
  • test drive a devnet deployment
  • delete OG chart
  • update ci.sh deploy logic
  • parity with preview & testnet envs
  • final pass for TODOs
  • generate ips for all envs
    • devnet
    • preview
    • testnet
  • merge into main
  • test with preview rebuild
  • test patch release behavior (chain state preservation)
conorsch changed the title from "Separate deploy logic for networks and nodes" to "ci: separate deploy logic for networks and nodes" on Sep 14, 2023
conorsch self-assigned this on Sep 14, 2023
conorsch commented

This is done, except for "generate ips for all envs", which I'll do as part of the teardown/release process on Monday for #3046.

conorsch added a commit that referenced this issue Sep 25, 2023
Removes reserved IPv4 addresses that are no longer used.
For HTTPS services, we now use a single entry IP to a Traefik daemonset
to handle traffic for all the various endpoints [0].

Regenerates the public IPs for the P2P services and commits them to
version control. We do this as part of release prep for Testnet 61 [1],
building on the deploy overhaul described in [2].

[0] #2341
[1] #3046
[2] #2983
conorsch added a commit that referenced this issue Sep 25, 2023
Makes changes encountered while deploying Testnet 61 on the new deploy
logic for the first time:

  * fixes a YAML whitespace error on the testnet external IPs
  * make sure that strategy=recreate for metrics, otherwise
    config changes may encounter a failed concurrent bind on the pvc
  * also clean up jobs, which can get stuck if there are pvc errors

Made these changes locally and ran the deploy logic from my workstation
to finalize the Testnet 61 setup. Couldn't use the GHA in this scenario,
because of a chicken-or-egg problem: we need the change in the .0 tag,
but that tag was already pushed; we can't use a .1 tag because that only
modifies an existing deployment.

Refs #2983, #3046.
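
The chicken-or-egg problem above follows from how the tag selects the deploy behavior: a .0 tag stands up a new deployment, while a later patch tag such as .1 only modifies the existing one. A minimal sketch of that rule, assuming tags look like `v0.61.0` (the tag format, the mode names, and the `deploy_mode` helper are assumptions for illustration, not the actual GHA logic):

```python
import re

# Assumed tag shape: optional "v", then MAJOR.MINOR.PATCH.
TAG_PATTERN = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def deploy_mode(tag: str) -> str:
    """Return which deploy behavior a release tag would trigger."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"unrecognized release tag: {tag!r}")
    patch = int(match.group(3))
    # A .0 tag creates a fresh network; any later patch tag only
    # updates the deployment that already exists.
    return "new-network" if patch == 0 else "update-existing"


if __name__ == "__main__":
    for tag in ("v0.61.0", "v0.61.1"):
        print(tag, deploy_mode(tag))
```

Under this rule, a fix that the initial deploy needs cannot ship in a .1 tag, which is why the deploy was run from a workstation instead.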
conorsch added a commit that referenced this issue Oct 12, 2023
Dusted off the compose setup and updated it to use an initcontainer,
same as with the recent overhaul of deploy logic (#2983). This change
also removes the requirement for the host machine to use `pd` to
bootstrap the config: now, docker-compose is all that's required.
The goal is to make the Penumbra containers easier to work with,
for example for the block explorer push.
conorsch added a commit that referenced this issue Oct 13, 2023
Dusted off the compose setup and updated it to use an initcontainer,
same as with the recent overhaul of deploy logic (#2983). This change
also removes the requirement for the host machine to use `pd` to
bootstrap the config: now, docker-compose is all that's required.
The goal is to make the Penumbra containers easier to work with,
for example for the block explorer push.
Status: Testnet 61: Dione