From 5f834dd371c2d42911bd7c2b41a0544112d50868 Mon Sep 17 00:00:00 2001
From: Chris O'Neil
Date: Mon, 8 Jul 2024 23:25:57 +0100
Subject: [PATCH] Initial proposal for short-term release process

---
 .../0001-release-process-short-term.md | 325 ++++++++++++++++++
 1 file changed, 325 insertions(+)
 create mode 100644 0001-short-term-release-process/0001-release-process-short-term.md

diff --git a/0001-short-term-release-process/0001-release-process-short-term.md b/0001-short-term-release-process/0001-release-process-short-term.md
new file mode 100644
index 0000000..9aff28b
--- /dev/null
+++ b/0001-short-term-release-process/0001-release-process-short-term.md
@@ -0,0 +1,325 @@
+# Short-Term Release Process for Autonomi
+
+- Status: proposed
+- Type: enhancement
+- Related components:
+- Start Date: 2024-07-08
+- Discussion:
+- Supersedes:
+- Superseded by:
+
+## Summary
+
+This document aims to provide a detailed development and release process we can follow until the
+initial launch of the network in October/November. It is based on short release cycles that last
+about three weeks. After launch, the cycles would likely be at least double that length and could
+be quite different in detail.
+
+To be clear, this document deals with the process for releasing code; it does not go into any
+detail on how to deploy it. There are deployment issues we will also have to address, but those
+will be for another document.
+
+## Conventions
+
+Uncertainties are indicated using a `*` notation. Rather than have a full `Unresolved Questions`
+section at the end, where applicable there is an `Uncertainties` list in each section.
+
+A `**` denotes an idea that came from Shu's diagram/proposal.
+
+## Motivations
+
+So far, we have failed to arrive at a well-defined process for releasing our code.
+With launch imminent, it is my opinion that we've now run out of road for experimentation and we
+need to settle on an unambiguous process that we can execute without deliberation. Finalising this
+document should be a collaborative process in which we address any uncertainties rather than
+leaving them open for further experiment.
+
+## Detailed Design
+
+## Beta Release Cycles and Schedule
+
+This section provides guidelines for possible dates for each cycle until launch.
+
+### Wave 1
+
+Release and deployment for next wave:
+* Stable release and production deploy: 2024-07-08
+
+### Wave 2
+
+Live for users: 2024-07-09 / 825 people / 11-week comp / comp ends: 2024-09-27
+
+Three-week development/release cycle:
+* Development begins 2024-07-08
+* Release candidate branch cut: 2024-07-22
+* Stable release and production deploy for wave 3: 2024-07-31
+
+### Wave 3
+
+Live for users: 2024-08-01 / 1000 people / 8-week comp / comp ends: 2024-09-27
+
+Three-week development/release cycle:
+* Development begins 2024-08-01
+* Release candidate branch cut: 2024-08-15
+* Stable release and production deploy for wave 4: 2024-08-22
+
+### Wave 4
+
+Live for users: 2024-08-23 / 1000 people / 5-week comp / comp ends: 2024-09-27
+
+Three-week development/release cycle:
+* Development begins 2024-08-22
+* Release candidate branch cut: 2024-09-05
+* Stable release and production deploy for wave 5: 2024-09-12
+
+### Wave 5
+
+Live for users: 2024-09-13 / 1000 people / 3-week comp / comp ends: 2024-09-27
+
+Three-week development/release cycle:
+* Development begins 2024-09-12
+* Release candidate branch cut: 2024-09-26
+* Stable release and production deploy for launch: 2024-10-03
+
+This would perhaps take us to launch. At this point, we could consider setting our crate versions to
+`1.0.0`.
+
+## General Branching/Merging Techniques
+
+The chosen branching and release model is closely modelled on Gitflow.
+Gitflow is a mature model that has proven useful for projects that cannot operate using continuous
+delivery. [This article](https://nvie.com/posts/a-successful-git-branching-model/) originally
+defined the model. Reading it will help you familiarise yourself with the general concepts.
+
+In our setup, we would have two permanent branches, `main` and `stable`, where `main` is for
+day-to-day development and `stable` is essentially for tracking releases we consider deployable.
+Like Gitflow, we would also have some temporary branches; these will be discussed in more detail
+later.
+
+### Merging
+
+Historically, our team has preferred `git rebase` to `git merge`; however, Gitflow is better
+supported by using `git merge`. Rebasing is fine before submitting upstream, but Gitflow involves
+merging commits between different branches. Because rebasing rewrites the commit history, rebased
+commits that are later merged between branches can show up as commits with the same content but
+different hashes. This can make merging more confusing than it needs to be and also cause the
+commit history to be littered with duplicates. The primary option for completing a PR should be to
+merge it in rather than rebase.
+
+## Release Cycle Overview
+
+This section provides an overview of a release cycle. Certain aspects here merit more detail, but
+those will be provided in subsequent sections.
+
+The cycle has the following phases and steps:
+
+* Internal development and testing phase (two weeks):
+  - Begins immediately after production deploy of previous wave
+  - Deploy previous stable version to `STG-02` **
+  - Feature branches are worked on and merged back into `main`
+  - Developers can deploy their own isolated testnets if necessary
+  - We can do quick `alpha` releases if necessary
+* Release candidate (RC) phase (one week):
+  - Create a `release-YYYY.MM.rc.1` branch from `main`. No new features will be accepted on this
+    branch.
+  - Bump version numbers, with an `rc.1` suffix applied.
+  - Build RC and release to GitHub as a public `pre-release` but with no published crates.
+  - Deploy RC to `STG-01`.
+  - Production of changelog begins.
+  - Invite community members to test against this network.
+  - Shu's metrics solution enables comparisons with the previous stable release in `STG-02`.
+  - Fixes can be applied to the release branch, resulting in an `rc.2`. This will be
+    released/deployed/tested. Repeat if necessary.
+* Release and deploy phase (one to two days):
+  - On the release branch, modify version numbers to remove the `rc` pre-release specifier.
+  - Changelog finalised.
+  - Merge the release branch into `stable`.
+  - Merge the release branch into `main`.
+  - When ready, perform a release and tag of `stable` by kicking off a workflow. *
+  - Delete the release branch.
+  - Deploy to `PROD-01`/`02` and drain *
+  - Announce to users
+
+We can accommodate hotfixes at any point during the cycle.
+
+#### Uncertainties
+
+* Not completely sure about the duration of each phase. Any opinions?
+* We could deploy automatically, but do we need to coordinate with an announcement?
+* What does the 'draining' process with two production environments look like? Shu to elaborate on
+  the details? Do we need this on an environment that we upgrade?
+
+## Release Cycle Anatomy
+
+We'll now elaborate on the release cycle described in the last section, discussing each type of
+release in more detail.
+
+### Artifacts
+
+All release types will produce a set of binary artifacts, so we'll discuss these first. In Rust,
+crates must use Semantic Versioning. A binary is defined within a crate, and therefore, by default,
+it will also have a Semantic Version; however, it is possible to override the `--version` argument
+on the binary to provide something custom. Our releases currently produce eight binary artifacts.
+*
+It would be useful if we could refer to these collectively with a single version number and package,
+where the package name would reflect the version number. ** The `--version` argument would identify
+this version number, but also identify the individual component using its Semantic Version. *
+
+The proposal is to use `YYYY.MM.X.Y` as the collective version number. **
+
+We would use the collective version number for a single GitHub Release. The assets for the release
+would be the combined binary packages for each platform. The changelog can also be nicely applied to
+this combined release.
+
+#### Uncertainties
+
+* Should `faucet` and `sn_auditor` be part of a package targeted at users?
+* Should Semantic Versions be dropped from user-facing elements?
+* What are the `X` and `Y` in the version number?
+
+### Alpha Releases
+
+An alpha release would accommodate a scenario in which we wanted to put out quick, experimental code
+to have community users test something on an isolated network. We can branch it off `main` and
+discard the branch when it's done. The branch is intended to have a very short duration. It will be
+possible to apply fixes to it and do a new release on the same branch, in which case the
+pre-release specifier will be incremented. If the fixes are good, we can cherry-pick the fix commits
+back into `main`.
+
+An owner should be designated for the whole experiment. They will produce the release, deploy it,
+and coordinate with users.
+
+#### Process
+
+* Prepare a light description that communicates the purpose of the release; we do not need a full
+  changelog for this type of release.
+* Create and check out a `YYYY-MM-DD-alpha.1` branch from `main`.
+* Use `release-plz update` to bump crate versions.
+* Use a script to apply the `alpha` pre-release specifier to each bumped crate.
+* Create a `chore(release): alpha-YYYY-MM-DD` commit.
+* Push that branch to the upstream `maidsafe` repo, which will kick off the release workflow.
+* The release workflow will produce a public `pre-release` on GitHub, but the crates will *not* be
+  published.
+* Manually edit the GitHub Release to provide the description prepared in the first step.
+* Use the `Launch Network` workflow to deploy the `alpha` binaries to an isolated network. *
+* Announce the availability of the binaries to the community. Users can use `safenode-manager`
+  and/or `safeup` with `--version` arguments to obtain the alpha binaries.
+* Perform testing
+* If a problem is identified, there is an opportunity for a small fix/test cycle:
+  - On the same alpha branch, apply the fix.
+  - Use `cargo release version alpha --package X --package Y` to increment the necessary crates to
+    `alpha.2`
+  - Deploy the fix either using an upgrade or by launching a new testnet
+  - Users can test
+  - Repeat if necessary (in practice we should not have many of these)
+* If need be, any fix commits applied on the alpha branch should be cherry-picked into `main`.
+* The experiment is over and the branch should be deleted.
+
+The branch is discarded because we don't want the alpha version bumps back in `main`. The crates
+were also never published. The GitHub Release will function as the historical record of the
+existence of the alpha release.
+
+#### Uncertainties
+
+* Would we need to use the `NETWORK_VERSION` compile-time variable for this?
+
+### Release Candidates
+
+A release candidate (RC) is the binary that's intended to be released as a stable version. The set
+of features and fixes in the RC is what's included on `main` in the current cycle, i.e., everything
+merged since the last stable release. After about two weeks of development we should produce the RC
+for testing in the staging environment.
+
+An owner should be designated for the process, and it would be useful for this to cycle through
+everyone in the team. Once the RC branch is started, we won't accept new features on it, only fixes.
+Feature development can continue on `main`.
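Both the alpha process above and the RC process below rely on a script that attaches a pre-release specifier to versions already bumped by `release-plz update`, without touching `MAJOR.MINOR.PATCH`. A minimal sketch of that version arithmetic follows, assuming plain three-part versions with an optional `-<label>.<n>` suffix. The function names and sample versions are hypothetical, and a real script would also need to rewrite the affected `Cargo.toml` files.

```python
import re

# Assumption: versions look like MAJOR.MINOR.PATCH with an optional
# "-label.N" pre-release suffix, e.g. "0.1.0" or "0.1.0-rc.1".
_VERSION = re.compile(r"^(\d+\.\d+\.\d+)(?:-([a-z]+)\.(\d+))?$")


def apply_prerelease(version: str, label: str) -> str:
    """Attach `label.1` to a stable version, or increment the counter if the
    version already carries that label. MAJOR.MINOR.PATCH is never changed."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unsupported version: {version}")
    core, current_label, counter = match.groups()
    if current_label is None:
        return f"{core}-{label}.1"
    if current_label != label:
        raise ValueError(f"{version} already has a different pre-release label")
    return f"{core}-{label}.{int(counter) + 1}"


def strip_prerelease(version: str) -> str:
    """Drop any pre-release suffix, leaving the version for the stable release."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unsupported version: {version}")
    return match.group(1)
```

With these hypothetical helpers, `apply_prerelease("0.1.0", "rc")` gives `0.1.0-rc.1`, applying it again to the result gives `0.1.0-rc.2`, and `strip_prerelease` on either returns `0.1.0`.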
+
+#### Process
+
+* Create and check out a `YYYY-MM-DD-rc.1` branch from `main`.
+* Use `release-plz update` to bump version numbers. These new versions should be the ones used for
+  the stable release that will be based on this RC branch.
+* Use a custom script to apply `rc.1` to the new versions. We can't use `cargo release` for this
+  because it also performs a `PATCH` bump when you apply the pre-release specifier, which is not
+  what we want.
+* Create a new template entry in the changelog. This can be filled out as an ongoing process
+  between now and the stable release.
+* Create a `chore(release): YYYY-MM-DD-rc.1` commit with the version bump and changelog and push the
+  branch to `origin` or `upstream`.
+* That push should trigger a workflow that will:
+  - Build the RC
+  - Upload the binaries to S3
+  - Produce a public `pre-release` GitHub Release
+  - Not publish any crates
+* Use a script to produce a list of the commits between now and the last stable version. The commits
+  will be grouped by author. This list can be posted in Slack or Discourse to aid developers in
+  supplying their contributions for the changelog.
+* Use an `Upgrade Network` workflow to deploy the RC to `STG-01`. This will function as a test of
+  the upgrade process and help identify breaking changes we may have missed.
+* Invite users to participate in testing the RC. They can use `safenode-manager` to obtain the `rc`
+  nodes and `safeup` for `rc` clients.
+* We can also perform our own QA testing, some of which will come from the metrics that result from
+  the comparison to the previous stable release.
+* If fixes are necessary, they should be applied to this branch. We then bump to `rc.2` and do
+  another release, which again should be deployed to `STG-01`. Users can get the new binaries. This
+  part could potentially be repeated, but obviously we want to avoid that.
+
+This whole process will likely last about a week.
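The commit list used to gather changelog contributions in the RC process could be produced along these lines. This is a sketch only: it assumes the tip of the `stable` branch marks the last stable release, and the function names are hypothetical.

```python
import subprocess
from collections import defaultdict


def group_by_author(log_output: str) -> dict[str, list[str]]:
    """Group `author<TAB>subject` lines by author, preserving log order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for line in log_output.splitlines():
        if not line.strip():
            continue
        author, _, subject = line.partition("\t")
        grouped[author].append(subject)
    return dict(grouped)


def commits_since(base: str = "stable", head: str = "HEAD") -> dict[str, list[str]]:
    """Collect non-merge commits reachable from `head` but not from `base`.

    Assumption: `base` points at the last stable release.
    """
    result = subprocess.run(
        # %an is the author name, %x09 a tab character, %s the commit subject.
        ["git", "log", "--no-merges", "--format=%an%x09%s", f"{base}..{head}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return group_by_author(result.stdout)
```

Running `commits_since()` on the RC branch would return a mapping from author name to commit subjects, which can then be formatted for Slack or Discourse. Keeping the parsing separate from the `git` call makes the grouping easy to test without a repository.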
+Once the RC phase is complete, we would be looking to do a stable release and deploy to
+production.
+
+### Stable Release
+
+The owner of the RC process can probably carry over to this one, though it would be possible for
+someone else to be assigned. At this point, the release branch still exists. This process is about
+making a stable release from that branch.
+
+#### Process
+
+* Still on the release branch, use a script to remove the `rc` pre-release identifier from the
+  crates that were bumped.
+* If it isn't already, the changelog should now be finalised.
+* Create a `chore(release): YYYY.MM.X.Y` commit. Put the crate name and version bumps in the body of
+  the commit. Any final additions to the changelog can be part of this commit.
+* Create a PR for merging the release branch into `stable`.
+* Once it's been merged to `stable`, also merge the release branch into `main`.
+* When ready, kick off a workflow for the stable release*. The workflow will:
+  - Build the binaries
+  - Upload them to S3
+  - Produce a public GitHub Release
+  - Publish the crates
+  - Create a tag based on the combined version
+* Manually edit the GitHub Release to apply the latest changelog entry to the description*
+* Delete the release branch
+
+We would now be in a position to deploy the stable release to the `PROD-01` (and possibly `PROD-02`)
+environment. The production deployment would be covered in another RFC.
+
+#### Uncertainties
+
+* Obviously this could be done automatically on push, but we may want to coordinate with an
+  announcement?
+* It's probably possible to automate this, but it's a very low-effort manual step.
+
+## Hotfixes
+
+Hotfixes are intended to quickly fix a severe bug in a stable release. They can occur at any time
+throughout the release cycle, although they are more likely to be needed near the beginning.
+
+### Process
+
+* Create and check out a `hotfix-YYYY.MM.DD` branch from `stable` and push it to `origin`
+* Create an entry in the changelog that describes the fix
+* Use `release-plz update` to get new version numbers for the crates that the fix applies to
+* Create a `chore(release): hotfix-YYYY.MM.DD` commit with the bumped crates and versions in the
+  body of the commit.
+* Fetch this branch from `upstream` to a fork and apply the fix.
+* PR the commit with the fix to the `upstream` branch. This will enable someone to review it.
+* If changes are requested, keep going until those are resolved.
+* Use a workflow to deploy
+* Deploy the fix to some kind of staging environment to be tested.*
+* When the fix is confirmed to be working, create a PR to merge the branch back into `stable`.
+* Also merge it back into `main`.
+* Perform a stable release at the new version number.*
+* If it's a change to the node, deploy it to production using an upgrade process.
+
+#### Uncertainties
+
+* Should this replace what's on `STG-01`? Or would it be another isolated staging environment?
+* Could be automated or manually triggered.
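The naming conventions used in the hotfix process can be captured in a small helper. This is an illustrative sketch: the function names and crate names are hypothetical, and the `crate: version` layout of the commit body is an assumption, since the process only says that the bumped crates and versions go in the body of the commit.

```python
from datetime import date


def hotfix_branch(day: date) -> str:
    """Build the `hotfix-YYYY.MM.DD` branch name for the given day."""
    return f"hotfix-{day:%Y.%m.%d}"


def release_commit_message(identifier: str, bumps: dict[str, str]) -> str:
    """Build a `chore(release): <identifier>` commit message.

    Assumption: the body lists one `crate: version` pair per line, sorted by
    crate name. The document does not prescribe a body format.
    """
    subject = f"chore(release): {identifier}"
    body = "\n".join(f"{crate}: {version}" for crate, version in sorted(bumps.items()))
    return f"{subject}\n\n{body}"
```

For a hotfix started on 9 July 2024, `hotfix_branch(date(2024, 7, 9))` gives `hotfix-2024.07.09`, and the same identifier can be passed to `release_commit_message` together with the versions reported by `release-plz update`.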