Upgrade checklist updates #967

Merged 4 commits on Jan 6, 2025
42 changes: 24 additions & 18 deletions .github/ISSUE_TEMPLATE/server-migration.md
@@ -9,6 +9,8 @@ assignees: ''

Checklist based on general guide https://github.com/openfoodfoundation/ofn-install/wiki/Migrating-a-Production-Server

Tip: find/replace to set up most commands ready to go, eg: `x_prod` -> `ca_prod`

Member commented:

I wonder if we should use easy-to-replace example names like `old_server` and `new_server`, which could be prod or staging. Some of the guide just refers to `x` instead of `x_prod` as well. It would be good to make this consistent.

Member Author commented:

That makes sense. Although in some places it's actually referring to the domain name (eg host_vars/x/config.yml) so that could have its own keyword.
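
The find/replace tip from the checklist, together with the point raised in this thread that a bare `x` sometimes stands for the domain rather than the inventory name, can be sketched as a small shell filter. This is only an illustration: `fill_checklist` is a hypothetical helper, not part of ofn-install, and it assumes inventory names follow the `<prefix>_prod` pattern used in the tip.

```sh
# Hypothetical helper (not part of ofn-install): replace the x_prod
# placeholder in checklist text read from stdin.
# $1 = instance prefix, eg "ca" (assumes names look like <prefix>_prod)
fill_checklist() {
  sed "s/x_prod/${1}_prod/g"
}

# x_prod2 is covered by the same substitution; a bare "x" (as in
# host_vars/x/config.yml) is left alone and needs its own replacement.
printf '%s\n' 'ansible-playbook -l x_prod -e rsync_to=x_prod2 playbooks/db_transfer.yml' |
  fill_checklist ca
```

Because the bare `x` is left untouched, paths such as `host_vars/x/config.yml` still need a separate keyword or a manual edit, which matches the concern discussed above.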


## 1. Setting up the new server
- [ ] Check old server config for any additional services to be aware of. Document any necessary steps for migration. Eg:
- `ls /etc/nginx/sites-enabled`
@@ -17,9 +19,10 @@ Checklist based on general guide https://github.com/openfoodfoundation/ofn-insta
- [ ] DNS: add temporary domain (eg `prod2.openfoodnetwork.org`)

### config
- [ ] Add temporary name to `inventory/hosts`
- [ ] Add temporary name to `inventory/hosts` (suggest doing this on separate branch)

Member commented:

I'm wondering about this temporary host name. I don't think it's temporary. Don't we want to be able to identify a unique host like prod4.openfoodnetwork.org.au which is providing the site openfoodnetwork.org.au?

Member Author commented:

Agreed, I think the unique hostname is very helpful. I'm not sure, but we should be able to use that in the hosts definition, rather than the primary domain.

This is distinct from the app domain which is set in `host_vars` (`domain` and `certbot_domains`). The guide suggested also using the temporary domain there but I found it caused more problems than it was worth.

I feel like this list needs rewriting again but I won't do that while it's not fresh in my mind.

- [ ] Review `host_vars/x/config.yml`, clean up if needed
- Make a copy for the temp hostname, add temp domain to bottom of `certbot_domains`
- [ ] Review `group_vars/x.yml`, clean up if needed
- [ ] Review `ofn-secrets:x_prod/secrets.yml`, clean up if needed
- Change to shared bugsnag projects
- Don't bother making a copy of this one
@@ -41,7 +44,7 @@ Then setup new server. Ensure you have the correct secrets (current secrets are
- [ ] Setup direct ssh access for `ofn-admin` and `openfoodnetwork` as per guide
dacook marked this conversation as resolved.

`ansible-playbook -l x_prod -e rsync_to=x_prod2 playbooks/`
- [ ] `db_transfer.yml`
- [ ] `db_transfer.yml` &&
- [ ] `transfer_assets.yml`

Make sure to clear cache so that instance settings are applied:
@@ -60,44 +63,47 @@ Make sure to clear cache so that instance settings are applied:

## 3. Migration
### preparation
- [ ] **new server**: `bin/rake db:reset -e production` (important: make sure you're on the new server!)
- [ ] `deploy.yml -l x_prod2 -e "git_version=vX.Y.Z"` matching version with current prod
- [ ] Reset database on new server, to avoid any migration issues due to being out of sync
`bin/rake db:reset` (You will need to confirm. Make sure you're on the new server!)
- [ ] Update ansible_host IP in `inventory/hosts` and ensure provision works (this will update host in `.env.production`).
`ansible-playbook playbooks/provision.yml -l x_prod`
- [ ] `ansible-playbook playbooks/deploy.yml -l x_prod -e "git_version=vX.Y.Z"` matching version with current prod
- [ ] old server: make a tiny data change to verify later (eg add `.` in meta description `/admin/general_settings/edit`)

### switchover: old server
- [ ] 🚧 `maintenance_mode.yml`
- [ ] 🚧 `ansible-playbook playbooks/maintenance_mode.yml -l x_prod`
- [ ] `sudo systemctl stop sidekiq redis-jobs puma`
dacook marked this conversation as resolved.
- [ ] `ansible-playbook -l x_prod -e rsync_to=x_prod2 playbooks/db_transfer.yml &&`
- [ ] `ansible-playbook -l x_prod -e rsync_to=x_prod2 playbooks/transfer_assets.yml`
- [ ] Transfer `/var/lib/redis-jobs/dump.rdb` to new server (see guide)
- [ ] `db_transfer.yml` ~3min
- [ ] `sudo systemctl stop postgres` (ensure other integrations no longer touch it)
- [ ] `transfer_assets.yml` just in case
- [ ] `sudo systemctl stop postgresql` (ensure other integrations no longer touch it)
dacook marked this conversation as resolved.
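
The two transfer commands in this list are chained with `&&` so that assets are only copied once the database transfer has succeeded. A minimal sketch of the same idea as a loop, assuming the playbook names and the `-l` / `-e rsync_to=` arguments shown in the checklist. `run_transfers` and the `DRY_RUN` switch are hypothetical; the commands are only printed unless `DRY_RUN=0` is set.

```sh
# Hypothetical wrapper: run the transfer playbooks in order and stop at
# the first failure. Prints the commands unless DRY_RUN=0 is set.
# $1 = source host (eg x_prod), $2 = rsync target (eg x_prod2)
run_transfers() {
  src=$1 dest=$2
  for playbook in db_transfer.yml transfer_assets.yml; do
    set -- ansible-playbook -l "$src" -e "rsync_to=$dest" "playbooks/$playbook"
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "$*"
    else
      "$@" || return 1
    fi
  done
}

run_transfers x_prod x_prod2
```

Printing first makes it easy to confirm the host names before anything touches the servers.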

### switchover: new server
- [ ] `sudo systemctl restart puma; sudo systemctl start sidekiq redis-jobs`

Member commented:

Sidekiq is disabled at this point. We have to enable it first before we can start it. But I personally would probably do it staggered:

  • Start puma and check with hosts file.
  • Start sidekiq and check log file.
  • Then install proxy forwarding.

Member commented:

How much time did you need for testing? If it's short, maybe we could do it during maintenance mode which would enable us to then switch over straight away. It would save us all the work of transferring data twice and resetting in between.

Member Author commented:

I agree that would be much quicker, I just found that it was never straightforward enough to be confident that everything would work first go, so I preferred a more conservative approach.

Member Author commented:

> But I personally would probably do it staggered:

That's a good idea. I did those checks in the 'testing' stage, so skipped it here. But you could totally do it that way too.
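
The staggered start suggested in this thread could look roughly like the sketch below. It assumes Sidekiq is disabled on the new server and can be enabled with `systemctl enable --now`, that `sidekiq.log` sits next to `production.log` in the app's `log/` directory, and it uses `curl --resolve` in place of a `hosts` file entry to reach the new server directly. `step`, `staggered_start`, the `DRY_RUN` switch, and the example domain and IP are hypothetical; by default the commands are only printed.

```sh
# Print each command by default; set DRY_RUN=0 to actually run it.
step() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# $1 = app domain, $2 = IP of the new server (both placeholders)
staggered_start() {
  domain=$1 new_ip=$2
  # 1. Start puma, then check it directly on the new IP (bypasses DNS).
  step sudo systemctl restart puma || return 1
  step curl -sSI --resolve "$domain:443:$new_ip" "https://$domain/" || return 1
  # 2. Start the job queue, enable and start sidekiq, then check its log.
  step sudo systemctl start redis-jobs || return 1
  step sudo systemctl enable --now sidekiq || return 1
  step tail -n 20 "$HOME/apps/openfoodnetwork/current/log/sidekiq.log"
  # 3. Only after both checks look good: set up proxy forwarding
  #    (temporary_proxy.yml in the checklist).
}

staggered_start example.org 203.0.113.10
```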

- [ ] `Rails.cache.clear` (or migrate redis-cache/dump.rdb also)
- [ ] ⏭️ `temporary_proxy.yml -e 'proxy_target=<ip>'` redirect traffic to new prod
- [ ] `cd ~/apps/openfoodnetwork/current; bin/rails runner -e production "Rails.cache.clear"` (or migrate redis-cache/dump.rdb also)
- [ ] ⏭️ `ansible-playbook -l x_prod playbooks/temporary_proxy.yml -e 'proxy_target=<new_ip>'` redirect traffic to new prod
* Note: this doesn't include webservices, and doesn't handle images. So it's a very short-term fix if at all.
* Use a `hosts` file entry to test a direct connection
- Check there are no alarm bells, eg:
- [ ] `~/apps/openfoodnetwork/current/logs/production.log` and `sidekiq.log`
- [ ] tiny data change is present. undo it.
- [ ] shopfront and checkout looks good
- [ ] upload a product image
- [ ] get confirmation from local team
- [ ] `~/apps/openfoodnetwork/current/log/production.log` and `sidekiq.log`
- [ ] Update DNS to point to new server
- [ ] get confirmation from local team
- [ ] make sure the entries in ofn-install are up to date: set the new IP address and remove any temporary entry made for the migration
- Update documentation:
* [ ] https://github.com/openfoodfoundation/ofn-install/wiki/Current-servers
* [ ] This migration guide if necessary

## 4. Cleanup (after 48hrs)
- [ ] check server access logs to verify no traffic
- [ ] shut down the old server, cancel old VPS
- [ ] shut down the old VPS
- [ ] delete old VPS (or rename for future deletion)
- [ ] remove DNS for temporary subdomain
- [ ] make sure the entries in ofn-install are up to date: remove the temporary entry made for the migration, and set the new IP address.
- [ ] validate that `provision.yml` still works. This will rename x-prod2 to x-prod
- [ ] check metabase sync if required: https://data.openfoodnetwork.org/admin/databases/
- [ ] check n8n
- [ ] check backups are functioning
- Update documentation:
* [ ] https://github.com/openfoodfoundation/ofn-install/wiki/Current-servers
* [ ] This migration guide if necessary
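
One way to check the old server's access logs for remaining traffic is to count requests per day and confirm the numbers drop to zero after the DNS change. A rough sketch, assuming nginx writes the default combined log format; `requests_per_day` is a hypothetical helper and the log path in the usage comment is an assumption.

```sh
# Hypothetical helper: read a combined-format access log on stdin and
# print "<day> <request count>" for each day, in order of first appearance.
requests_per_day() {
  awk '{
    split($4, t, ":")          # $4 looks like [06/Jan/2025:10:00:00
    day = substr(t[1], 2)      # drop the leading "["
    if (!(day in n)) order[++k] = day
    n[day]++
  } END {
    for (i = 1; i <= k; i++) print order[i], n[order[i]]
  }'
}

# Usage on the old server (log path is an assumption):
# sudo cat /var/log/nginx/access.log | requests_per_day
```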


## Rollback plan
9 changes: 6 additions & 3 deletions README.md
@@ -59,15 +59,18 @@ $ bin/setup

## Secrets

Some tasks require host-specific secrets, and will show an error if they haven't been provided. These can change from time to time, so **always ensure you have the latest before provisioning**.
Some tasks (eg provision.yml) require host-specific secrets, and will show an error if they haven't been provided. Secrets can be added to the relevant `host_vars` subfolder (see [wiki](https://github.com/openfoodfoundation/ofn-install/wiki/Configuration#add-host_vars)) where they will be loaded automatically.

Secrets can be provided with a parameter like so:
Secrets can also be loaded from a different folder with a parameter like so:

```sh
ansible-playbook playbooks/provision.yml --limit=au_staging -e "@../ofn-secrets/au_staging/secrets.yml" --ask-vault-pass
```

If you have access to the `ofn-secrets` repository, you can fetch them with the `fetch_secrets.yml` playbook. The secrets for each host will be loaded into the relevant directory in `inventory/host_vars/`, then you can go ahead and provision. See the [readme](https://github.com/openfoodfoundation/ofn-secrets/#readme) for more tips on setup.
Many servers are managed by the OFN core team, so we have a copy of secrets in a shared repository which is considered the source of truth. Once your server is managed by the core team, ensure any config changes are sent to them.

Core team members can fetch the latest with the `fetch_secrets.yml` playbook. The secrets for each host will be loaded into the relevant directory in `inventory/host_vars/`, then you can go ahead and provision. See the [ofn-secrets readme](https://github.com/openfoodfoundation/ofn-secrets/#readme) for more tips on setup.
These can change from time to time, so **always ensure you have the latest before provisioning**.

```sh
ansible-playbook playbooks/fetch_secrets.yml && ansible-playbook playbooks/provision.yml
```