Merge pull request #1086 from input-output-hk/jpraynaud/add-productio…
…n-runbook Add network production runbooks for Aggregator
Showing 9 changed files with 350 additions and 1 deletion.
@@ -0,0 +1,19 @@
# Mithril network runbook :shield:

This page gathers the available guides to operate a Mithril network.

:fire: These guides are intended for expert users; misuse could cause irreversible damage or loss to a network.

# Guides

| Operation | Location | Description
|------------|------------|------------
| **Genesis manually** | [manual-genesis](./genesis-manually/README.md) | Perform a manual (re)genesis of the aggregator certificate chain.
| **Era markers** | [era-markers](./era-markers/README.md) | Create and update era markers on the Cardano chain.
| **Signer registrations monitoring** | [registrations-monitoring](./registrations-monitoring/README.md) | Gather aggregated data about signer registrations (versions, stake, ...).
| **Update protocol parameters** | [protocol-parameters](./protocol-parameters/README.md) | Update the protocol parameters of a Mithril network.
| **Recompute certificates hash** | [recompute-certificates-hash](./recompute-certificates-hash/README.md) | Recompute the certificates hash of an aggregator.
| **Fix terraform lock** | [terraform-lock](./terraform-lock/README.md) | Fix a terraform lock in CD workflows.
| **Manage SSH access to infrastructure** | [ssh-access](./ssh-access/README.md) | Manage SSH access on the VM of the infrastructure for a user.

File renamed without changes.
@@ -0,0 +1,91 @@
# Manual genesis of production Mithril network

## Configure environment variables
Export the environment variables:
```bash
export MITHRIL_VM=**MITHRIL_VM**
export CARDANO_NETWORK=**CARDANO_NETWORK**
```

Here is an example for the `release-mainnet` network:
```bash
export MITHRIL_VM=aggregator.release-mainnet.api.mithril.network
export CARDANO_NETWORK=mainnet
```
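Later steps interpolate these variables into `ssh` and `scp` paths, so an unset variable would silently produce a wrong path. A minimal sketch of a guard is shown here; `require_env` is a hypothetical helper name and is not part of the Mithril tooling:

```bash
# Hypothetical helper: returns non-zero if any named variable is unset,
# empty, or not exported.
require_env() {
  missing=0
  for name in "$@"; do
    # printenv only sees exported variables, matching the export steps above
    if [ -z "$(printenv "$name")" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   require_env MITHRIL_VM CARDANO_NETWORK || echo "export the variables first"
```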

## Export the genesis payload to sign

Connect to the aggregator VM:
```bash
ssh curry@$MITHRIL_VM
```

Once connected to the aggregator VM, export the environment variable:
```bash
export CARDANO_NETWORK=**CARDANO_NETWORK**
```

Then create the genesis directory:
```bash
mkdir -p /home/curry/data/$CARDANO_NETWORK/mithril-aggregator/mithril/genesis
```

Then connect to the aggregator container:
```bash
docker exec -it mithril-aggregator bash
```

Once connected to the aggregator container, export the genesis payload to sign:
```bash
/app/bin/mithril-aggregator -vvv genesis export --target-path /mithril-aggregator/mithril/genesis/genesis-payload-to-sign.txt
```

Then disconnect from the aggregator container:
```bash
exit
```

Then disconnect from the aggregator VM:
```bash
exit
```

## Sign the genesis payload

Once on your local machine, copy the genesis payload to sign from the aggregator VM:
```bash
scp curry@$MITHRIL_VM:/home/curry/data/$CARDANO_NETWORK/mithril-aggregator/mithril/genesis/genesis-payload-to-sign.txt .
```

Download or build the aggregator on your local machine as explained in this [documentation](https://mithril.network/doc/manual/developer-docs/nodes/mithril-aggregator#download-source).

Then, sign the payload with the genesis secret key:
```bash
./mithril-aggregator -vvv genesis sign --to-sign-payload-path genesis-payload-to-sign.txt --target-signed-payload-path genesis-payload-signed.txt --genesis-secret-key-path genesis.sk
```

## Import the signed genesis payload

Then, copy the signed genesis payload back to the aggregator VM:
```bash
scp ./genesis-payload-signed.txt curry@$MITHRIL_VM:/home/curry/data/$CARDANO_NETWORK/mithril-aggregator/mithril/genesis/genesis-payload-signed.txt
```
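Because the signed payload travels over `scp`, comparing checksums on both sides is a cheap way to confirm that the file arrived intact. The sketch below assumes `sha256sum` is available on both machines; `file_sha256` is a hypothetical helper name:

```bash
# Hypothetical helper: print only the SHA-256 digest of a file.
file_sha256() {
  sha256sum "$1" | cut -d ' ' -f 1
}

# Local digest:
#   file_sha256 ./genesis-payload-signed.txt
# Remote digest (same path as the scp target above):
#   ssh curry@$MITHRIL_VM sha256sum /home/curry/data/$CARDANO_NETWORK/mithril-aggregator/mithril/genesis/genesis-payload-signed.txt
```

If the two digests differ, copying the file again before importing it is the safer choice.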

Then, connect back to the aggregator VM:
```bash
ssh curry@$MITHRIL_VM
```

Export the environment variable:
```bash
export CARDANO_NETWORK=**CARDANO_NETWORK**
```

And connect back to the aggregator container:
```bash
docker exec -it mithril-aggregator bash
```

Once connected to the aggregator container, import the signed genesis payload:
```bash
/app/bin/mithril-aggregator -vvv genesis import --signed-payload-path /mithril-aggregator/mithril/genesis/genesis-payload-signed.txt
```
@@ -0,0 +1,71 @@
# Update the protocol parameters of a Mithril network

## Introduction

The protocol parameters of a network are currently defined when starting the aggregator of the network.
During startup, the aggregator stores the parameters in its stores and starts using them **3** epochs later. The aggregator broadcasts the protocol parameters to the signers of the network through the `/epoch-settings` route.
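
As a quick illustration of the delay, the sketch below computes the epoch at which parameters stored at startup would begin to apply, assuming the delay is exactly 3 epochs counted from the epoch in which the aggregator started. `effective_epoch` is a hypothetical helper, and `AGGREGATOR_ENDPOINT` in the comment is an assumed base URL of the aggregator API:

```bash
# Hypothetical helper: epoch at which parameters stored in epoch $1 take effect.
effective_epoch() {
  echo $(( $1 + 3 ))
}

effective_epoch 450
# prints 453

# To inspect what signers currently receive (AGGREGATOR_ENDPOINT is an assumption):
#   curl -s "$AGGREGATOR_ENDPOINT/epoch-settings"
```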

## Update parameters of a Mithril network
The aggregator sets the protocol parameters from the `protocol_parameters` configuration parameter, which is a JSON representation of the `ProtocolParameters` type:
```rust
pub struct ProtocolParameters {
    /// Quorum parameter
    pub k: u64,

    /// Security parameter (number of lotteries)
    pub m: u64,

    /// f in phi(w) = 1 - (1 - f)^w, where w is the stake of a participant
    pub phi_f: f64,
}
```

Each parameter can also be set via an environment variable:
- `PROTOCOL_PARAMETERS__K` for `k`
- `PROTOCOL_PARAMETERS__M` for `m`
- `PROTOCOL_PARAMETERS__PHI_F` for `phi_f`
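
To make the two forms concrete, the sketch below builds a JSON value for `protocol_parameters` from the three numbers and lists the equivalent environment variables in comments. It assumes that the JSON keys mirror the struct field names; `protocol_parameters_json` is a hypothetical helper:

```bash
# Hypothetical helper: render k, m and phi_f as a JSON object.
protocol_parameters_json() {
  printf '{"k": %s, "m": %s, "phi_f": %s}\n' "$1" "$2" "$3"
}

protocol_parameters_json 5 100 0.6
# prints {"k": 5, "m": 100, "phi_f": 0.6}

# Equivalent environment variables:
#   export PROTOCOL_PARAMETERS__K=5
#   export PROTOCOL_PARAMETERS__M=100
#   export PROTOCOL_PARAMETERS__PHI_F=0.6
```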

When setting up a Mithril network with a `terraform` deployment, the protocol parameters are set with a JSON definition.

## Find the workflow used to deploy a Mithril network

Currently, the following [Mithril networks](https://mithril.network/doc/manual/developer-docs/references#mithril-networks) are generally available, and deployed with `terraform`:
- `testing-preview`: with the workflow [`.github/workflows/ci.yml`](../../github/workflows/ci.yml)
- `pre-release-preview`: with the workflow [`.github/workflows/pre-release.yml`](../../github/workflows/pre-release.yml)
- `release-preprod`: with the workflow [`.github/workflows/release.yml`](../../github/workflows/release.yml)
- `release-mainnet`: with the workflow [`.github/workflows/release.yml`](../../github/workflows/release.yml)
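
The mapping above can be captured in a small lookup, which is convenient when scripting against several networks. This is only a sketch: `workflow_for_network` is a hypothetical helper, and the paths are relative to the repository root:

```bash
# Hypothetical helper: print the workflow file that deploys a given network.
workflow_for_network() {
  case "$1" in
    testing-preview) echo ".github/workflows/ci.yml" ;;
    pre-release-preview) echo ".github/workflows/pre-release.yml" ;;
    release-preprod|release-mainnet) echo ".github/workflows/release.yml" ;;
    *) echo "unknown network: $1" >&2; return 1 ;;
  esac
}

# Example, run from the repository root, to locate the parameters in the workflow:
#   grep -n "mithril_protocol_parameters" "$(workflow_for_network release-mainnet)"
```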

## Update the protocol parameters

In the deployment matrix of the targeted network, update the following value with the new parameters:
```yaml
mithril_protocol_parameters: |
  {
    k = 5
    m = 100
    phi_f = 0.6
  }
```

For example, these values could be replaced with:
```yaml
mithril_protocol_parameters: |
  {
    k = 2422
    m = 20973
    phi_f = 0.2
  }
```

The modifications should be created in a dedicated PR, and the result of the **Plan** job of the terraform deployment should be reviewed carefully to make sure that the change has been taken into account.

## Deployment of the new protocol parameters

The new protocol parameters are deployed and become effective as detailed in the following table:

| Workflow | Deployed at | Effective at
|------------|------------|------------
| [`.github/workflows/ci.yml`](../../github/workflows/ci.yml) | Merge on `main` branch | **3** epochs later
| [`.github/workflows/pre-release.yml`](../../github/workflows/pre-release.yml) | Pre-release of a distribution | **3** epochs later
| [`.github/workflows/release.yml`](../../github/workflows/release.yml) | Release of a distribution | **3** epochs later

For more information about the CD, please refer to [Release process and versioning](https://mithril.network/doc/adr/3).
@@ -0,0 +1,92 @@
# Recompute the certificate hashes of a Mithril aggregator

## Configure environment variables
Export the environment variables:
```bash
export MITHRIL_VM=**MITHRIL_VM**
export CARDANO_NETWORK=**CARDANO_NETWORK**
```

Here is an example for the `release-mainnet` network:
```bash
export MITHRIL_VM=aggregator.release-mainnet.api.mithril.network
export CARDANO_NETWORK=mainnet
```

## Make a backup of the aggregator database

Connect to the aggregator VM:
```bash
ssh curry@$MITHRIL_VM
```

Once connected to the aggregator VM, export the environment variable:
```bash
export CARDANO_NETWORK=**CARDANO_NETWORK**
```

And copy the SQLite database file `aggregator.sqlite3`:
```bash
cp /home/curry/data/$CARDANO_NETWORK/mithril-aggregator/mithril/stores/aggregator.sqlite3 /home/curry/data/$CARDANO_NETWORK/mithril-aggregator/mithril/stores/aggregator.sqlite3.bak.$(date +%Y-%m-%d)
```
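Since the rollback procedure depends entirely on this backup, it can be worth checking that the copy is byte-identical before going further. The sketch below wraps the copy and the check in a hypothetical helper named `backup_file`; a mismatch may simply mean that the aggregator wrote to the database between the copy and the comparison:

```bash
# Hypothetical helper: copy a file to <file>.bak.<YYYY-MM-DD>, verify the copy,
# and print the backup path on success.
backup_file() {
  src="$1"
  dst="$src.bak.$(date +%Y-%m-%d)"
  cp "$src" "$dst" || return 1
  # cmp -s is silent and returns non-zero if the two files differ
  cmp -s "$src" "$dst" || return 1
  echo "$dst"
}

# Example:
#   backup_file /home/curry/data/$CARDANO_NETWORK/mithril-aggregator/mithril/stores/aggregator.sqlite3
```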

And connect to the aggregator container:
```bash
docker exec -it mithril-aggregator bash
```

Once connected to the aggregator container, recompute the certificate hashes:
```bash
/app/bin/mithril-aggregator -vvv tools recompute-certificates-hash
```

Then disconnect from the aggregator container:
```bash
exit
```

## Restart the aggregator

Restart the aggregator so that it validates the certificate chain again:
```bash
docker restart mithril-aggregator
```

Make sure that the certificate chain is valid (wait for the state machine to go into the state `READY`):
```bash
docker logs -f --tail 1000 mithril-aggregator
```
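Instead of watching the log stream by eye, the wait can be scripted. The sketch below assumes that the state name appears literally as `READY` in a log line, which should be checked against the actual log format; `wait_for_ready` is a hypothetical helper:

```bash
# Hypothetical helper: read log lines from stdin and print the first one
# containing READY. Returns non-zero if the input ends without a match.
wait_for_ready() {
  grep -m 1 'READY'
}

# Example (2>&1 because docker logs relays the container's stderr separately):
#   docker logs -f --tail 1000 mithril-aggregator 2>&1 | wait_for_ready
```

Note that `--tail 1000` may include a `READY` line written before the restart; limiting the output with the `--since` option of `docker logs` is one way to avoid matching stale lines.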

Then disconnect from the aggregator VM:
```bash
exit
```

## Rollback procedure

If the recomputation fails, you can roll back the database.

First, stop the aggregator:
```bash
docker stop mithril-aggregator
```

Then, restore the backed up database (the `$(date +%Y-%m-%d)` suffix only matches a backup taken on the same day):
```bash
cp /home/curry/data/$CARDANO_NETWORK/mithril-aggregator/mithril/stores/aggregator.sqlite3.bak.$(date +%Y-%m-%d) /home/curry/data/$CARDANO_NETWORK/mithril-aggregator/mithril/stores/aggregator.sqlite3
```
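The backup file name embeds the date on which the backup was taken, so a rollback performed on a later day needs that date rather than today's. A minimal sketch for picking the most recent backup is shown here, relying on the fact that `YYYY-MM-DD` suffixes sort chronologically; `latest_backup` is a hypothetical helper:

```bash
# Hypothetical helper: print the most recent <db>.bak.<YYYY-MM-DD> file,
# or nothing if no backup exists.
latest_backup() {
  db="$1"
  ls -1 "$db".bak.* 2>/dev/null | sort | tail -n 1
}

# Example:
#   DB=/home/curry/data/$CARDANO_NETWORK/mithril-aggregator/mithril/stores/aggregator.sqlite3
#   cp "$(latest_backup "$DB")" "$DB"
```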

Then, start the aggregator:
```bash
docker start mithril-aggregator
```

Make sure that the certificate chain is valid (wait for the state machine to go into the state `READY`):
```bash
docker logs -f --tail 1000 mithril-aggregator
```

Then disconnect from the aggregator VM:
```bash
exit
```
File renamed without changes.
@@ -0,0 +1,51 @@
# Manage SSH access to infrastructure

## Add access for a user

### Create an SSH keypair for a user (if needed)

Create a new SSH keypair using the `ed25519` algorithm:
```bash
ssh-keygen -t ed25519 -C "your_email@example.com"
```

Then, add your keypair to the ssh-agent:
```bash
ssh-add ~/.ssh/id_ed25519
```

### Retrieve the public key of your SSH keypair

Run the following command to retrieve your public key:
```bash
cat ~/.ssh/id_ed25519.pub
```

### Declare the public key

Add a line with the format `**REMOTE_USER**:**PUBLIC_KEY**` to the `mithril-infra/assets/ssh_keys` file for each user:
```bash
echo "curry:ssh-ed25519 AAAE53AC3NzQ2vlZDI1aC1O4CpX+S2y1X9NTB4rv4k3pAAAAIF3b7L9sPV5ZiGgogmko your_email@example.com" >> **REPOSITORY_PATH**/mithril-infra/assets/ssh_keys
```
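A malformed line in `ssh_keys` is easy to miss in review, so a quick format check before opening the PR can help. The sketch below is deliberately strict and rests on two assumptions: user names are lowercase POSIX-style names, and only `ssh-ed25519` keys are used. `valid_ssh_key_line` is a hypothetical helper:

```bash
# Hypothetical helper: succeed only if the line looks like
#   <user>:ssh-ed25519 <base64-key> [comment]
valid_ssh_key_line() {
  printf '%s\n' "$1" | grep -Eq '^[a-z_][a-z0-9_-]*:ssh-ed25519 [A-Za-z0-9+/=]+( .*)?$'
}

# Example: report suspicious lines in the file
#   while IFS= read -r line; do
#     valid_ssh_key_line "$line" || echo "check this line: $line"
#   done < mithril-infra/assets/ssh_keys
```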

Then, create a PR with the updated `ssh_keys` file.

## Remove access for a user

To remove access, delete the line(s) related to this user from the `mithril-infra/assets/ssh_keys` file.

Then, create a PR with the updated `ssh_keys` file.

## When are the modifications applied?

The modifications will be applied the next time the terraform deployment is done:
- next **merge** in `main` branch for `testing-preview`
- next **pre-release** created for `pre-release-preview`
- next **release** created for `release-preprod`
- next **release** created for `release-mainnet`

When the modifications are applied, the VM is updated in place by terraform.

:warning: In case of emergency, the SSH keys can be modified by an administrator:
- In GCP [**Compute Engine**](https://console.cloud.google.com/compute/instances)
- The SSH keys can be edited in the targeted VM(s)
@@ -0,0 +1,25 @@
# Fix terraform deployment lock

## Introduction

When the CI cancels a job in the middle of a terraform deployment, the lock file that terraform uses to prevent concurrent deployments may not be removed. In that case, the next CI job that tries to deploy fails with an error stating that a lock prevents the deployment from running.

## Find the workflow used to deploy a Mithril network

Currently, the following [Mithril networks](https://mithril.network/doc/manual/developer-docs/references#mithril-networks) are generally available, and deployed with `terraform`:
- `testing-preview`: with the workflow [`.github/workflows/ci.yml`](../../github/workflows/ci.yml)
- `pre-release-preview`: with the workflow [`.github/workflows/pre-release.yml`](../../github/workflows/pre-release.yml)
- `release-preprod`: with the workflow [`.github/workflows/release.yml`](../../github/workflows/release.yml)
- `release-mainnet`: with the workflow [`.github/workflows/release.yml`](../../github/workflows/release.yml)

## Identify the terraform backend bucket
In the workflow file, the `terraform_backend_bucket` value names the GCP bucket that terraform uses to store the state of the deployment.

## Reset the terraform lock

A user with administrator rights can simply remove the lock file:
- In GCP [**Cloud Storage**](https://console.cloud.google.com/storage/browser)
- In the terraform administration bucket that you have identified earlier, the file that needs to be removed is at path `**TERRAFORM_BACKEND_BUCKET**/terraform/mithril-**MITHRIL_NETWORK_IDENTIFIER**/.terraform.lock.hcl` (e.g. `mithril-terraform-prod/terraform/mithril-release-mainnet/terraform.lock.hcl`)
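
For administrators who prefer the command line to the console, listing the folder first shows exactly which files are present before anything is removed. The sketch below only builds the folder URL from the path pattern above; `terraform_state_prefix` is a hypothetical helper, and the `gs://` form assumes the bucket is a standard Cloud Storage bucket reachable with `gsutil`:

```bash
# Hypothetical helper: print the Cloud Storage folder that holds the terraform
# files of a network, given the backend bucket and the network identifier.
terraform_state_prefix() {
  printf 'gs://%s/terraform/mithril-%s/\n' "$1" "$2"
}

# Example: inspect the folder before touching anything
#   gsutil ls "$(terraform_state_prefix mithril-terraform-prod release-mainnet)"
```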

:warning: Never delete or modify the `**TERRAFORM_BACKEND_BUCKET**/terraform/mithril-**MITHRIL_NETWORK_IDENTIFIER**/default.tfstate` file.