diff --git a/docs/runbook/README.md b/docs/runbook/README.md
new file mode 100644
index 00000000000..c4da31be7c1
--- /dev/null
+++ b/docs/runbook/README.md
@@ -0,0 +1,19 @@
+# Mithril network runbook :shield:
+
+This page gathers the available guides to operate a Mithril network.
+
+:fire: These guides are intended for expert users: the operations they describe can cause irreversible damage or loss for a network.
+
+# Guides
+
+| Operation | Location | Description
+|------------|------------|------------
+| **Genesis manually** | [manual-genesis](./genesis-manually/README.md) | Perform a manual (re)genesis of the aggregator certificate chain.
+| **Era markers** | [era-markers](./era-markers/README.md) | Create and update era markers on the Cardano chain.
+| **Signer registrations monitoring** | [registrations-monitoring](./registrations-monitoring/README.md) | Gather aggregated data about signer registrations (versions, stake, ...).
+| **Update protocol parameters** | [protocol-parameters](./protocol-parameters/README.md) | Update the protocol parameters of a Mithril network.
+| **Recompute certificates hash** | [recompute-certificates-hash](./recompute-certificates-hash/README.md) | Recompute the certificates hash of an aggregator.
+| **Fix terraform lock** | [terraform-lock](./terraform-lock/README.md) | Fix a terraform lock in CD workflows.
+| **Manage SSH access to infrastructure** | [ssh-access](./ssh-access/README.md) | Manage SSH access on the VM of the infrastructure for a user.
+
+
diff --git a/mithril-aggregator/utils/era/README.md b/docs/runbook/era-markers/README.md
similarity index 100%
rename from mithril-aggregator/utils/era/README.md
rename to docs/runbook/era-markers/README.md
diff --git a/docs/runbook/genesis-manually/README.md b/docs/runbook/genesis-manually/README.md
new file mode 100644
index 00000000000..c203c5cec81
--- /dev/null
+++ b/docs/runbook/genesis-manually/README.md
@@ -0,0 +1,91 @@
+# Manual genesis of production Mithril network
+
+## Configure environment variables
+Export the environment variables:
+```bash
+export MITHRIL_VM=**MITHRIL_VM**
+export CARDANO_NETWORK=**CARDANO_NETWORK**
+```
+
+Here is an example for the `release-mainnet` network:
+```bash
+export MITHRIL_VM=aggregator.release-mainnet.api.mithril.network
+export CARDANO_NETWORK=mainnet
+```
+
+## Export the genesis payload to sign
+
+Connect to the aggregator VM:
+```bash
+ssh curry@$MITHRIL_VM
+```
+
+Once connected to the aggregator VM, export the environment variables:
+```bash
+export CARDANO_NETWORK=**CARDANO_NETWORK**
+```
+
+Then create the genesis directory:
+```bash
+mkdir -p /home/curry/data/$CARDANO_NETWORK/mithril-aggregator/mithril/genesis
+```
+And connect to the aggregator container:
+```bash
+docker exec -it mithril-aggregator bash
+```
+
+Once connected to the aggregator container, export the genesis payload to sign:
+```bash
+/app/bin/mithril-aggregator -vvv genesis export --target-path /mithril-aggregator/mithril/genesis/genesis-payload-to-sign.txt
+```
+
+Then disconnect from the aggregator container:
+```bash
+exit
+```
+
+Then disconnect from the aggregator VM:
+```bash
+exit
+```
+
+## Sign the genesis payload
+
+Once on your local machine, copy the genesis payload to sign from the aggregator VM:
+```bash
+scp curry@$MITHRIL_VM:/home/curry/data/$CARDANO_NETWORK/mithril-aggregator/mithril/genesis/genesis-payload-to-sign.txt .
+```
+
+Download or build the aggregator on your local machine as explained in this [documentation](https://mithril.network/doc/manual/developer-docs/nodes/mithril-aggregator#download-source).
+
+Then, sign the payload with the genesis secret key:
+```bash
+./mithril-aggregator -vvv genesis sign --to-sign-payload-path genesis-payload-to-sign.txt --target-signed-payload-path genesis-payload-signed.txt --genesis-secret-key-path genesis.sk
+```
+
+## Import the signed genesis payload
+
+Then, copy the signed genesis payload back to the aggregator VM:
+```bash
+scp ./genesis-payload-signed.txt curry@$MITHRIL_VM:/home/curry/data/$CARDANO_NETWORK/mithril-aggregator/mithril/genesis/genesis-payload-signed.txt
+```
+
+Then, connect back to the aggregator VM:
+```bash
+ssh curry@$MITHRIL_VM
+```
+
+Export the environment variable:
+```bash
+export CARDANO_NETWORK=**CARDANO_NETWORK**
+```
+
+And connect back to the aggregator container:
+```bash
+docker exec -it mithril-aggregator bash
+```
+
+Once connected to the aggregator container, import the signed genesis payload:
+```bash
+/app/bin/mithril-aggregator -vvv genesis import --signed-payload-path /mithril-aggregator/mithril/genesis/genesis-payload-signed.txt
+```
diff --git a/docs/runbook/protocol-parameters/README.md b/docs/runbook/protocol-parameters/README.md
new file mode 100644
index 00000000000..f080f2e6afd
--- /dev/null
+++ b/docs/runbook/protocol-parameters/README.md
@@ -0,0 +1,71 @@
+# Update the protocol parameters of a Mithril network
+
+## Introduction
+
+The protocol parameters of a network are currently defined when starting the aggregator of the network.
+During startup, the aggregator stores the parameters in its stores and starts using them **3** epochs later. The aggregator broadcasts the protocol parameters to the signers of the network through the `/epoch-settings` route.
+
+## Update parameters of a Mithril network
+The aggregator sets the protocol parameters from the `protocol_parameters` configuration parameter, which is a JSON representation of the `ProtocolParameters` type:
+```rust
+pub struct ProtocolParameters {
+    /// Quorum parameter
+    pub k: u64,
+
+    /// Security parameter (number of lotteries)
+    pub m: u64,
+
+    /// f in phi(w) = 1 - (1 - f)^w, where w is the stake of a participant
+    pub phi_f: f64,
+}
+```
+
+Each parameter can also be set via an environment variable:
+- `PROTOCOL_PARAMETERS__K` for `k`
+- `PROTOCOL_PARAMETERS__M` for `m`
+- `PROTOCOL_PARAMETERS__PHI_F` for `phi_f`
+
+When setting up a Mithril network with a `terraform` deployment, the protocol parameters are set with a JSON definition.
+
+## Find the workflow used to deploy a Mithril network
+
+Currently, the following [Mithril networks](https://mithril.network/doc/manual/developer-docs/references#mithril-networks) are generally available, and deployed with `terraform`:
+- `testing-preview`: with the workflow [`.github/workflows/ci.yml`](../../../.github/workflows/ci.yml)
+- `pre-release-preview`: with the workflow [`.github/workflows/pre-release.yml`](../../../.github/workflows/pre-release.yml)
+- `release-preprod`: with the workflow [`.github/workflows/release.yml`](../../../.github/workflows/release.yml)
+- `release-mainnet`: with the workflow [`.github/workflows/release.yml`](../../../.github/workflows/release.yml)
+
+## Update the protocol parameters
+
+In the deployment matrix of the workflow, update the following value of the targeted network with the new values:
+```yaml
+mithril_protocol_parameters: |
+  {
+    k = 5
+    m = 100
+    phi_f = 0.6
+  }
+```
+
+Which will be replaced with, for example:
+```yaml
+mithril_protocol_parameters: |
+  {
+    k = 2422
+    m = 20973
+    phi_f = 0.2
+  }
+```
+
+The modifications should be created in a dedicated PR, and the result of the **Plan** job of the terraform deployment should be reviewed carefully to make sure that the change has been taken into account.
+
+## Deployment of the new protocol parameters
+
+The new protocol parameters are deployed and take effect as detailed in the following table:
+| Workflow | Deployed at | Effective at
+|------------|------------|------------
+| [`.github/workflows/ci.yml`](../../../.github/workflows/ci.yml) | Merge on `main` branch | **3** epochs later
+| [`.github/workflows/pre-release.yml`](../../../.github/workflows/pre-release.yml) | Pre-release of a distribution | **3** epochs later
+| [`.github/workflows/release.yml`](../../../.github/workflows/release.yml) | Release of a distribution | **3** epochs later
+
+For more information about the CD, please refer to [Release process and versioning](https://mithril.network/doc/adr/3).
\ No newline at end of file
diff --git a/docs/runbook/recompute-certificates-hash/README.md b/docs/runbook/recompute-certificates-hash/README.md
new file mode 100644
index 00000000000..f85f276bf71
--- /dev/null
+++ b/docs/runbook/recompute-certificates-hash/README.md
@@ -0,0 +1,92 @@
+# Recompute the certificates hashes of Mithril aggregator
+
+## Configure environment variables
+Export the environment variables:
+```bash
+export MITHRIL_VM=**MITHRIL_VM**
+export CARDANO_NETWORK=**CARDANO_NETWORK**
+```
+
+Here is an example for the `release-mainnet` network:
+```bash
+export MITHRIL_VM=aggregator.release-mainnet.api.mithril.network
+export CARDANO_NETWORK=mainnet
+```
+
+## Make a backup of the aggregator database
+
+Connect to the aggregator VM:
+```bash
+ssh curry@$MITHRIL_VM
+```
+
+Once connected to the aggregator VM, export the environment variables:
+```bash
+export CARDANO_NETWORK=**CARDANO_NETWORK**
+```
+
+And copy the SQLite database file `aggregator.sqlite3`:
+```bash
+cp /home/curry/data/$CARDANO_NETWORK/mithril-aggregator/mithril/stores/aggregator.sqlite3 /home/curry/data/$CARDANO_NETWORK/mithril-aggregator/mithril/stores/aggregator.sqlite3.bak.$(date +%Y-%m-%d)
+```
+
+And connect to the aggregator container:
+```bash
+docker exec -it mithril-aggregator bash
+```
+
+Once connected to the aggregator container, recompute the certificates hashes:
+```bash
+/app/bin/mithril-aggregator -vvv tools recompute-certificates-hash
+```
+
+Then disconnect from the aggregator container:
+```bash
+exit
+```
+
+## Restart the aggregator
+
+Restart the aggregator so that it reloads the recomputed certificate chain:
+```bash
+docker restart mithril-aggregator
+```
+
+Make sure that the certificate chain is valid (wait for the state machine to go into the state `READY`):
+```bash
+docker logs -f --tail 1000 mithril-aggregator
+```
+
+Then disconnect from the aggregator VM:
+```bash
+exit
+```
+
+## Rollback procedure
+
+If the recomputation fails, you can roll back the database.
+
+First, stop the aggregator:
+```bash
+docker stop mithril-aggregator
+```
+
+Then, restore the backed up database (if the backup was made on a previous day, replace `$(date +%Y-%m-%d)` with the date of the backup):
+```bash
+cp /home/curry/data/$CARDANO_NETWORK/mithril-aggregator/mithril/stores/aggregator.sqlite3.bak.$(date +%Y-%m-%d) /home/curry/data/$CARDANO_NETWORK/mithril-aggregator/mithril/stores/aggregator.sqlite3
+```
+
+Then, start the aggregator:
+```bash
+docker start mithril-aggregator
+```
+
+Make sure that the certificate chain is valid (wait for the state machine to go into the state `READY`):
+```bash
+docker logs -f --tail 1000 mithril-aggregator
+```
+
+Then disconnect from the aggregator VM:
+```bash
+exit
+```
\ No newline at end of file
diff --git a/mithril-aggregator/utils/monitoring/README.md b/docs/runbook/registrations-monitoring/README.md
similarity index 87%
rename from mithril-aggregator/utils/monitoring/README.md
rename to docs/runbook/registrations-monitoring/README.md
index dc9f7990af9..fa0b57452e0 100644
--- a/mithril-aggregator/utils/monitoring/README.md
+++ b/docs/runbook/registrations-monitoring/README.md
@@ -9,7 +9,7 @@ query for that.
 ```sh
 $> sqlite3 -table -batch \
    $DATA_STORES_DIRECTORY/monitoring.sqlite3 \
-   < mithril-aggregator/utils/monitoring/stake_signer_version.sql
+   < stake_signer_version.sql
 ```
 
 The variable `$DATA_STORES_DIRECTORY` should point to the directory where the
diff --git a/mithril-aggregator/utils/monitoring/stake_signer_version.sql b/docs/runbook/registrations-monitoring/stake_signer_version.sql
similarity index 100%
rename from mithril-aggregator/utils/monitoring/stake_signer_version.sql
rename to docs/runbook/registrations-monitoring/stake_signer_version.sql
diff --git a/docs/runbook/ssh-access/README.md b/docs/runbook/ssh-access/README.md
new file mode 100644
index 00000000000..4e969a3488e
--- /dev/null
+++ b/docs/runbook/ssh-access/README.md
@@ -0,0 +1,51 @@
+# Manage SSH access to infrastructure
+
+## Add access for a user
+
+### Create an SSH keypair for a user (if needed)
+
+Create a new SSH keypair with the `ed25519` algorithm:
+```bash
+ssh-keygen -t ed25519 -C "your_email@example.com"
+```
+
+Then, add your keypair to the ssh-agent:
+```bash
+ssh-add ~/.ssh/id_ed25519
+```
+
+### Retrieve the public key of your SSH keypair
+
+Run the following command to retrieve your public key:
+```bash
+cat ~/.ssh/id_ed25519.pub
+```
+
+### Declare the public key
+
+For each user, add a line with the format `**REMOTE_USER**:**PUBLIC_KEY**` to the `mithril-infra/assets/ssh_keys` file:
+```bash
+echo "curry:ssh-ed25519 AAAE53AC3NzQ2vlZDI1aC1O4CpX+S2y1X9NTB4rv4k3pAAAAIF3b7L9sPV5ZiGgogmko your_email@example.com" >> **REPOSITORY_PATH**/mithril-infra/assets/ssh_keys
+```
+
+Then, create a PR with the updated `ssh_keys` file.
+
+## Remove access for a user
+
+To remove access, remove the line(s) related to this user from the `ssh_keys` file.
+
+Then, create a PR with the updated `ssh_keys` file.
+
+## When are the modifications applied?
+
+The modifications will be applied the next time the terraform deployment runs:
+- next **merge** in `main` branch for `testing-preview`
+- next **pre-release** created for `pre-release-preview`
+- next **release** created for `release-preprod`
+- next **release** created for `release-mainnet`
+
+When the modifications are applied, the VM is updated in place by terraform.
+
+:warning: In case of emergency, the SSH keys can be modified by an administrator:
+- In GCP [**Compute Engine**](https://console.cloud.google.com/compute/instances)
+- The SSH keys can be edited in the targeted VM(s)
\ No newline at end of file
diff --git a/docs/runbook/terraform-lock/README.md b/docs/runbook/terraform-lock/README.md
new file mode 100644
index 00000000000..ac6a8d99524
--- /dev/null
+++ b/docs/runbook/terraform-lock/README.md
@@ -0,0 +1,25 @@
+# Fix terraform deployment lock
+
+## Introduction
+
+When the CI cancels a job that is in the middle of a terraform deployment, the lock file that terraform uses under the hood to avoid concurrent deployments may not be removed. In that case, the next CI job that tries to deploy will fail with an error stating that a lock prevents the deployment from running.
+
+## Find the workflow used to deploy a Mithril network
+
+Currently, the following [Mithril networks](https://mithril.network/doc/manual/developer-docs/references#mithril-networks) are generally available, and deployed with `terraform`:
+- `testing-preview`: with the workflow [`.github/workflows/ci.yml`](../../../.github/workflows/ci.yml)
+- `pre-release-preview`: with the workflow [`.github/workflows/pre-release.yml`](../../../.github/workflows/pre-release.yml)
+- `release-preprod`: with the workflow [`.github/workflows/release.yml`](../../../.github/workflows/release.yml)
+- `release-mainnet`: with the workflow [`.github/workflows/release.yml`](../../../.github/workflows/release.yml)
+
+
+## Identify the terraform backend bucket
+In the workflow file, the `terraform_backend_bucket` value names the GCP bucket that terraform uses to store the state of the deployment.
+
+## Reset the terraform lock
+
+A user with administrator rights can simply remove the lock file:
+- In GCP [**Cloud Storage**](https://console.cloud.google.com/storage/browser)
+- In the terraform administration bucket that you have identified earlier, the file that needs to be removed is at path `**TERRAFORM_BACKEND_BUCKET**/terraform/mithril-**MITHRIL_NETWORK_IDENTIFIER**/.terraform.lock.hcl` (e.g. `mithril-terraform-prod/terraform/mithril-release-mainnet/.terraform.lock.hcl`)
+
+:warning: Never delete or modify the `**TERRAFORM_BACKEND_BUCKET**/terraform/mithril-**MITHRIL_NETWORK_IDENTIFIER**/default.tfstate` file.
\ No newline at end of file
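
Note on the backup and rollback steps of `docs/runbook/recompute-certificates-hash/README.md`: the round trip can be rehearsed safely before touching a real aggregator. The following is a minimal sketch that runs against a throwaway directory rather than the real stores path. The names `STORES_DIR`, `DB`, `BACKUP_DATE`, `BACKUP` and `RESTORED` are illustrative assumptions and do not appear in the runbook, and the file content is a placeholder, not a real SQLite database.

```shell
#!/usr/bin/env bash
# Sketch of the backup/rollback round trip from the runbook, run in a temp dir.
# STORES_DIR, DB, BACKUP_DATE, BACKUP and RESTORED are hypothetical names.
set -euo pipefail

# Stand-in for /home/curry/data/$CARDANO_NETWORK/mithril-aggregator/mithril/stores
STORES_DIR="$(mktemp -d)"
DB="$STORES_DIR/aggregator.sqlite3"

# Placeholder content so the round trip can be observed.
printf 'original' > "$DB"

# Record the date once, so a rollback on a later day still finds the backup.
BACKUP_DATE="$(date +%Y-%m-%d)"
BACKUP="$DB.bak.$BACKUP_DATE"

# Backup: one `cp` with exactly one source and one destination.
cp "$DB" "$BACKUP"

# Simulate a failed recomputation that altered the database.
printf 'corrupted' > "$DB"

# Rollback: copy the dated backup over the live database file.
cp "$BACKUP" "$DB"

RESTORED="$(cat "$DB")"
echo "restored content: $RESTORED"
```

Storing the date in a variable at backup time, instead of calling `date` again in the restore command, keeps the two file names in sync when the rollback happens on a different day than the backup.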