Logbook 2022 H2

DJO edited this page Aug 17, 2023 · 4 revisions

Newer Entries

December 2022

2022-12-22 🎅

Mithril session

  • We have talked about the issue Upgrade Cardano devnet to 1.35.4 #523. Upgrading to the latest version of the Cardano node has introduced flakiness in the end-to-end test. We are currently fine-tuning the genesis block of the devnet to fix these hiccups. We have also talked about using a custom environment variable that will allow us to update the URL from which the Cardano node is downloaded without modifying the workflow

  • We have paired and merged the issue Refactoring Crypto test helpers #663:

  • We have discussed how we could remove the 'allow_non_certified_registration' feature and completely remove the uncertified part of the code. In order to do this, we need to investigate how we can avoid spoofing the Pool Ids of the signer nodes when we want to run stress tests in conditions as close as possible to mainnet (i.e. 3K+ SPOs and 100GB+ database). We will work on this subject shortly

2022-12-21 ❄️

Mithril session

  • We have paired on drafting a document that prepares our work for Handling Graceful Updates on Mithril Network:

    • We have raised many questions that we need to answer
    • We will proceed with writing an ADR
    • We will PoC:
      • Interaction with the Cardano chain (to activate a new version): read & write transactions
      • Handle backward compatibility of API messages (with protobuf, AVRO, in-house development etc.)
    • Once these steps are completed, we will move forward with the implementation
  • We have continued pairing on the issue Refactoring Crypto test helpers #663 for which a PR should be ready shortly

2022-12-20

Mithril session

  • We have paired and merged the issue Deactivate uncertified signer registration #621:

    • We have fixed the difficulties we faced yesterday regarding the usage of Rust features when artifacts are built from the workspace. A feature flag cannot be activated for only one crate in that case: it is activated for all crates at once. We have decided to simply stop using one, which led us to refactor the protocol demo tool and make it use its own types (including direct access to mithril-stm types in order to keep it chain agnostic)
    • We have also deactivated the uncertified signers from the Mithril networks
  • We have paired on the issue Refactoring Crypto test helpers #663 and we have started implementing a PR. We will continue working on it tomorrow

  • We have also discussed how to implement the upgrade strategy we talked about yesterday during our team session

  • Finally, we have created an issue Add context to errors #665 in which we will try to provide better debugging information by adding context to errors and by providing less technical error messages

2022-12-19

Mithril session

  • We have closed the following issues and PRs:

  • We have created a new issue Refactoring Crypto test helpers #663 to refactor the cryptographic test helpers used in the tests to provide easy access to protocol ready to use signers (key registration with Cardano certification, certificate chain, ...)

  • We have also paired on an issue with the PR Decommission signer registration with declarative PoolId #653, for which tests that were broken locally were still succeeding on the CI. After investigating the caches, we verified that they were not the source of the problem. The problem is related to the usage of features in the context of a Rust workspace (and feature unification): when we build (or test) by calling the cargo command from the root of the workspace, the features used are different from the ones used when we run the command from the crate directory. We were actually building tests and release binaries with unwanted features. We will think about how to solve this issue in the following days, as no perfect solution seems to exist, and we will probably create an ADR to set rules on how to use features in the future to avoid this pitfall

  • During the team session, we discussed:

    • How to handle upgrades of the signer as smoothly as possible when we reach mainnet:
      • We must limit the usage of the re-genesis of the certificate chain to what is strictly necessary
      • When a new version of the signer is released we need to reach the quorum at least once per epoch. This means that we can't afford to have the signers split in 2 populations that would not be able to create a multi-signature
      • We will adopt a strategy that is close to the one used by Cardano: the idea is to silently deploy a new "big" version that gets activated once the deployment of the version is high enough (à la hard fork). This means that we need to monitor the deployment by using, for example, the single signatures that are regularly sent to the aggregator
      • We will use a transaction on chain that will be read by the signer nodes to proceed to a synchronous upgrade
      • Also, we will work in order to provide backward compatibility for "small" model updates:
        • We need to version all the messages exchanged (protocol version + agent version)
        • We need to provide golden tests to make sure that we can handle previous versions of the models in the newer versions
    • We have decided to postpone the work on issue Add Stake Shares in Certificate #636 as we are not completely ready to move forward on this subject
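
To illustrate the "activate once the deployment is high enough" idea discussed above, here is a minimal Rust sketch. It is not the Mithril implementation: the `SignerReport` type, the stake-weighted measure and the threshold parameter are all assumptions made for the example, standing in for whatever the aggregator could derive from the single signatures it receives.

```rust
/// Hypothetical summary of what a signer reported with its latest single signature.
#[derive(Debug, Clone)]
pub struct SignerReport {
    pub party_id: String,
    pub stake: u64,
    pub protocol_version: u32,
}

/// Share of the total stake (between 0.0 and 1.0) running at least `version`.
/// Weighting by stake is an assumption; counting signers would work the same way.
pub fn adoption_rate(reports: &[SignerReport], version: u32) -> f64 {
    let total: u64 = reports.iter().map(|r| r.stake).sum();
    if total == 0 {
        return 0.0;
    }
    let upgraded: u64 = reports
        .iter()
        .filter(|r| r.protocol_version >= version)
        .map(|r| r.stake)
        .sum();
    upgraded as f64 / total as f64
}

/// The new version is activated only once adoption reaches the chosen threshold.
pub fn can_activate(reports: &[SignerReport], version: u32, threshold: f64) -> bool {
    adoption_rate(reports, version) >= threshold
}
```

With two signers holding 60 and 40 units of stake, where only the first runs version 2, `can_activate(&reports, 2, 0.9)` stays false until the second signer upgrades.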

2022-12-16

Mithril session

  • We have reviewed and merged the PR:

  • We have reviewed the final adjustments to the PR Optimize snapshot digest computation #652 and talked about the robustness of the timed tests if we compile them for release (where the optimization is less obvious on small files). It should be merged very shortly

  • We have talked about some CI improvements that we need to address:

    • Find a way to optimize the use of the cache as we have a hard limit of 10GB that is reached very often and that leads to higher computation delays of the Rust jobs
    • Find a way to add more tags to an existing Docker image on the registry instead of rebuilding it from scratch for Pre-Release and Release
  • We have also created a new issue Delete test lab monitor #658 to clean the code base and to avoid some build issues for some SPOs

  • Finally, we have released a new distribution 2250.1 💪

2022-12-15

Mithril session

2022-12-14

Mithril session

2022-12-13

Mithril session

  • We have started working on moving toward mainnet. We have tried to assess the subjects that need to be addressed first:
    • The storage of the keys & signatures is currently done with a hex encoding in the database stores (especially in the certificate chain) and in the messages exchanged by the nodes, and also in the Genesis verification key file for tests. We should be ready to handle multiple types of encoding in order to:
      • Avoid breaking changes (e.g. not being able to validate the certificate chain after a change of encoding)
      • Optimize the size of the data (e.g. the size of a certificate) (this should be benchmarked)
      • The solution that we have identified is to create a codec that would be able to:
        • Serialize in the default (or a specific) encoding (which can evolve in the future)
        • Deserialize the data by attempting to parse a list of maintained decoding formats
    • Activate the Mithril nodes only when the attached Cardano node is (almost) fully synced (threshold to be determined). This will avoid unnecessary computations when they are not appropriate (e.g. compute stake distribution, snapshot digest and archive)
    • Separate the objects used for communication between the nodes and the business objects they use
    • We have also discussed adaptations that will be needed in order to handle new types of certified data (not final):
      • Associate a type to the certificates so that they can represent accurately certified data
      • Make the signer sign 2 messages for each signing round (the next stake distribution and the message associated with the signing round)
      • Let the aggregator select which message it needs to aggregate first (the next stake distribution if it has not already created a certificate for the epoch, the message of the signing round otherwise). This could also be an efficient strategy in a decentralized context
    • We will keep thinking about other features, and we will also need to get a share of the iteration velocity dedicated to refactoring/technical debt
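
The codec idea from this entry (serialize in a default encoding, deserialize by trying a list of maintained formats) could be sketched in Rust as follows. This is only an illustration under assumptions: hex is the sole encoding named in the notes, while the `csv:`-prefixed decimal format, the `Encoding` enum and the function names are made up for the example.

```rust
/// Encodings known to the codec. `PrefixedCsv` is a made-up second format
/// used purely to show the fallback; only hex appears in the notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Encoding {
    Hex,
    PrefixedCsv,
}

/// Serialize bytes in the requested encoding.
pub fn encode(bytes: &[u8], encoding: Encoding) -> String {
    match encoding {
        Encoding::Hex => bytes.iter().map(|b| format!("{:02x}", b)).collect(),
        Encoding::PrefixedCsv => {
            let parts: Vec<String> = bytes.iter().map(|b| b.to_string()).collect();
            format!("csv:{}", parts.join(","))
        }
    }
}

fn decode_hex(text: &str) -> Option<Vec<u8>> {
    if text.len() % 2 != 0 || !text.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..text.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&text[i..i + 2], 16).ok())
        .collect()
}

fn decode_prefixed_csv(text: &str) -> Option<Vec<u8>> {
    let body = text.strip_prefix("csv:")?;
    if body.is_empty() {
        return Some(Vec::new());
    }
    body.split(',').map(|part| part.parse::<u8>().ok()).collect()
}

type Decoder = fn(&str) -> Option<Vec<u8>>;

/// Try each maintained format in order and report which one succeeded.
pub fn decode(text: &str) -> Option<(Encoding, Vec<u8>)> {
    let decoders: [(Encoding, Decoder); 2] = [
        (Encoding::Hex, decode_hex as Decoder),
        (Encoding::PrefixedCsv, decode_prefixed_csv as Decoder),
    ];
    decoders
        .iter()
        .find_map(|(encoding, decoder)| decoder(text).map(|bytes| (*encoding, bytes)))
}
```

The second format carries a prefix on purpose: trying decoders in sequence is only safe when no input is valid in two formats at once, so a real codec would probably need a similar tag or some other way to disambiguate.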

2022-12-12

Mithril session

  • We have reviewed the draft implementations of:

  • We have merged the issue Remove VerificationKey and Stake from individual signature #619. As there are some breaking changes on the encoding of the multi-signatures, we are compelled to proceed to a re-genesis of the certificate chains of the Mithril networks:

    • We have defined a short-term plan (to be reproduced whenever we have a re-genesis on the test networks):
      • testing-preview re-genesis has been done. New certificates should show up tomorrow
      • pre-release-preview re-genesis scheduled on Wednesday with new distribution pre-release. New certificates should be up on Thursday
      • release-preprod re-genesis scheduled on Friday with new distribution release. New certificates should be up on Sunday
      • We will communicate with the SPOs on the Discord channel when we proceed to the re-genesis of pre-release-preview and release-preprod
    • We have also upgraded the version of mithril-stm to 0.2.0
    • We have also talked about how we could handle the breaking changes in mithril-stm in the future:
      • when working on test networks, we simply re-genesis the certificate chain
      • when working on mainnet in beta version (when we have not reached a high enough adoption rate), we simply re-genesis the certificate chain
      • when working on mainnet: no more breaking changes, which means that the library should take care of handling compatibility as in other Cardano cryptographic libraries. The idea that we had to embed multiple versions of the library is not acceptable because of the high risk of embedding security vulnerabilities
  • We also have paired on the Extract the signer registration from multi-signer #642. We have extracted the signer registration responsibility to a Signer Registerer module last week, which we have wired to the HTTP server and the state machine of the Aggregator. The last step will be to clean the multi-signer

  • Our team session has mainly been dedicated to discussing the Security Indicator of the certificates:

    • Maybe we just need an "Unsafe" warning to be displayed in the UX (explorer and client) when the security is not full
    • We could only rely on the percentage of stakes for this (as long as the full security protocol parameters are used)
    • Checking that one or more well-known signers are listed in the signers list of the certificate might not be enough to guarantee security. We could probably embed this list in the message that is signed, but this would only be interesting while we have not reached the 90% threshold of participation rate
    • An important piece of information is the adoption rate, for which we could provide an evolution graph in the explorer
    • Another idea would be to have an external process (IOG hosted) that continuously checks the validity of the certificate chain produced by the aggregator and, in case of discrepancy with the actual Cardano chain, would revoke the genesis verification key used by clients to prevent them from restoring the snapshots
    • We have agreed that we will add a "Security" page to the documentation website that will explain how the ramp up (aka beta) phase on the mainnet will work and what security will be provided. We will dedicate a team session to writing this page.

2022-12-09

Mithril session

  • We have reviewed the code in progress and discussed the issue Optimize Snapshot Digest Computation #510:

    • We have decided to use a CacheProvider trait that will be responsible for providing a cache of the immutable files given their Immutable File Number
    • This will allow us to provide the following implementations:
      • In memory at first, for being able to provide a minimal working implementation (for testing and that could also be used in the Client)
      • In memory with state stored in the SQLite database (for Signer and Aggregator nodes that already have a store)
      • In memory with state stored in a file with JSON format (that could be used in the Client)
    • We still wonder how we can test the trait efficiently:
      • Use a mock to test behavior of the digester
      • Benchmark the time gained with/without cache
      • Maybe both approaches should be implemented
  • We have also prepared the issue Deactivate uncertified signer registration #621 by deploying test SPOs on the pre-release-preview and release-preprod networks that will be able to sign in 2 epochs and that should thus be ready when we decommission the declarative signer registration
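
As a rough picture of the CacheProvider discussion in this entry, the sketch below shows a trait with an in-memory implementation and a helper that computes only the missing entries. It assumes that what gets cached is a digest string per immutable file; the method names, the synchronous signatures and `digests_with_cache` are hypothetical and not taken from the Mithril code base.

```rust
use std::collections::BTreeMap;

pub type ImmutableFileNumber = u64;

/// Hypothetical cache interface keyed by immutable file number.
pub trait CacheProvider {
    fn get(&self, number: ImmutableFileNumber) -> Option<String>;
    fn store(&mut self, number: ImmutableFileNumber, digest: String);
}

/// Minimal in-memory implementation, enough for tests or a first version.
#[derive(Default)]
pub struct MemoryCacheProvider {
    digests: BTreeMap<ImmutableFileNumber, String>,
}

impl CacheProvider for MemoryCacheProvider {
    fn get(&self, number: ImmutableFileNumber) -> Option<String> {
        self.digests.get(&number).cloned()
    }

    fn store(&mut self, number: ImmutableFileNumber, digest: String) {
        self.digests.insert(number, digest);
    }
}

/// Return one digest per requested file, calling `compute` only on cache misses.
pub fn digests_with_cache<C, F>(
    cache: &mut C,
    numbers: &[ImmutableFileNumber],
    mut compute: F,
) -> Vec<String>
where
    C: CacheProvider,
    F: FnMut(ImmutableFileNumber) -> String,
{
    numbers
        .iter()
        .map(|&number| match cache.get(number) {
            Some(digest) => digest,
            None => {
                let digest = compute(number);
                cache.store(number, digest.clone());
                digest
            }
        })
        .collect()
}
```

Passing a closure that counts its calls as `compute` is one way to test the behavior of the digester without timing anything, which complements the benchmark approach mentioned above.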

2022-12-08

Mithril session

  • We have reviewed and merged the issue Add signature of binaries in the artifacts released #587. This was the last issue of the epic issue Implement Release process #500 that is now finalized 💪 🎉

  • We have continued pairing on the issue Extract the signer registration from multi-signer #642 and we will keep our pairing sessions on the issue Simplify the Multi Signer in Aggregator #398 next week

  • We have taken some time to debug the PR check API version #641 for which the end-to-end test is always failing

  • Finally we have started designing a consistent way of handling compatibility between the Mithril nodes:

    • We want to deal as efficiently as possible with situations where:
      • We are introducing breaking changes that make nodes versions incompatible (avoid them if backward compatibility is possible or provide a way to dodge them. This is critical as we will need to get a very high level of participation of SPOs in order to provide full security for the certificates and also to avoid epoch gaps in the certificate chain)
      • We are introducing breaking changes that make validation of a part of the certificate chain impossible (new version of nodes would not be able to validate previously generated certificates and reciprocally)
    • We will create an ADR once our design is final
    • Here are some ideas that we have talked about:
      • We could use multiple versions of the mithril-stm crate and switch to the correct version to proceed to the certificate verification depending on the version embedded in the certificate. This solution is interesting but has some caveats: it is a bit cumbersome and raises questions on how to handle security issues that would be fixed only in recent versions, for example. We will probably try to PoC this solution soon.
      • We could use a shift mechanism that would activate versions later at a defined epoch transition: we would embed 2 versions (current + next) in the nodes and make an announcement to the SPOs that a new critical version must be installed before the epoch transition. This would give time to upgrade the signers and maximize our chances to avoid epoch gaps. This would also be a convenient way to prepare for new use cases that involve new types of data to certify. We will probably try to PoC this solution soon.
    • We need to make some adjustments on the way we handle the detection of incompatible versions of the nodes:
      • Our current MITHRIL_API_VERSION, which is the OpenAPI specification version, does not fully reflect incompatibilities between nodes, which can occur when the content of the fields of the exchanged data is modified (e.g. in Optimize Snapshot Digest Computation #510 where the way digests are computed changes, or in Remove VerificationKey and Stake from individual signature #619 where single and multi-signatures formats change)
      • We could extend the "meaning" of the MITHRIL_API_VERSION version that would be updated when:
        • OpenAPI specification is updated
        • Encoding or values computation is modified
        • Breaking changes in the certificate chain occur (such that a version of the node is not able to validate it completely)
      • We could rely on the crates nodes versions to establish compatibility tables (e.g. this version of the aggregator is compatible with these versions of the signer node and these versions of the client node)
      • We could also rely on a baked minimum version of the distribution acceptable for a given node (e.g. aggregator running 2248.1 is compatible with signer not older than 2244 distribution)
      • Some drawbacks exist with all the solutions. Relying on the distribution looks interesting even though it will require more work
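
The "baked minimum distribution" option above can be pictured with a small Rust sketch. It assumes distribution identifiers are purely numeric, in the form `2244` or `2248.1`; the `Distribution` type and both function names are hypothetical, and suffixed identifiers such as pre-release tags are simply rejected here.

```rust
/// A distribution identifier split into its numeric parts (assumed format).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Distribution {
    pub major: u32,
    pub patch: u32,
}

/// Parse "2244" or "2248.1"; anything else yields `None`.
pub fn parse_distribution(text: &str) -> Option<Distribution> {
    let mut parts = text.splitn(2, '.');
    let major = parts.next()?.parse::<u32>().ok()?;
    let patch = match parts.next() {
        Some(patch) => patch.parse::<u32>().ok()?,
        None => 0,
    };
    Some(Distribution { major, patch })
}

/// `Some(true)` when the peer is not older than the minimum accepted
/// distribution, `None` when either identifier cannot be parsed.
pub fn is_compatible(peer: &str, minimum: &str) -> Option<bool> {
    Some(parse_distribution(peer)? >= parse_distribution(minimum)?)
}
```

In the example from the notes, an aggregator running 2248.1 would bake in `"2244"` as its minimum and call `is_compatible(signer_distribution, "2244")` for each signer.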

2022-12-07

Mithril session

2022-12-06

Mithril session

2022-12-05

Mithril session

2022-12-01

Mithril session

  • We have prepared the demo path of this iteration:

    1. Introduction
    2. Presentation of the first draft of the "CIP Mithril Decentralized Network"
    3. Showcase of the Store Automatic Migration second milestone for Signer and Aggregator
    4. Video demo of benchmark bootstrap of Daedalus on mainnet with/without Mithril
    5. Finalization/optimizations of the release process
    6. Announcement of deprecation of declarative Pool ID signer registration and next steps
    7. Conclusion/Next steps
    8. QA
  • We have prepared the pre-release of the next distribution: 2248.0-prerelease. It is currently tested and should be released tomorrow

  • We have also been working on the issue CI does not trigger for PR from forks #597. We are now able to correctly run the CI for a PR that comes from a fork. We agreed that it could be a good idea to split the CI workflow into 2 parts and put the Docker build/push and Terraform deployment steps in a new Testing workflow

November 2022

2022-11-30

Mithril session

  • We have created the following issues:

  • We have discussed the issue CI does not trigger for PR from forks #597, which is very tricky. We have decided to go back to triggering the artifacts recording, Docker registry and Terraform deployments on the CI only when there is a push on the main branch. In other cases, only the build and testing part will run. This means that we will have to create tags for new distributions on commits merged by collaborators of the repository. We will investigate further and try to find a better option. We have also had many difficulties with the CI being very slow for the last few days, with some delays of more than 2 hours

  • The issue Implement Mithril SPO on testing/pre-release environments #563 has been merged and some test SPOs are being set up on the testing-preview network

  • We have also reviewed and merged the PR make SQL entities to create their projection #625

  • We have paired on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586

    • We made final adjustments to the recently written parts Abstract, Motivation, Specification/Overview, Rationale, Path to Active and Further Reading
    • We had a meeting with researchers regarding the issue that we have with achieving consensus on the signer registration:
      • The best option that we have at this time is to make a transaction on chain to reach the consensus (for every signer registration at each epoch)
      • We could probably have a KES like evolution mechanism for the Mithril keys in order to reduce the transaction frequency to once every few epochs
      • Researchers will keep on reviewing our CIP draft and trying to find other solutions

2022-11-29

Mithril session

  • We have reviewed the PR Add Mithril SPO on testing/pre-release environments #589 that will be ready to merge shortly after the documentation is updated. It will allow the creation/maintenance of SPOs on the Mithril test networks

  • We have reviewed and paired on the SQL automatic migration #600 that has been merged and will be embedded in the next distribution 2248

  • We have also reviewed and merged the Add versioning to documentation #555 issue that separates the documentation website in 2 separate versions (accessible via the drop-down top right menu on the website):

    • Current version: that has been merged with the latest distribution
    • Next version: the under construction version that will be shipped with the next distribution
  • We have paired on the CI does not trigger for PR from forks #597 for which we are still having some trouble with the management of the build caches. We will keep on investigating this issue in the next days

  • We have continued working on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586, which is close to reaching a decent first draft status. In the next days, we will:

    • Make a full review of the document
    • Enhance the schema overview to make it closer to the final specifications
    • Enhance the description of the handling of the several aggregators certificate chains (regarding the genesis certificate) in this decentralized setup
    • Work on dedicated sessions with researchers in order to find answers and solutions to the signer registration consensus problem that we have identified
  • Regarding the publication of the mithril-stm crypto library to crates.io, we will proceed as follows:

    • First publish the crate with a crates.io API Token from Inigo
    • He will then invite other members of the team as co-owners of the crate
    • Finally, a team will be created in the IOHG GitHub organization that will also be added as owner of the crate (name of the team to be confirmed, e.g. Core, Crypto, Rust, Mithril, and will depend on the strategy defined regarding grouping of the published crates)

2022-11-28

Mithril session

  • We have talked about the issue CI does not trigger for PR from forks #597. We will probably have to trigger the CI only when a PR is created/updated/merged in order to avoid duplicate triggers. We need to make sure that this is not a problem when we retrieve the produced artifacts from other workflows. We will conduct some tests on that matter in the following days

  • We have paired on the issue SQL automatic migration #600 and the associated PR should be ready to merge shortly

  • We have merged the following PRs:

  • Finally, we have paired on the Prepare CIP/CPS for Mithril piggybacked on Cardano network #586:

    • We have reworked all the mini protocols to follow the formalism of the Shelley Networking Protocol
    • We have identified a difficulty with the consensus that needs to be reached on the verification keys of the signers when we broadcast the signer registration. We will work on this subject with researchers in the next days to try to find a solution
    • In the meantime, we will complete the writing of the first draft of the CIP tomorrow in a dedicated session
    • We will also have to create a Mithril CIP in the near future as in CIP-0035. It will commit our team to be fully part of the CIP process

2022-11-25

Mithril session

2022-11-24

Mithril session

  • We have merged the following PRs:

    • Deployment to crates.io #610:
      • We just need to update the final API_TOKEN in the GitHub secrets once we receive it
      • We will wait for a cleanup of the README file of the mithril-stm crate (aka mithril-core) before activating the publication to crates.io
      • When publication time has come, we will remove the --dry-run argument in the publish step of the Pre-release workflow
    • Add Daedalus/Mithril benchmark video #614 that adds the YouTube video of the benchmark we have done on the mainnet with/without Mithril. It is accessible on the Bootstrap a Cardano Node guide of the documentation website
  • We have paired on the issue Add nodes/libraries versions matrix in releases #599 and we have merged the PR Produce versions table in Release description #612 that will add a version table in the release description automatically

  • Finally we have paired on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586:

    • We have carefully reviewed the Mithril Signer Protocol part and have made some refinements on it
    • ⚠️ We have identified a tricky issue regarding the signer registration for which we need to find a consensus among the nodes. In order to do so, we could probably use the slot leader to certify (with its VRF keys) the list of signers registered to Mithril for an epoch
    • We have also scheduled a new session tomorrow dedicated to finalizing the specifications of this mini protocol

2022-11-23

Mithril session

  • We have merged a quick fix on Store migration process does not accept a newer version #603 that was blocking the CI. It simply deactivates the panic that occurs when a version mismatch is detected. The real fix will come with the issue SQL automatic migration #600

  • We have also paired on the issue Activate deployment to crates.io #588 for which:

    • We have pushed the PR Deployment to crates.io #610 that should be merged shortly
    • We are waiting for the API token of the crates.io account of IOG that will be used to deploy. In the meantime, we have kept a dry-run version of the publication step in the Release workflow
  • Finally, we have paired on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586:

    • We made a full review of the CIP
    • We have agreed that a light summary of the protocol should be added at the beginning of the CIP
    • We still have to properly design the bootstrap of the certificate chain for an aggregator in this decentralized context
    • In order to complete the work during this iteration and to get a first clean version:
      • We will all re-read the document prior to new pairing sessions
      • We will schedule 3 other pairing sessions dedicated to that CIP in the following days

2022-11-22

Mithril session

2022-11-21

Mithril session

2022-11-17

Mithril session

  • We have prepared the demo path of this iteration:

    1. Introduction
    2. Showcase of the Store Automatic Migration first milestone for Signer and Aggregator
    3. Showcase of the enhancements of the Explorer
    4. Showcase of live release of the 2246.1 distribution
    5. Conclusion
    6. QA
  • Showcase path of the Live release of the 2246.1 distribution:

# Demo: Release distribution `2246.1`

## Open pre-release page
google-chrome https://github.com/input-output-hk/mithril/releases/tag/2246.1-prerelease

## Switch to main branch
git switch main
git fetch
git pull --rebase

## Show tag on repository
git log --oneline

## Create final tag
git tag -s 2246.1 0bff212a767399b01aef152e27782a7e7ba934f2 -m "2246.1 release"

## Show tag on repository
git log --oneline

## Push the final tag
git push origin 2246.1

## Open Pre-release workflow
google-chrome https://github.com/input-output-hk/mithril/actions/workflows/pre-release.yml

## Open release page
### Generate release notes
### Uncheck "Set as a pre-release"
### Check "Set as the latest release"
google-chrome https://github.com/input-output-hk/mithril/releases/tag/2246.1

## Open Release workflow
google-chrome https://github.com/input-output-hk/mithril/actions/workflows/release.yml

2022-11-16

Mithril session

2022-11-15

Mithril session

  • We have merged the PR add API version in HTTP headers #566 that closes the issue API version #565. The next step is to enforce the compatibility of the nodes and ask for an update when an incompatibility is detected

  • We have reviewed the final modifications of the PR database migration framework #571 that should be merged shortly. Once this is done, we will work on the automatic upgrade of the stores of the nodes.

  • We also have reviewed, requested some modifications and merged this PR More refined list of pre-reqs #591 coming from the community

  • Following many comments, and some confusion that we have noticed on the Discord channel regarding the configuration of the nodes for the several environments, we have merged the PR Enhance Mithril Networks documentation #593, whose goal is to provide a clear configuration section in every guide that requires it. This section is now centralized to provide up-to-date information efficiently. Also, we have removed all the mentions of the now decommissioned previous infrastructure that used to be accessible on the https://aggregator.api.mithril.network/aggregator endpoint

  • Finally, we have merged the PR Upgrade to Cardano 1.35.4 #595 that uses the latest stable version of the cardano node as the previous 1.35.3 will not be working any more by November 16th

2022-11-14

Mithril session

  • We have talked about issue Implement stores migration process #562 and reviewed the PR database migration framework #571. We have decided to align the version number of the database to the version number of the node. The auto upgrade mechanism will be:

    • Check if the version of the node has changed (from previously recorded in the database state)
    • If the version has changed, select the ordered list of upgrade files that need to be applied to the node
    • For each of these files (associated to a version):
      • Apply the upgrade file (first file)
      • If upgrade went OK, check the upgraded database (second file)
      • If upgrade is checked successfully, record the updated version to the database
    • Once all the upgrades have been applied, record the current version of the application and the last updated date
    • There are 2 special cases:
      • Table creation, for which a first upgrade will be a CREATE IF NOT EXISTS query
      • If the list of upgrades to apply includes a version lower than the currently recorded version, for which a panic and error message should happen
  • We have also paired on the issue API version #565 for which we have added the Mithril API Version in the headers of the calls made to the Aggregator from the Signer/Client

  • Finally, we have continued pairing on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586. We will do another dedicated session with the whole team this week
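
The auto upgrade mechanism listed in this entry can be sketched in Rust as below. This is a toy model, not the migration framework from PR #571: the `Database` and `Upgrade` types are hypothetical, `upgrades` is assumed to be the list already selected for this run, an error is returned where the notes call for a panic, and the last updated date is left out.

```rust
/// Version triple; tuples compare lexicographically.
pub type Version = (u32, u32, u32);

/// Toy stand-in for a store: a recorded version and a list of table names.
#[derive(Debug, Default)]
pub struct Database {
    pub version: Option<Version>,
    pub tables: Vec<String>,
}

/// One upgrade step: an "apply" action and a "check" action for a version.
pub struct Upgrade {
    pub version: Version,
    pub apply: fn(&mut Database),
    pub check: fn(&Database) -> bool,
}

/// Apply the selected upgrades in version order and return how many ran.
pub fn migrate(
    db: &mut Database,
    node_version: Version,
    upgrades: &[Upgrade],
) -> Result<usize, String> {
    // Nothing to do when the node version matches the recorded one.
    if db.version == Some(node_version) {
        return Ok(0);
    }
    // Special case: an upgrade older than the recorded version is rejected.
    if let Some(recorded) = db.version {
        if let Some(stale) = upgrades.iter().find(|u| u.version < recorded) {
            return Err(format!(
                "upgrade {:?} is older than the recorded version {:?}",
                stale.version, recorded
            ));
        }
    }
    let mut ordered: Vec<&Upgrade> = upgrades.iter().collect();
    ordered.sort_by_key(|u| u.version);
    for upgrade in &ordered {
        (upgrade.apply)(&mut *db);
        if !(upgrade.check)(&*db) {
            return Err(format!("check failed after upgrade {:?}", upgrade.version));
        }
        // Record each successfully checked upgrade.
        db.version = Some(upgrade.version);
    }
    // Finally record the current version of the node.
    db.version = Some(node_version);
    Ok(ordered.len())
}
```

A fresh database (no recorded version) plays the role of the table creation case: its first upgrade would be the idempotent creation step.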

2022-11-10

Mithril session

2022-11-08

Mithril session

2022-11-07

Mithril session

2022-11-04

Mithril session

  • We have reviewed the work in progress on the Mithril explorer for issue Provide a 'copy' button for the aggregator URL in explorer #576. The associated PR should be ready to merge shortly

  • We have identified some problems on the testing-preview and pre-release-preview networks, which were not producing snapshots for epoch 10. Apparently some problems may exist in the fast bootstrap genesis tools/process. We are investigating the problem. In the meantime we have:

    • Reset the testing-preview network with fast bootstrap genesis: new certificates are produced and no epoch gap with protocol initializers/verification keys exist in the databases of the signer and aggregator nodes. We will see if the problem occurs again in the following epochs
    • Re-genesis the pre-release-preview network (as fast genesis is not possible anymore once new signers have registered). New certificates should be produced in the next epoch
  • Following the release of Rust 1.65.0 some clippy warnings occurred in the CI and were blocking the process. We have paired to apply a fix for these warnings in Update rust dependencies #583

2022-11-03

Mithril session

  • The first distribution of Mithril has been released 2244.0 🚀 🎉 💪

  • We have paired and merged the PR Add Debian packaging to CI #579, which produces in the CI the Debian packages for the installation of the Linux binaries. We will adjust the documentation to make this installation the preferred installation type for Mithril nodes

  • We have worked on the demo path of this iteration:

    1. Introduction
    2. Showcase of the new release process and of the first Mithril distribution
    3. Presentation of single signature without Merkle path
    4. Conclusion
    5. Q&A
  • Showcase path of the new release process and of the first Mithril distribution:

# Demo: Bootstrap a Cardano node from a preprod Mithril snapshot with latest Client distribution

## Download binary
rm -f mithril-client
wget https://github.com/input-output-hk/mithril/releases/download/2244.0/mithril-client_0.1.0.12bb705_amd64.deb
sudo dpkg -x mithril-client_0.1.0.12bb705_amd64.deb .
sudo mv usr/bin/mithril-client ./mithril-client

## Test installation
./mithril-client
./mithril-client --version

## Get Latest Snapshot Digest
export NETWORK=preprod
export AGGREGATOR_ENDPOINT=https://aggregator.release-preprod.api.mithril.network/aggregator
export GENESIS_VERIFICATION_KEY=$(wget -q -O - https://raw.githubusercontent.com/input-output-hk/mithril/main/TEST_ONLY_genesis.vkey)

SNAPSHOT_DIGEST=$(curl -s $AGGREGATOR_ENDPOINT/snapshots | jq -r '.[0].digest')
echo $SNAPSHOT_DIGEST

## List Snapshots
./mithril-client list

## Show Latest Snapshot
./mithril-client show $SNAPSHOT_DIGEST

## Download Latest Snapshot
./mithril-client download $SNAPSHOT_DIGEST

## Restore Latest Snapshot
./mithril-client restore $SNAPSHOT_DIGEST

## Launch a Cardano Node
docker run -v $(pwd)/ipc:/ipc -v cardano-node-data:/data --mount type=bind,source="$(pwd)/data/preprod/$SNAPSHOT_DIGEST/db",target=/data/db/ -e NETWORK=preprod inputoutput/cardano-node:1.35.3-configs

## Query tip of the chain
watch -n 1 "sudo CARDANO_NODE_SOCKET_PATH=./ipc/node.socket ./cardano-cli query tip --cardano-mode --testnet-magic 1 | jq ."
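
The "Get Latest Snapshot Digest" step of the script above relies on the jq filter `.[0].digest`. As a minimal sketch, the same selection can be written in Python, assuming (as the script does) that the `/snapshots` route returns a JSON array whose first element is the latest snapshot. The sample response body is hypothetical and only carries the `digest` field.

```python
import json


def latest_snapshot_digest(snapshots_json: str) -> str:
    """Return the digest of the first snapshot in the list, like jq '.[0].digest'."""
    snapshots = json.loads(snapshots_json)
    if not snapshots:
        raise ValueError("the aggregator returned no snapshots")
    return snapshots[0]["digest"]


# Hypothetical response body, for illustration only.
sample_response = '[{"digest": "abc123"}, {"digest": "def456"}]'
print(latest_snapshot_digest(sample_response))
```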

2022-11-02

Mithril session

  • We have paired on fixing the failing tests of the PR Single signature without merkle path #484. The PR is now merged, and the genesis of the release-preprod environment has been re-run accordingly (as the AVK format is no longer compatible) 🎉

  • We have merged the PR Activate new Mithril networks #577 that activates the new Mithril networks for each workflow of the new release process:

| Mithril Network     | Workflow    |
|---------------------|-------------|
| testing-preview     | CI          |
| pre-release-preview | Pre-Release |
| release-preprod     | Release     |

  • Tomorrow we will create the first distribution release of the repository. We have discussed this first release and the nice-to-have features to implement shortly in the distribution:
    • Debian package
    • GPG signature of the binaries
    • Better handling of Docker artifacts re-tagging
    • Manual testing of Client artifacts for macOS and Windows platforms

October 2022

2022-10-28

Mithril session

  • We have worked toward releasing the new release-preprod environment:

    • ✔️ Deprecate current aggregator: it will not be updated anymore when some branches are merged on main
    • ✔️ Use release-preprod as the new environment that is deployed when branches are merged on main (temporarily, until the new preview cardano testnet is re-spun)
    • ❌ Merge breaking changes of mithril-core in the PR Single signature without merkle path #484. A blocking issue forced us to postpone the merge until a fix is implemented.
    • ⌚ Fast re-genesis the aggregator of release-preprod (<30 min). Will be done after the #484 merge.
    • ⌚ Communicate with SPOs on Discord and the dev blog about the new & deprecated environments. A blog post has been created and is under review in PR New environments documentation #575
  • We have paired on fixing the tests of the Aggregator of the PR Single signature without merkle path #484. We did not succeed, but we found out that there is probably an issue with the registration. We will keep on investigating this problem.

  • We have also discussed how we could test that the macOS and Windows Client builds run correctly when connected to an Aggregator that runs on Linux. We think that a good option is to create manually triggered complementary pipelines. We will investigate this shortly.

  • Finally, we have reviewed the work in progress on the issue Implement stores migration process #562

2022-10-27

Mithril session

2022-10-26

Mithril session

  • We have discussed the issue Implement stores migration process #562:

    • This issue is closely linked to Move stores to relational design with SQLite #476. We will start working on #476 once #562 is completed
    • We have agreed that it would be easier to release a first version of the system that is already handling the migration steps described by sequential migration files
  • We have reviewed, paired and merged the PR Adapt ci workflow to Release Process #557 🎉:

    • There is a small bug regarding the naming of artifacts
    • We still need to have the CI append the commit sha to the versions in the Cargo.toml files
    • We need to find a way to reuse docker artifacts between pipelines
    • We must make some tests on the macOS and Windows client binaries to make sure they are working properly
  • The following PRs have also been merged:

  • Finally, we have decided to spin-up the release-preprod environment at the EOW:

    • After the merge of the breaking change PR Single signature without merkle path #484 (or re-spin after this merge)
    • Temporarily implement it in the CI pipeline (and then move it to the Release pipeline when it is released)
    • Communicate with the SPOs on Discord and the dev blog:
      • Explain that the current Aggregator running on preview is deprecated and will be decommissioned on November 1st
      • Explain that they need to move their Signer nodes to release-preprod environment which will be the stable environment
      • Encourage them to also have a Signer node running on the pre-release-preview environment to keep participating in the testing effort

2022-10-25

Mithril session

2022-10-24

Mithril session

2022-10-21

Mithril session

  • We have fixed the issue with cargo sort that was crashing the CI with PR Cargo update sort and dependencies #558

  • We have also reviewed and merged the PR Add version information #553

  • The issue that we have with the Signer registration (as in issue Signer registration fails with key certification mode #548) seems to be related to the fact that KES Secret Keys evolve in memory. This explains why we can verify the signature only with a 0 value for the KES Period. Two solutions could fix the problem:

    • Compute the correct KES Period when doing the signature of the Mithril Verification Key (current_period - start_period, current_period given by the cardano cli and start_period given by the Operational Certificate). We will pair on this solution next week
    • Update the cardano cli so that it computes the signature with the in memory KES Secret Key. We expect an estimate from the Cardano node team for this feature
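
The first solution above boils down to a subtraction. The following is a minimal Python sketch of it, not the project's Rust implementation: the function name and the sample values are hypothetical, and the two inputs are assumed to have already been read from the cardano cli and from the Operational Certificate.

```python
def kes_period_for_signature(current_period: int, start_period: int) -> int:
    """KES Period to use when verifying the signature of the Mithril Verification Key.

    current_period: current KES period of the chain (given by the cardano cli).
    start_period:   KES period at which the Operational Certificate starts.
    """
    if current_period < start_period:
        raise ValueError("current KES period is before the certificate start period")
    return current_period - start_period


# Hypothetical values, for illustration only.
print(kes_period_for_signature(current_period=152, start_period=150))
```

A result of 0 corresponds to a key that has not evolved yet, which is the only case that verified before the fix.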

2022-10-20

Mithril session

  • We have worked on the demo path of this iteration:

    1. Introduction
    2. Presentation of the results of the SPO certification on the hosted Aggregator
    3. Presentation of the updated release process
    4. Showcase of the CI/CD Workflows: Testing -> Pre-Release -> Release
    5. Showcase of the bootstrapping of a deployment environment on preview
    6. Conclusion
    7. Q&A
  • Showcase path of the bootstrapping of a deployment environment on preview:

# Mithril Bootstrap Deployment Environment
# On preview network

---

# Setup demo

## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git

---

# Demo: Bootstrap Deployment Environment 

## Change directory
cd mithril/mithril-infra

## Setup environment variables
DEPLOY_ENVIRONMENT=demo-preview
API_DOMAIN=api.mithril.network


## Setup terraform variables
cat > env.$DEPLOY_ENVIRONMENT.tfvars << EOF
environment_prefix                   = "demo"
environment_suffix                   = ""
cardano_network                      = "preview"
google_project                       = "mithril-test-365514"
google_region                        = "europe-west1"
google_zone                          = "europe-west1-b"
google_machine_type                  = "e2-medium"
google_service_credentials_json      = "../gcp-credentials.json"
google_application_credentials_json  = ""
mithril_api_domain                   = "$API_DOMAIN"
mithril_image_id                     = "latest"
mithril_genesis_verification_key_url = "https://raw.githubusercontent.com/input-output-hk/mithril/main/TEST_ONLY_genesis.vkey"
mithril_genesis_secret_key           = ""
mithril_signers = {
  "1" = {
    pool_id = "pool15qde6mnkc0jgycm69ua0grwxmmu0tke54h5uhml0j8ndw3kcu9x",
  },
  "2" = {
    pool_id = "pool10g0tvpyc3phkym8r6hamdulyzd6shzjldpahyvdkljl7ur2adfe",
  }
}
EOF

## Create & init terraform workspace
terraform workspace new $DEPLOY_ENVIRONMENT
terraform init

## Plan terraform deployment
terraform plan --var-file=env.$DEPLOY_ENVIRONMENT.tfvars

## Apply terraform deployment
terraform apply --var-file=env.$DEPLOY_ENVIRONMENT.tfvars

## Connect to VM and list docker containers
ssh curry@aggregator.demo-preview.$API_DOMAIN -- docker ps
ssh curry@aggregator.demo-preview.$API_DOMAIN -- tree /home/curry/data

## Query aggregator REST API
curl -sk https://aggregator.demo-preview.$API_DOMAIN/aggregator/epoch-settings | jq .
watch -n 1 "curl -sk https://aggregator.demo-preview.$API_DOMAIN/aggregator/epoch-settings | jq ."

## Destroy terraform deployment
terraform destroy --var-file=env.$DEPLOY_ENVIRONMENT.tfvars
ssh-keygen -f "/home/jp/.ssh/known_hosts" -R "aggregator.demo-preview.$API_DOMAIN"
rm -f env.$DEPLOY_ENVIRONMENT.tfvars
rm -rf .terraform
rm -rf terraform.tfstate.d
rm -f .terraform.lock.hcl

2022-10-19

Mithril session

2022-10-18

Mithril session

2022-10-17

Mithril session

  • We had talks about issue Get/Show current version on Mithril nodes cli / APIs #541. We agreed that:

    • The Client/Signer nodes should expose the version they run in the headers when requesting the Aggregator
    • The Aggregator node should expose the version it runs in a header when it is called
    • We could implement a version check system that returns an error message stating an update is required if the Aggregator version is not compatible with the Client/Signer version
  • We also discussed the issue Implement Release process #500:

    • Adapt CI workflows to work with the new release process #543: In progress, has been tested in a temporary repository and new workflows will be added soon
    • A first task of extracting the documentation generation in a separate workflow is in progress
    • We think that we may need to handle the documentation a bit differently than the rest of the process:
      • We need to produce new dev blog posts without releasing a new version
      • We could use the versioning feature of docusaurus and publish pre-release/release versions on the same website
      • This would require some manual operations on the developers' end
      • There are advantages and drawbacks to this approach. We will keep improving the design of this part of the process
  • We have investigated the case of an SPO who was unable to get the Verified Signer badge. It appears that their Pool Id was spoofed by one of the Signer nodes running on the GCP platform. We have fixed this and merged the PR Fix mithril infra configuration #554

  • We have talked more about the release process during the team session:

    • Regarding the management of the versions, we have worked on several ideas:
      • We could have a hybrid version where the major+minor would be handled by the Cargo.toml version and the patch would be handled remotely in a "directory" of all versions
      • We could also handle the full version in a remote directory
      • We could package the version in an external file dedicated to the versioning and that would be embedded in the GitHub package
      • Another idea that looks simpler is to add a patch identifier that reflects the commit id like in -{COMMIT_SHA1} for example
    • Regarding the documentation website:
      • There will be only one version of the website that is deployed when a merge occurs on the main branch
      • The website will support 2 versions: current and next
      • We will create a post-release commit that will update the current and next versions and also the versions in the Cargo.toml file(s)
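
The version check discussed in the first item of this session can be sketched as follows in Python. This is only an illustration under stated assumptions: the header name `mithril-api-version` and the compatibility rule (same major and minor numbers) are hypothetical choices made for this sketch, since the rule had not been decided at this point.

```python
from typing import Mapping, Optional, Tuple

API_VERSION_HEADER = "mithril-api-version"  # hypothetical header name


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse a 'major.minor.patch' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def check_api_version(headers: Mapping[str, str], aggregator_version: str) -> Optional[str]:
    """Return None when the caller is compatible, else an 'update required' message.

    Assumed rule: versions are compatible when major and minor are equal.
    """
    sent = headers.get(API_VERSION_HEADER)
    if sent is None:
        return "update required: no API version header was sent"
    if parse_version(sent)[:2] != parse_version(aggregator_version)[:2]:
        return f"update required: aggregator runs {aggregator_version}, caller sent {sent}"
    return None


# A Signer sending 0.1.3 to an Aggregator running 0.1.0 is accepted under this rule.
print(check_api_version({API_VERSION_HEADER: "0.1.3"}, "0.1.0"))
```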

Meeting with Alex Sierkov / Daedalus Turbo project

We had an introductory call today with Alex and the Mithril team. After some presentations, we went through the current state of Mithril and the short term roadmap, emphasizing our current target is to address the specific need of fast bootstrap of a full node.

Alex asked some questions about the roadmap:

  • What do we think of distributing data using "alternative" to HTTP?

    We think this is a good idea, we made room for it in the snapshot's schema, and we did not tackle it for want of time and because it seems something that can be contributed later

  • What's the plan for deploying to mainnet and how much stake do you need?

    We don't know exactly yet, one idea would be to grade the signatures according to the amount of stake while we ramp up. Besides, signers are known so that's also a possible source of trust

  • How about speeding up client's state reconstruction process through some form of indexing (eg. think SPV + Bloom filter)?

    That's something we explored briefly in the initial prototype phase. We want to make Mithril "extensible" in the sense that SPOs could sign various artifacts besides the node's db, which could make this feature possible

  • What about the use case of a node/wallet catching up on a few months of activity?

    Right now we "naively" sign and store full snapshots but obviously we want to chunk those for download and snapshot signing performance reasons

We agreed on these follow-up items:

  • Answer any question Alex has on the dedicated discord channel (#moria of course)
  • Alex is most welcome to attend the bi-weekly demo/Q&A session.

    If need be, we are comfortable with the idea of "Mithril Office Hours" on a weekly basis should the community feel a need for it

  • @Reza will be the main contact point with the team when it comes to discussing features and roadmap

2022-10-14

Mithril session

  • Following the release of the experimental certified signers mode, we can now see some green badges next to the verified PoolIds in the certificates of the Explorer 🎉

  • We have discussed the issue Get/Show current version on Mithril nodes cli / APIs #541:

    • The node will display its version when launched
    • We will add a version command on the CLIs that will output the running version
    • We will add headers with versions in the Signer and Client requests as well as in the Aggregator responses
  • We have talked and paired on the issue Adapt CI workflows to work with the new release process #543. We have made a PoC of the pre-release pipeline in order to test that:

    • We catch the correct triggers ✔️
    • We can retrieve artifacts produced in a previous/different workflow run ✔️
    • We can produce GitHub releases from the workflows run ✔️
  • During our discussions, we have talked about how to handle:

    • Adding new information to the signed message (as Signers will probably not all upgrade at the same time). In that case, will it be possible to produce signatures under these conditions?
    • A solution could be to type certificates depending on what information is signed and to chain only the ones that embed the next stake distribution
    • We will probably have the same issue when we upgrade protocol versions that are not backward compatible

2022-10-13

Mithril session

  • We had talks and paired on the issue Implement Release process #500:
    • The process of artifacts promotion required some clarifications:
      • Each commit triggers a first CI workflow that builds artifacts and deploys to testing environment
      • Each git tag triggers a second Pre-Release workflow that promotes artifacts to pre-release environment and also creates the associated GitHub release (same name, in pre-release status)
      • When the release candidate is validated, the pre-release status is removed from the GitHub release, which promotes the artifacts to the release environment
    • We have tried to define a process regarding when/how to update the versions of the crates:
      • One version for the workspace and one different version for mithril-core
      • Just after releasing a version 0.1.2:
        • We commit a new 0.1.3-dev version until we are happy with a release candidate (tagged 0.1.3-rcX)
        • When we are ready to release a candidate, we update the version to 0.1.3 and we tag it as 0.1.3 (and re-test it)
        • We then release version 0.1.3 and we start this process all over again
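
The version cycle described above (0.1.2, then 0.1.3-dev, then 0.1.3-rcX, then 0.1.3) can be sketched with three small helpers. This is a Python illustration only: the function names are hypothetical, and it assumes plain `major.minor.patch` versions with an optional `-dev` suffix, as in the example of this entry.

```python
def next_dev_version(released: str) -> str:
    """Version committed right after a release: 0.1.2 -> 0.1.3-dev."""
    major, minor, patch = (int(part) for part in released.split("."))
    return f"{major}.{minor}.{patch + 1}-dev"


def release_candidate_tag(dev_version: str, rc_number: int) -> str:
    """Tag of a release candidate: 0.1.3-dev, 1 -> 0.1.3-rc1."""
    base = dev_version.split("-")[0]
    return f"{base}-rc{rc_number}"


def final_version(dev_version: str) -> str:
    """Version set once a candidate is validated: 0.1.3-dev -> 0.1.3."""
    return dev_version.split("-")[0]


dev = next_dev_version("0.1.2")
print(dev, release_candidate_tag(dev, 1), final_version(dev))
```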

2022-10-12

Mithril session

  • Following the activation of the experimental Signer certified registration, SPOs have reported troubles with their nodes:

    • Issue Unhelpful Log message #546: The error messages provided were not helpful to the users. We paired on improving them by giving detailed feedback on the bad request status code from the Aggregator
    • Issue Signer registration fails with key certification mode #548: A Signer trying to register with the certification mode fails because the KES Signature can't be verified. This is still under investigation as the underlying cryptography is complex. In the meantime, we have paired and merged a temporary fix that tries all the possible KES Period values: the Signers are now registered. We expect them to be able to sign the snapshots in 2 epochs (a rebuild of their nodes is however mandatory)
  • We have noticed some warning messages in the CI jobs and have created the issue Update workflow github actions #550. A first PR will be merged shortly to update a first part of the GitHub actions. We will keep an eye on the other actions and update them as soon as new versions are released

  • We have also merged the PR remove SQL migration tool #540 which decommissions the data stores migration tools of the Aggregator and the Signer
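
The temporary fix for #548 mentioned above tries every possible KES Period until the signature verifies. A minimal Python sketch of that search follows. The `verify` callable is a stand-in for the real KES signature verification, which is not modelled here, and the function and parameter names are hypothetical.

```python
from typing import Callable, Optional


def find_valid_kes_period(
    verify: Callable[[int], bool], max_kes_evolutions: int
) -> Optional[int]:
    """Try each KES Period in turn and return the first one that verifies.

    verify(period) must return True when the KES signature is valid for that
    period. Returns None when no period in [0, max_kes_evolutions) verifies.
    """
    for period in range(max_kes_evolutions):
        if verify(period):
            return period
    return None


# Stand-in verifier: pretend the key has evolved 3 times since it was issued.
print(find_valid_kes_period(lambda period: period == 3, max_kes_evolutions=62))
```

The search is linear in the number of evolutions, which is why it is only acceptable as a stopgap until the correct period is computed directly.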

2022-10-11

Mithril session

2022-10-10

Team Session

  • Key certification: we need to compute the KES range -> need to pass the KES period
    • compute range for KES period from genesis parameters
  • Pb: we don't know what's really useful on mainnet; the registration process requires each signer to know the key of every other signer
  • write one or more CIPs for "Mithril Decentralisation"?
    • Mithril networking CIP
    • key registration process
  • What about multi-pool runners? -> no need to take care of
  • Signer deployment -> which deployment model? Mithril Deployment Model CIP? RFC?
  • TODO:
    1. draft something for each "CIP" -> 2 pager (A3) respecting somewhat the structure of a CIP
    2. check with CIP process "guardians" whether or not they would fit -> Michael, Matthias
       a. if OK -> write the full thing
       b. if NOK -> turn into a GH discussion -> invite people from Community + IOG to react/comment/propose
  • Use A3 format ?
  • Edinburgh:
    • JP -> presentation 15'
    • Iñigo -> support + Q/A
    • Arnaud -> test demo w/ daedalus
    • Deadline EOW Slide deck for the talk
    • Reza + Arnaud -> presentation 1-slide for CH keynote + Slide deck

Mithril session

  • We had discussions about the rename attribute in serde annotations. It was used on almost all of the entity fields even though they had the same name as in the JSON version. The redundant annotations have been removed in the PR Enhance serde annotations #538 that has been merged

  • We have reviewed some build time analyses that were produced in order to understand the bottleneck of the build step of the CI. We didn't find any interesting information and we think that it may be due to some cache loading issue within the CI. We will continue our investigations

  • We have reviewed the modifications done to the PR New STM registration procedure #433. We are pairing on final modifications before we can merge it tomorrow

  • Also, the PR use command and parameters for the client #536 has been reviewed and merged

2022-10-07

Mithril session

  • We have reviewed the latest version of the issue Fix CLI args precedence in Client/Signer/Aggregator #511. Everything is done now, except the digest argument that could probably be passed without being named (as has been the case until now). This will avoid breaking the implementations of the current users of the client. The PR should be merged early next week

  • Many comments have been received on the PR New STM registration procedure #433. They are currently being addressed and the PR should be merged early next week

  • Here is the schema presented yesterday during the demo that illustrates the certification process of the Mithril verification keys: [image: Mithril (1)]

2022-10-06

Mithril session

  • We have discussed the peer review that we made yesterday on the PR New STM registration procedure #433. All the points noted during the session have been fixed. The documentation part will be added today, stating that this feature is experimental. We will prepare a Dev Blog post in a separate review that will explain the next steps:

    • Testing of the new feature with volunteer SPOs for a transient period that allows both Certified and Non Certified SPOs for smooth transitioning
    • Improvement of the design so that it fits well with the SPO Cardano nodes architecture (Core/Relay/Firewall/Keys security)
    • Progressive deprecation of the Non Certified mode
  • We have talked about the issue Fix CLI args precedence in Client/Signer/Aggregator #511 that will be merged very shortly once some issues with the test lab are fixed

  • We have also worked on the demo path of this iteration:

    1. Introduction
    2. Showcase of the Mithril Keys Certification on the devnet
    3. Presentation of the Evolution of arguments handling in the CLI of the nodes
    4. Conclusion
    5. Q&A
  • Showcase path of the Mithril Keys Certification:

# Mithril Keys Certification
# On devnet, with evolving non-deterministic verification keys, with evolving stake distribution, with full Certificate chain, with keys certification

---

# Setup demo

## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git

## Checkout correct branch
cd mithril/
git switch mock_certification
cd mithril-client && make build && cp mithril-client ../../ && cd ..
cd ..

---

# Demo: Run devnet

## Start explorer
cd mithril/mithril-explorer
make dev &
cd ../..
google-chrome http://localhost:3000/explorer

## Change directory
cd mithril/mithril-test-lab/mithril-devnet

## Query Cardano
watch -n 1 NODES=cardano ./devnet-query.sh

## Logs Mithril
watch -n 1 NODES=mithril LINES=100 ./devnet-log.sh

## Start devnet with 5 pools
./devnet-stop.sh && NUM_POOL_NODES=5 DELEGATE_PERIOD=100 EPOCH_LENGTH=60 ./devnet-run.sh

---
# Demo: Restore a snapshot from devnet

## Prepare vars
NETWORK=devnet
AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator
GENESIS_VERIFICATION_KEY=5b33322c3235332c3138362c3230312c3137372c31312c3131372c3133352c3138372c3136372c3138312c3138382c32322c35392c3230362c3130352c3233312c3135302c3231352c33302c37382c3231322c37362c31362c3235322c3138302c37322c3133342c3133372c3234372c3136312c36385d
LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest')
echo $LATEST_DIGEST

## List snapshots
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client list

## Show snapshot details
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client show $LATEST_DIGEST

## Download snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client download $LATEST_DIGEST

## Restore snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client restore $LATEST_DIGEST
  • We have also talked about:

    • The possibility of using the Mithril Signer as a process that would not run as a daemon. It could be launched at regular intervals by a cron or by the Cardano node itself
    • With that perspective, we could piggyback on the Cardano node which would be used to broadcast (Tx/Rx) the messages and store them in a bus. The Mithril Signer would use this bus whenever it is launched
  • During the demo some interesting points were addressed:

    • The Mithril Relay design seems to be preferred by the SPOs as it would provide more security (the Cardano Relay is very likely to be subject to attack attempts)
    • We need to understand the impact of using Operational Certificates for Multiple Pools and see if this is a concern (as each server would have its own Operational Certificate)

2022-10-05

Mithril session

  • We have paired on resolving the issue that we discovered regarding KES Period usage in Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455:

    • Simple (but not complete) solution implemented: store the KES Period along with the SignerWithStake in the aggregator store and send it back to the Signers for them to make a valid key registration (even if the KES Period has expired when used)
    • Simple solution (next steps): enforce the range of KES Periods valid for an Epoch, which should be easily computable given the current Slot and the genesis parameters slotsPerKESPeriod and maxKESEvolution (it could be added to the Aggregator Beacon in the pending certificate or computed from the Epoch number directly on the Signer node)
    • More difficult solution: build the KeyReg at the same time on the Signers and Aggregator and store the CloseReg (and use it on the Signer when the time has come). This would require a broadcast/gossip mechanism between the nodes
  • We also had some discussions on the design of the Signer (with Key Certification) given the topology of the Cardano nodes run by the SPOs:

    • We could use the Core node to process the signature w/ KES secret key given a message through the Cardano CLI
    • Or maybe use the Relay node to act as a proxy to make this operation
    • Some other discussions will take place to find the best architecture in the next weeks
  • We have made a thorough peer review (with the whole team) of the PR New STM registration procedure #433 that should be merged before the end of the week 💪
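
The "next steps" option above relies on computing the range of KES Periods covered by an Epoch. The following Python sketch shows one way to do it, under a strong assumption that is not in the notes: epochs all have the same length and start at slot 0 (this does not hold as such on networks that went through an era with a different slot layout). The function names and the sample values are illustrative.

```python
from typing import Tuple


def kes_period_of_slot(slot: int, slots_per_kes_period: int) -> int:
    """KES Period that contains the given slot."""
    return slot // slots_per_kes_period


def kes_periods_for_epoch(
    epoch: int, epoch_length: int, slots_per_kes_period: int
) -> Tuple[int, int]:
    """First and last KES Periods touched by an epoch (assumes uniform epochs from slot 0)."""
    first_slot = epoch * epoch_length
    last_slot = first_slot + epoch_length - 1
    return (
        kes_period_of_slot(first_slot, slots_per_kes_period),
        kes_period_of_slot(last_slot, slots_per_kes_period),
    )


def is_kes_period_usable(kes_period: int, start_period: int, max_kes_evolutions: int) -> bool:
    """True when an Operational Certificate starting at start_period can still be used."""
    return start_period <= kes_period < start_period + max_kes_evolutions


# Illustrative parameters only.
print(kes_periods_for_epoch(epoch=2, epoch_length=100, slots_per_kes_period=30))
```

A registration could then be accepted only when the KES Period it declares falls inside the range returned for the Epoch and is still usable for the certificate.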

Mithril Research/Engineering sync

  • We discussed the security that should be applied to Mithril Secret Keys versus the Cardano Secret Keys:
    • The best option is to delete the keys as soon as the associated Certificate is produced
    • We must keep in mind that in case of an epoch gap in the Certificate Chain, we may need the keys for 1 more epoch
    • The storage of the keys is also a concern (maybe they should be on the Core node)

2022-10-04

On-boarding Reza on Mithril

  • Presentation of the team
  • Discovery of the GitHub repository (Project, Wiki, ...)
  • Discovery of the documentation website and of the Mithril Explorer
  • Q&A session
  • Plan next sessions in the following days/weeks

Mithril session

2022-10-03

Mithril session

  • We have reviewed the issue Fix CLI args precedence in Client/Signer/Aggregator #511:

  • We have paired on implementing a Verified Signer badge on the Mithril Explorer for the Signers that have registered their SPO with the certification process, as in Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455. Also, the devnet Docker images have not been working since the merge of the PR fix SQLite deadlocks #521

  • We have talked about the process of Mithril Keys Certification which was challenged on the Discord channel:

    • The Operational Certificate does not need to be available on the Cardano chain (which means that any pool that has not produced blocks yet can register on a Mithril network)
    • The validation mechanism works this way:
      • The Mithril Signer Verification Key is signed by the KES Secret Key of the SPO
      • This signature is verified with the KES Verification Key stored in the Operational Certificate
      • The Operational Certificate is signed by the Cold Secret Key
      • This signature is verified with the Cold Verification Key stored in the Operational Certificate
      • The PoolId is computed as the hash of the Cold Verification Key stored in the Operational Certificate
      • This ensures that only the holder of the SPO Cold Secret Key is able to register its PoolId and Mithril Signer Verification Key on a Mithril network
    • We will open a GitHub discussion regarding this subject and we will as well create clear documentation for this feature
  • Following our work from last week, we have continued working on the setup of the new Release Process, as in issue Implement Release process #500:

    • ⚠️ The reset of the preview and preprod networks that will occur in a near future will require a new Genesis Certificate for the current testing environment
    • We agreed that the SPO that we host on the testing and pre-release environments will be in a naive setup at first (only one Core Cardano node, a Relay Cardano node will be added later). We won't apply heavy security requirements on the keys (cold/air-gap) and we will keep things simple and maintainable with automation
    • Once the artifacts of a commit are deployed on the testing-preview and/or pre-release-preview environments, we will launch automated Smoke Tests (to be defined) that will validate the conformity of the development (by testing the available routes and their responses, and that snapshots/certificates are produced after a deployment)
    • A pre-release deployment will be tested for 24-48 hours, depending on whether it is a minor or patch update, before being qualified as releasable
    • Some selected SPOs will be running some Signer nodes on the pre-release-testing environment and will provide some feedback before release
    • In case of a critical bug fix, the qualification phase will be drastically shortened and the main indicator that will be used will be MTTR (Mean Time To Repair)
    • We still have to find solutions on how to manage the window release length vs the merge locks that it could create
    • We will have to refine our vision of how to manage failing deployments with dedicated process/checklists
    • We will try to release a new version every 2 weeks, even if it only embeds crates update and small fixes
    • We have decided to implement a lightweight Monitoring / Alerting / Status Page solution: uptime robot that will help us monitor closely failing deployments and provide status feedback
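
The validation mechanism of the Mithril Keys Certification described earlier in this entry can be sketched as a chain of checks. This Python sketch is a simplified illustration: the two signature verifications are passed in as stand-in callables (the KES and cold key cryptography is not modelled), the `OperationalCertificate` fields are a hypothetical simplification, and the PoolId hash is assumed to be Blake2b-224, as used for Cardano pool ids.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable

# Stand-in signature check: (verification_key, message, signature) -> bool.
Verifier = Callable[[bytes, bytes, bytes], bool]


@dataclass
class OperationalCertificate:
    kes_verification_key: bytes
    cold_verification_key: bytes
    signed_payload: bytes  # bytes covered by the cold key signature (layout not modelled)
    cold_signature: bytes


def pool_id_hex(cold_verification_key: bytes) -> str:
    """PoolId as the hex Blake2b-224 hash of the Cold Verification Key (assumed hash)."""
    return hashlib.blake2b(cold_verification_key, digest_size=28).hexdigest()


def certified_pool_id(
    opcert: OperationalCertificate,
    mithril_verification_key: bytes,
    kes_signature: bytes,
    verify_kes: Verifier,
    verify_cold: Verifier,
) -> str:
    """Run the checks in order and return the PoolId that may be registered."""
    # 1. The Mithril Signer Verification Key is signed by the KES Secret Key.
    if not verify_kes(opcert.kes_verification_key, mithril_verification_key, kes_signature):
        raise ValueError("invalid KES signature of the Mithril verification key")
    # 2. The Operational Certificate is signed by the Cold Secret Key.
    if not verify_cold(opcert.cold_verification_key, opcert.signed_payload, opcert.cold_signature):
        raise ValueError("invalid cold key signature of the Operational Certificate")
    # 3. The PoolId is derived from the Cold Verification Key.
    return pool_id_hex(opcert.cold_verification_key)
```

Because the PoolId is derived from the key that signs the certificate, a Signer cannot register a PoolId for which it does not hold the Cold Secret Key.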

September 2022

2022-09-30

Mithril session

2022-09-29

Mithril session

2022-09-28

Mithril session

  • We have talked about the issue Fix CLI args precedence in Client/Signer/Aggregator #511:

    • The problem is linked to the default value of the arguments passed by clap that is always used (even though an overriding value has been passed by an environment var or via a configuration file)
    • Some of the arguments used to setup the nodes are thus working only if we use the clap arguments which is not very convenient/coherent (as the vast majority of the others are set with environment vars)
    • The best solution is to not use default values for configuration that can be overridden (all except run_mode and verbosity_level)
    • We will formalize this rule in a dedicated ADR
  • We have also made a deep review of the PR New STM registration procedure #433 that is linked to issue Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455:

    • The development of the first phase is close to being ready and we hope to merge it soon
    • It will not include breaking changes as the Signer and Aggregator will be able to work on hybrid modes:
      • Declarative mode with a non certified PoolId (as already running)
      • Certified mode with a certified PoolId (activated only when a Signer is associated to an Operational Certificate and a KES Secret Key)
    • A second phase will involve the development of a dedicated Mithril Certifier that will help handle a KES Secret Key that is not stored on the same Cardano node as the Mithril Signer: the key stays on the Core node, while the Signer runs on top of the Relay Cardano node
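
The argument precedence problem of issue #511 can be illustrated with a minimal sketch. It is not the Mithril implementation: the `resolve` helper is hypothetical, and the precedence order (CLI argument, then environment variable, then configuration file) is an assumption made for the example. Each layer is modeled as an `Option`, so an absent CLI argument stays `None` instead of carrying a default that masks the other sources.

```rust
/// Resolve one setting from several optional layers (hypothetical helper).
/// Assumed precedence, for illustration only:
/// CLI argument, then environment variable, then configuration file.
fn resolve(cli: Option<&str>, env: Option<&str>, file: Option<&str>) -> Option<String> {
    cli.or(env).or(file).map(str::to_string)
}

fn main() {
    // No CLI default: the value from the environment is visible.
    assert_eq!(
        resolve(None, Some("preview"), Some("devnet")),
        Some("preview".to_string())
    );

    // A hard-coded CLI default always wins and masks the other layers,
    // which is the behavior described in the issue.
    let cli_with_default = Some("testnet");
    assert_eq!(
        resolve(cli_with_default, Some("preview"), None),
        Some("testnet".to_string())
    );
}
```

With this shape, a default is only applied after all layers have been consulted, which is one way to read the rule "no default values for configuration that can be overridden".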

2022-09-27

Mithril session

  • We had discussions about the issue Move stores to relational design with SQLite #476, for which we will probably proceed in multiple phases (Signer + Aggregator):

    • Use a relational data model that will be used to implement the current Store traits
    • Implement a data model upgrade a la sqitch
    • Refactor (if needed) the various Store traits used to access this data
    • Create ways of aggregating the relational data (with new routes to access them). We will need to dedicate a session for this
  • We talked about the next steps following the setup of our first SPO on preview:

    • Automate with scripts to deploy easily with terraform in the different environments
    • Handle pool metadata hosting on the documentation website
    • Implement the Core/Relay nodes topology
    • Work on automating the rotation of the keys
    • We will dedicate a session to these next steps
  • We also discussed the progress of the issue Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455, which is close to being ready:

    • Test adaptation to do (vs hybrid mode of the Signer/Aggregator certification for a smooth transition with SPOs)
    • Updating documentation to reflect the changes
    • Write a blog post to explain the Certification activation road map (with Mithril Certifier to come)

2022-09-26

Mithril session

2022-09-22

Mithril session

  • We have paired on the issue Fix database dead locks in Aggregator #517. The solution that we have implemented is the following:

    • Add a minimum version of SQLite: 3.35+ so that we can use DELETE...RETURNING statements that avoid explicit use of transactions
    • Update the CI so that it embeds this minimum version of SQLite
    • Add a retry mechanism to fetching data (simple but efficient with fixed sleep duration and max retry limit)
    • We have merged the PR fix SQLite deadlocks #521
    • We will keep watching if the database locks keep occurring on GCP and on the CI
  • We had talks about the evolution of the stores that will be required by the issue Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455:

    • We will probably prepare a manual update script (as only the Aggregator is concerned with this upgrade)
    • We definitely need to work with a relational data model soon to smoothly handle this type of upgrade (that could also occur on the Signer)
    • This will be addressed in issue Move stores to relational design with SQLite #476
  • We have also prepared a demo path for the first demo with the members of the Mithril Pioneer Program:

    1. Introduction
    2. Showcase of the Genesis Certificate on the devnet
    3. Presentation of the milestone of 10 SPOs signing on our preview network
    4. Presentation of the Dev Blog
    5. Showcase of the SQLite migration
    6. Presentation of the Store Retention feature
    7. Presentation of the upcoming Release Process
    8. Conclusion
    9. Q&A
  • Here is the showcase path for the Genesis Certificate on the devnet:

# Mithril Genesis Certificate
# On devnet, with evolving non-deterministic verification keys, with evolving stake distribution, with full Certificate chain, without keys certification

# Resources

## Github
google-chrome https://github.com/input-output-hk/mithril

## Website
google-chrome https://mithril.network/doc

## Explorer
google-chrome https://mithril.network/explorer/

---

# Setup demo

## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git

## Checkout correct commit
cd mithril/
git checkout b7069fd6281f21052f90b80d149f743471c63bbe
cd mithril-client && make build && cp mithril-client ../../ && cd ..
cd ..

---

# Demo: Run devnet

## Start explorer
cd mithril/mithril-explorer
make dev &
cd ../..
google-chrome http://localhost:3000/explorer

## Change directory
cd mithril/mithril-test-lab/mithril-devnet

## Query Cardano
watch -n 1 NODES=cardano ./devnet-query.sh

## Logs Mithril
watch -n 1 NODES=mithril LINES=100 ./devnet-log.sh

## Start devnet
./devnet-stop.sh && DELEGATE_PERIOD=100 EPOCH_LENGTH=60 ./devnet-run.sh

---
# Demo: Restore a snapshot from devnet

## Prepare vars
NETWORK=devnet
AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator
GENESIS_VERIFICATION_KEY=5b33322c3235332c3138362c3230312c3137372c31312c3131372c3133352c3138372c3136372c3138312c3138382c32322c35392c3230362c3130352c3233312c3135302c3231352c33302c37382c3231322c37362c31362c3235322c3138302c37322c3133342c3133372c3234372c3136312c36385d
LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest')
echo $LATEST_DIGEST

## List snapshots
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client list

## Show snapshot details
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client show $LATEST_DIGEST

## Download snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client download $LATEST_DIGEST

## Restore snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client restore $LATEST_DIGEST

2022-09-21

Mithril session

  • We have paired on the issue Fix database dead locks in Aggregator #517. After investigation, it appears that although we have implemented the store adapters behind RwLock, a database lock is still possible in some situations:
    • If a transaction is opened by an adapter, the whole database is locked. Thus an attempt to make a query will result in an Error 5: database is locked, until the transaction is committed or rolled back
    • A first issue is that when an error occurred during the transaction, the transaction was never closed, which resulted in a permanent lock of the database (until the service was restarted)
    • We are working on some improvements that will make the system more resilient and efficient (although it requires some modifications on the CI to make sure the version of SQLite is at least 3.35)
    • We will continue working on this issue tomorrow as it also creates some flakiness in the CI test lab runs
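
The retry mechanism retained for issue #517 (fixed sleep duration, maximum number of retries) can be sketched as below. This is an illustration under assumptions, not the code merged in the PR: the `retry` helper, its signature, and the simulated "database is locked" error are all hypothetical.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Run `op`, retrying up to `max_retries` additional times and sleeping a
/// fixed `delay` between attempts (hypothetical helper).
/// Returns the first success, or the last error once retries are exhausted.
fn retry<T, E>(
    max_retries: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => {
                attempt += 1;
                sleep(delay);
            }
        }
    }
}

fn main() {
    // Simulated query that fails twice with a lock error before succeeding.
    let mut calls = 0;
    let result: Result<&str, &str> = retry(3, Duration::from_millis(10), || {
        calls += 1;
        if calls < 3 {
            Err("database is locked")
        } else {
            Ok("record")
        }
    });
    assert_eq!(result, Ok("record"));
    assert_eq!(calls, 3);
}
```

A fixed delay keeps the behavior easy to reason about; the retry limit guarantees that a permanently locked database still surfaces as an error.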

Mithril Research/Engineering sync

  • We had discussions about:
    • Signer Registration (see discussion How should we link the Mithril identity with Cardano identity #508): it can be trusted because of the Genesis Certificate, so there is no specific problem with it
    • Stake Distribution (a new discussion will be set up to share this information with the community): understanding the portion of the stakes that is required to be secure, and how to possibly ramp up Mithril on the mainnet in multiple phases (with the involvement of IOG stakes at first until we reach the required portion of all Cardano stakes)
    • Batch verification of the Certificates multi-signatures which would be provided by a batch verification function in the core library. This would involve a slightly different way of validating the Certificate Chain to take advantage of this feature

2022-09-20

Mithril session

2022-09-19

Mithril session

Mithril

  • A preferred design (that should be better suited to the SPOs) is an Async Validator version:
    1. Signer creates key material to sign when crossing epoch threshold (the protocol initializer with its associated verification key)
    2. Validator calls signer when "ready" (on cron, or manually) and asks for key material to sign
    3. Validator uses hot KES keys to sign the key material and sends it to the signer
    4. Signer can then start registration process once it has signed material

Mithril (1)

  • In the Mithril Explorer we will display the security level (or probability of an adversarial party to create a fake certificate) on each snapshot (and provide the formula used to compute it when hovering the protocol parameters displayed)

2022-09-16

Mithril session

  • We had talks about the issue Add auto pruning in stores #504:

    • It appears that there was a bug in the MemoryAdapter where the get_last_n_records function retrieved the n last records sorted by date of update instead of date of creation. This bug was fixed.
    • However, there was a bug in the implementation and in its test. We have discussed how we could create some trait related tests that could help us spot such a problem easily (and also help verify that a new implementation of the traits is "correct")
    • We have also talked about how to handle the configuration of the retention length on the stores: if none is specified (as this is currently the case) full retention is applied, if a retention length is specified then this length is used to prune the stores
  • We had some discussions about the discussion Use CIP-22 as a way to identify SPOs when registering keys #507:

    • The underlying idea is the same as the one under implementation in the PR New STM registration procedure #433:
      • Asking the owner of the pool to sign a message with its secret key in order to prove it owns this secret key
      • In CIP-22:
        • The message signed has no meaning and is randomly generated by the verifier of the ownership
        • The secret key used is the VRF Secret Key which is a hot non rotated key (but for which there is no Rust library available for signing/verifying)
      • In our proposal:
        • The message signed is the actual Mithril Signer Verification Key valid for 1 epoch
        • The secret key used is the KES Secret Key which is a hot rotated key (for which a Rust library is available, done by IOG at https://github.com/input-output-hk/kes)
    • The architecture of a Cardano SPO on the mainnet implies that:
      • A Core Server hosts a Core (or Block Producing) Cardano node, which is a Full node that has access to SPO hot secret keys and is isolated from the rest of the world (except that it is allowed to communicate with one or multiple associated Relay nodes)
      • A Relay Server hosts a Relay Cardano node, which is a Full node which is accessible from other external Cardano node peers, but does not have access to the SPO secret keys
    • A naive setup for running a Certified Mithril Signer (devnet or preview) requires that the Mithril Signer node has access to:
      • An Aggregator that is external to the SPO infrastructure via a REST API (to send individual signatures)
      • The database of a local Cardano Full node via file system (to compute snapshot digests and stake distribution)
      • The SPO hot secret keys (and operational certificates) via file system (to compute the signature that certifies the SPO is genuine)
    • A more elaborate setup (preprod or mainnet) would probably require that we split the Mithril Signer in 2 parts:
      • A first part running on the Core server only responsible for signing the Mithril Signer Verification Keys (when requested by the other part)
      • A second part running on the Relay server and responsible for the rest of the Mithril protocol (registering with Aggregator, sending individual signatures, ...)
    • Here is a sketch of the naive setup: Mithril
    • And a sketch of the real setup: Mithril (1)
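
The store behavior discussed for issue #504 (last records ordered by creation date, and an optional retention length) can be sketched as follows. The `Record` type, its integer timestamps, and the `prune` function are hypothetical; only the name `get_last_n_records` comes from the entry above, and its body here is an illustration rather than the MemoryAdapter code.

```rust
/// A stored record; the timestamps are illustrative counters.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    key: u64,
    created_at: u64,
    updated_at: u64,
}

/// Return the `n` most recently *created* records, newest first.
fn get_last_n_records(records: &[Record], n: usize) -> Vec<Record> {
    let mut sorted = records.to_vec();
    sorted.sort_by(|a, b| b.created_at.cmp(&a.created_at));
    sorted.truncate(n);
    sorted
}

/// Prune the store: `None` means full retention, `Some(n)` keeps the `n`
/// most recently created records.
fn prune(records: &mut Vec<Record>, retention: Option<usize>) {
    if let Some(limit) = retention {
        records.sort_by(|a, b| b.created_at.cmp(&a.created_at));
        records.truncate(limit);
    }
}

fn main() {
    let mut store = vec![
        Record { key: 1, created_at: 10, updated_at: 99 }, // oldest, updated last
        Record { key: 2, created_at: 20, updated_at: 20 },
        Record { key: 3, created_at: 30, updated_at: 30 },
    ];

    // Ordering by creation date returns records 3 and 2.
    let last_two: Vec<u64> = get_last_n_records(&store, 2).iter().map(|r| r.key).collect();
    assert_eq!(last_two, vec![3, 2]);

    // Ordering by update date would wrongly rank record 1 first.
    let most_recently_updated = store.iter().max_by_key(|r| r.updated_at).map(|r| r.key);
    assert_eq!(most_recently_updated, Some(1));

    prune(&mut store, None); // no retention length: nothing is removed
    assert_eq!(store.len(), 3);
    prune(&mut store, Some(2)); // retention of 2: the oldest record is dropped
    assert_eq!(store.len(), 2);
}
```

A test of this shape, written once against the store trait, is the kind of "trait related test" that would catch an implementation sorting on the wrong date.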

2022-09-15

Mithril session

  • As expected, 2 epochs after applying the fix on the Stake Distribution computation of issue Stake distribution discrepancy #497, the Signers have been able to reliably produce individual signatures that are successfully registered on the Aggregator 💪

  • We have followed up on the merge of the issue Deploy SQLite store adapter #475. We have made some fixes on the migrators. We have helped the SPOs who had a hard time migrating some of their stores and everything looks good now 🎉

  • We have talked about a nice to have feature of pruning automatically the stores of the Signer/Aggregator nodes. This will be implemented shortly in this issue Add auto pruning in stores #504

  • Also we have paired on the issue Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455. We are working on a plan to deploy smoothly the feature to the SPOs before activating it on the Aggregator, so that a transition window will be opened for SPOs to deploy the change on their Signer nodes. We will keep on pairing on this complex topic during this iteration

2022-09-14

Mithril session

  • Following the merge of the issue Stake distribution discrepancy #497, the stakes stores on GCP (Aggregator and Signers) are OK. We keep an eye on the list of signers in the Certificates from epoch 37 that should embed new Signers and the error rate on the individual signatures registration that should drop

  • We have paired and merged the issue Deploy SQLite store adapter #475 that activates the new SQLite data store:

    • The Aggregator and the Signers nodes running on GCP have been successfully migrated to use the new store adapter
    • We encountered a few difficulties when migrating the Aggregator stores. It appears that being able to qualify the migration on a testing environment would have been very helpful
    • We are expecting the SPOs to migrate their stores (as explained in this dev blog post)
  • We have continued working on the Release Process setup:

    • A dedicated issue has been created Implement Release process #500 and some tasks have been added to it
    • Here is the updated definition of the process:
      • We will use a common version (semver) for all the crates of the repository and for the GitHub release
      • All the nodes should be able to display the current version they are running
      • In case of a version mismatch, the Aggregator should return an error so that the Signer/Client nodes are updated regularly
      • We will work with GitHub environments to support deployments of versions on multiple environments
      • A new version 0.1.2 will have the following life cycle:
        • A commit abc123 merged on main branch is deployed on testing environment named testing-preview
        • A commit def456 tagged with 0.1.2-prerelease1 is deployed on preprod environment named pre-release-preview
        • A GitHub release 0.1.2 is created and linked with the 0.1.2-rc1 tag and marked as pre-release
        • A tag 0.1.2-prerelease1 is qualified and selected for release or rejected (and replaced by a 0.1.2-prerelease2 tag if necessary on a ghj789)
        • If the tag 0.1.2-prerelease1 is selected, a new tag named 0.1.2 is created on the same commit def456
        • The GitHub release is linked to the 0.1.2 tag and marked as release
        • The commit def456 with tag 0.1.2 is deployed to the prod environment named release-preprod
    • Some questions remain:
      • When to update the Cargo.toml crates version vs creation of the draft release on GitHub?
      • How to handle merge lock during qualification of a release candidate (with only main branch) (Use of feature flag?)
      • How to handle Protocol Versions smoothly (backward compatibility of messages w/ Avro or equivalent solution?)
      • How to simplify the update process for the SPOs (with debian package for example)?
      • How to handle real SPOs on the testing-preview and pre-release-preprod environments (vs key rotations, secret keys management, ...)?
    • The deployment schema is now: Image
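
The version mismatch rule of the release process can be sketched as a small check. This is an assumption-laden illustration: the `status_for` function is hypothetical, the comparison is a strict string equality, and the error is modeled as an HTTP 400 Bad Request, which is the option mentioned in the 2022-09-01 entry.

```rust
/// HTTP status a hypothetical Aggregator route could answer with, given its
/// own version and the version announced by the calling Signer/Client.
fn status_for(aggregator_version: &str, caller_version: &str) -> u16 {
    if caller_version == aggregator_version {
        200 // OK: versions match
    } else {
        400 // Bad Request: the caller must be updated
    }
}

fn main() {
    assert_eq!(status_for("0.1.2", "0.1.2"), 200);
    assert_eq!(status_for("0.1.2", "0.1.1"), 400);
    // A pre-release tag is a different version string, so it is rejected too.
    assert_eq!(status_for("0.1.2", "0.1.2-prerelease1"), 400);
}
```

A strict equality is the simplest reading of "version mismatch"; a real check could instead compare semver components to tolerate patch differences.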

2022-09-13

Mithril session

  • We have reviewed and merged the issue Stake distribution discrepancy #497:

    • The Stake Distribution should get back to normal 2 epochs after rebuilding the Signer
    • We will keep monitoring the GCP hosted Aggregator to check that the deployment goes well and does not prevent the Snapshot production.
    • The SPOs should rebuild their Signer node (as explained in this dev blog post)
  • We have paired on the issue Deploy SQLite store adapter #475 and finalized the steps to follow in order to smoothly migrate the Signer/Aggregator nodes stores. The Use Sqlite datastore in Aggregator & Signer #477 should be merged tomorrow

  • We have also worked on defining the Release Process for the Mithril Network:

    • We will use a common version (semver) for all the crates of the repository and for the GitHub release
    • All the nodes should be able to display the current version they are running
    • In case of a version mismatch, the Aggregator should return an error so that the Signer/Client nodes are updated regularly
    • We will work with GitHub environments to support deployments of versions on multiple environments
    • A new version 0.1.2 will have the following life cycle:
      • A commit abc123 merged on main branch is deployed on testing environment named testing-preview
      • A commit def456 tagged with 0.1.2-prerelease1 is deployed on preprod environment named pre-release-preview
      • A GitHub release 0.1.2 is created and linked with the 0.1.2-rc1 tag and marked as pre-release
      • A tag 0.1.2-prerelease1 is qualified and selected for release or rejected (and replaced by a 0.1.2-prerelease2 tag if necessary on a ghj789)
      • If the tag 0.1.2-prerelease1 is selected, a new tag named 0.1.2 is created on the same commit def456
      • The GitHub release is linked to the 0.1.2 tag and marked as release
      • The commit def456 with tag 0.1.2 is deployed to the prod environment named release-preprod
      • Diagram of the release process is below: Mithril

2022-09-12

Mithril session

  • We have talked about the nearly ready to merge issue Deploy SQLite store adapter #475:

    • How long do we keep the migration binaries available before decommissioning them? (From 2 to 4 weeks)
    • How to communicate with the SPOs about that breaking change and provide them with simple yet efficient documentation (This will be implemented inside a dedicated dev blog post)
  • We have reviewed and merged the Record 'contributing' Signers only in Certificate #495

  • We had discussions about the issue Stake distribution discrepancy #497 that makes the Stake Distribution computation non deterministic and is the source of "A provided signature is invalid" error messages when a Signer submits individual signatures. In order to swiftly fix the problem, we have defined a plan:

    • Solution 1: Add a feature that makes the Stake Store retrieve always the same Stake Values until a better solution is found (worst case scenario; this will not be necessary, as we moved to Solution 2 directly)
    • Solution 2: Compute the Stake Distribution differently by gathering the Stakes from the previous epoch pool by pool (best solution for testnet; solution that is under development in the PR Fix Stake Distribution retrieval #499)
    • Solution 3: Modify the cardano-cli so that it computes the stake distribution at the previous epoch (better solution for long term and mainnet; we will explore it in the future)
    • Solution 4: Package a custom developed CLI in Haskell that will query the ledger state and retrieve the Stake Distribution of the correct epoch (good solution, but the drawback is that we need to package/deliver several binaries at once)
  • Other solutions have been debated such as calling Haskell functions from Rust or using a third party chain indexer

  • We have postponed the talks about the release process and we will resume them tomorrow during a dedicated session.

  • We have agreed that a relevant test case of Daedalus/Mithril would be to bootstrap a mainnet archive with/without a Mithril snapshot. This will require that we run a mainnet "test" environment. This will be part of our release/environments concerns/discussions

  • Also, as we have been using the Cardano infrastructure (node/cli) quite a lot during our developments, we will organize a retrospective to give some feedback about it

2022-09-08

Mithril session

  • The Genesis Certificate deployment worked as expected and new Snapshots are now available on the Mithril Explorer 🎉

  • We have reviewed and paired on the PR Use Sqlite datastore in Aggregator & Signer #477 of the issue Deploy SQLite store adapter #475 with a main focus on the migration tool that is being built in order to migrate existing JSON stores to SQLite. We are at the stage of making the tool as easy to use as possible for the SPOs that will use it. Also we will create a How to migrate stores guide and a post on the dev blog that explains why and how to use this tool. We should be able to merge next week

  • We have also reviewed, paired on and merged many fix and improvement PRs:

    • Align signer & aggregator state machines logs #486
    • Update Genesis GCP infra #487
    • Fix auto create 'stores' directory in Signer #488
    • Add Pull Request Template #489
    • Upgrade dependencies & enhance makefiles #490

2022-09-07

Mithril session

  • The PR Implement Real Genesis Certificate #438 has been merged and deployed successfully on the GCP Aggregator. However we had a hard time running the genesis bootstrap command. A fix is available in this PR Update Genesis GCP infra #487. The first Genesis Certificate has been generated and saved successfully at epoch 29 of the preview network and we should see new Certificates produced as soon as the transition to epoch 30 has taken effect 🎉

  • We have also worked on the preparation of the migration from JSON to SQLite stores (which must take place on the Aggregator as well as the Signers), and have identified a few options:

    • Add a specific command line in Aggregator/Signer to handle the migration
    • Handle the migration with dedicated scripts, which would be cumbersome and does not look like the best option
    • Add a new binary build in the cargo projects of Aggregator/Signer (that looks like the best option to take advantage of the CI and drop the code within a short time frame after release)
  • Once we have migrated our stores to SQLite, we will move on to the relational implementation of the stores. We will have to work on an upgrade mechanism that will automatically upgrade the database schema when required

  • We had discussions about the Signers displayed in the Certificates of the Explorer:

    • We could display the stakes as an ADA value or as a percentage of the total stakes enrolled in the Mithril network
    • We could also display which Signers have their individual signatures included in the certificate
  • We have added a new Dev Blog on the documentation website. This will help handle communications with the SPOs regarding breaking changes, deprecated features, new versions release, ...

  • We need to work on the release process in order to manage correctly the evolution of the network with SPO users. We have talked about the options and questions we have, and will address them in a dedicated session:

    • Rhythm of releases
    • Versioning of the crates vs the Github tags
    • Validation of the release candidates
    • Trunk based or Gitflow?
    • Packaging of the releases
    • Automatic updates?

Mithril Research/Engineering sync

  • We have talked about the possible implementations of the optimization described in issue Extend API to accept signature generation without Merkle path #161 and in PR Switch blst #159

  • We also talked about the way we could create more compact certificates by avoiding duplication of the common parts of the Merkle paths stored

  • We have discussed the way we could provide a Security Level of the Chain on the Mithril Explorer, which relates to issue Include probability of success for different parameters #48. Researchers will provide a formula based on k, m, and phi_f protocol parameters that can be used to compute a probability that an adversarial party produces a valid multi-signature

  • We discussed the evolution of the protocol parameters and Researchers will come back with a proposed set of parameters that fits the number of Signers involved in the network

  • Finally, we talked about the RFP regarding the understanding of the impact of the percentage of the stakes involved in the network vs the security level, as it appears that the paper assumption of 100% stakes involved is not realistic. Also some very different scenarios can occur when we think about only a share of the stakes involved: if 10% of stakes are involved in Mithril network and 10% of the stakes of the Cardano network are considered adversarial, do we consider that 100% (all the adversaries of Cardano) or 10% (the share of adversaries of Cardano) of the Mithril stakes are adversarial?
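
The two readings of the last question can be made concrete with a small worked computation. This is only arithmetic on the example figures above (10% of the Cardano stake enrolled in Mithril, 10% of the Cardano stake adversarial); the `adversarial_share` function is a hypothetical helper and says nothing about the actual security analysis.

```rust
/// Share of the Mithril-enrolled stake that is adversarial.
/// Both arguments are fractions of the total Cardano stake.
fn adversarial_share(mithril_stake: f64, adversarial_in_mithril: f64) -> f64 {
    adversarial_in_mithril / mithril_stake
}

fn main() {
    let mithril_stake: f64 = 0.10; // 10% of the Cardano stake is enrolled in Mithril
    let cardano_adversarial: f64 = 0.10; // 10% of the Cardano stake is adversarial

    // Worst case: all the adversarial stake enrolls in Mithril.
    let worst = adversarial_share(mithril_stake, cardano_adversarial.min(mithril_stake));
    // Proportional case: adversaries enroll at the same rate as everyone else.
    let proportional = adversarial_share(mithril_stake, cardano_adversarial * mithril_stake);

    assert!((worst - 1.0).abs() < 1e-9); // 100% of the Mithril stake
    assert!((proportional - 0.10).abs() < 1e-9); // 10% of the Mithril stake
}
```

The gap between the two results (100% vs 10%) is why the share of stake involved in the network matters for the security level.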

2022-09-06

Mithril session

  • We have been reviewing and finalizing the PR Implement Real Genesis Certificate #438. It is ready and will be merged tomorrow. Here are the operational implications:

    • Reset the Certificate Chain of the GCP hosted Aggregator
    • Bootstrap the Genesis Certificate on the GCP Aggregator
    • Requires that the SPOs recompile their Signer node (to handle faster registration), but previous version is compatible and will continue working
  • Regarding the flakiness of the CI:

    • We attribute it to the way the Stake Distribution is computed by the cardano-cli
    • The expected error rate on the CI is ~4%. If this rate gets too high, we will have to deactivate the stake delegation feature of the test lab until we find a better solution
  • We have also worked on the migration of the stores of the Aggregator/Signer to SQLite as in Deploy SQLite store adapter #475. We still have a few issues to fix and we will also work on an automatic upgrade mechanism (especially on the Signer side) before merging

2022-09-05

Mithril session

  • We have merged the issue Deploy mithril demo infra on 'preview' network #457 (as well as the PR Update Blake dependency #474). The Aggregator hosted on GCP is now running on the preview network and producing snapshots 💪

  • We have debriefed about the previous session and the Certification of the Mithril Signer Verification Keys and we all agreed on the next steps discussed previously

  • We have spent some time to dig in the Haskell code that makes the calculation of the stake distribution and we have found out that the cardano-cli provides the full precision on the stake distribution when the --out-file option is activated. An issue has been created to adapt the current implementation of the Chain Observer and take advantage of this option Enhance Stake Distribution retrieval #480

  • ⚠️ We have also tried to understand the source of flakiness on the CI and we have noticed that the computation of the stake distribution may be responsible:

    • We have noticed that even though we plugged all the Mithril nodes of the test lab on the same Cardano node of the devnet, the nodes retrieved different stake distributions during the same epoch
    • We have run another experiment with stake delegation and we have clearly found that we could actually have different results during the same epoch
    • This is a problem as we are expecting:
      • The Stake Distribution to be computed for the previous epoch (and not the current epoch)
      • The Stake Distribution to be deterministically computed on all the nodes
    • We will probably have to work on different implementations of the Chain Observer:
      • Propose an evolution of the cardano-cli that allows targeting a specific epoch when computing the Stake Distribution
      • Investigate other technologies that allow observing the evolution of the chain

2022-09-02

Mithril session

  • We have talked about the incoming PRs that include breaking changes:

    • Move GCP Aggregator to 'preview' network #470
    • Update Blake dependency #474
    • Use Sqlite datastore in Aggregator & Signer #477
    • Implement Real Genesis Certificate #438
    • We will, at least, merge #470 and #474 at the same time: (Scheduled for Next Monday)
      • Requires that the SPOs recompile their Signer node, update the configuration (NETWORK=preview and NETWORK_MAGIC=2)
      • Involves a full reset of the Aggregator on GCP, and a manual intervention to produce new certificates
    • If possible, we will also merge #477:
      • Requires that the SPOs recompile their Signer node
      • Involves a full reset of the Aggregator on GCP, and a manual intervention to produce new certificates
    • When ready, we merge #438:
      • Transparent for SPOs
      • Requires a reset of the Snapshots and Certificate Chain (which will be bootstrapped with a Genesis Certificate) on the Aggregator
  • We have paired on the last bug that creates flakiness in the CI in the Bootstrap Certificate Chain w/ Genesis Certificate #364. It appears that a discrepancy occurs from time to time (~5%) on the computation of the Next Aggregate Verification Key between the Signers and the Aggregator. We are still investigating the issue and we should fix it shortly

  • We have also paired on the issue Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455 in order to elaborate the best way to implement this feature. We have agreed on:

    • Implementing this feature in mithril-common in order to keep mithril-core chain agnostic
    • In order to guarantee that no Mithril node can interact with the core library without being authenticated (now and in the future):
      • The mithril-core library should be directly imported only by the mithril-common crate (we should probably enforce this rule in the CI)
      • A Cardano specific ProtocolKeyRegistration will be implemented as a wrapper around the mithril_core::KeyReg and added as a sub module of crypto_helper module
      • A Cardano specific ProtocolInitializer will be implemented as a wrapper around the mithril_core::StmInitializer and added as a sub module of crypto_helper module
      • We will extend the entities::Signer type so that it includes the Cardano specific material required for Signer certification (Operational Certificate of the SPO, Signer Verification Key Signature signed by the KES Secret Key of the SPO). This will allow the Signer Verification Key Certifier to certify that the Signer node is the genuine holder of a poolId on the Cardano network and of a Mithril Signer Verification Key
      • Another required information is the KES Period that can be retrieved from the cardano-cli and that will be retrieved through the current Chain Observer (using the field qKesCurrentKesPeriod of the command cardano-cli query kes-period-info)
      • We will add a new type dedicated to serializing/deserializing Cardano crypto material (it will also handle the cborHex conversion). This type will be able to parse a crypto file generated by the Cardano CLI and convert it to bytes, and to export a JSON format with keys encoded in cborHex. This type will also be used for the Genesis Certificate Verification Key.

2022-09-01

Mithril session

  • We had discussions about the flakiness of the CI that we are trying to fix in the Bootstrap Certificate Chain w/ Genesis Certificate #364. We have paired and prepared some fixes in the Implement Real Genesis Certificate #438. Also a fix on the mithril-core has been merged in order to Avoid panics in 'StmInitializer' #472

  • We also had some talks about the migration of the Aggregator hosted on GCP to the preview network:

    • At first, we will decommission the testnet snapshotting
    • Then, it will be replaced by the preview network (target ETA is EOW)
    • Later, we will work on supporting multiple networks
  • In order to work efficiently with SPOs, we will need to work with regular releases:

    • We intend to create new releases every 1/2 weeks
    • We will name our deployment environments the same way as the Cardano networks (devnet, preview, preprod, mainnet)
    • When a commit is pushed on a working branch, the devnet is launched in the run-test-lab job of the CI
    • When a commit is merged on the main branch, a terraform deployment will be triggered on the preview from the CI
    • When a tag is created (maybe following a specific format), a terraform deployment will be triggered on the preprod from the CI
    • The Signer, Client and Aggregator nodes will be released synchronously with the same tag version
    • We will probably implement a feature where if a Signer or a Client requests the Aggregator with a different version, a 400 bad request will be returned
  • We also had discussions about the issue Simplify the Multi Signer in Aggregator #398 and we have tried to elaborate a road map to implement it:

    • The strategy is to make the multi signer pure and let the state machine handle the state
    • We will define a clear interface for interacting with the state
    • Later, we will also try to enhance the state machine of the Aggregator, then of the Signer
    • We will use an event driven state machine that gets updated given a list of (State, Event) -> ApplyTransition -> NewState by dequeuing queued events. We still need to find a way to handle the synchronous responses of the http server routes
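
The event driven state machine described above can be sketched as a pure transition function plus a loop that drains a queue of events. This is only a minimal sketch: the state and event names are hypothetical placeholders, and the open question of synchronous http responses is not addressed here.

```rust
use std::collections::VecDeque;

// Hypothetical states, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Idle,
    Ready,
    Signing,
}

// Hypothetical events, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    NewEpoch,
    NewBeacon,
    MultiSignatureCreated,
}

/// Pure transition: (State, Event) -> NewState.
/// Pairs that are not listed leave the state unchanged.
fn apply_transition(state: State, event: Event) -> State {
    match (state, event) {
        (State::Idle, Event::NewEpoch) => State::Ready,
        (State::Ready, Event::NewBeacon) => State::Signing,
        (State::Signing, Event::MultiSignatureCreated) => State::Idle,
        (unchanged, _) => unchanged,
    }
}

/// Dequeues events in arrival order and folds them into the state.
fn drain_events(initial: State, queue: &mut VecDeque<Event>) -> State {
    let mut state = initial;
    while let Some(event) = queue.pop_front() {
        state = apply_transition(state, event);
    }
    state
}
```

Keeping `apply_transition` free of side effects is what makes the multi signer "pure" in the sense discussed: the only mutable piece is the queue and the current state held by the runtime.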

August 2022

2022-08-31

Mithril session

  • We have reviewed the new issues that have been created:

    • permission denied issue in dev-net #459: we are having a hard time reproducing the issue. Therefore, we have asked the user to provide more details about their setup. However, we have merged a PR that could fix the permission issue: Fix attempt 'Permission Denied' in devnet #467. We are waiting for feedback from the user to see if this patch fixes the problem
    • Provide machine-readable output for mithril-client #464: We will start working on it shortly
  • We have received and reviewed a first PR from the community DATA_STORE_DIRECTORY #465 that adds a missing configuration update on the Signer setup for a SPO

  • We also had discussions about the PR in progress:

    • Greg/444/sql store #460 has been merged as a first milestone of the PoC we are conducting on switching the stores to SQLite 💪. We will work on enhancing the iterator management (to avoid loading the full store in memory) and also on moving the actual stores in the Aggregator and Signer nodes in the near future
    • Implement Real Genesis Certificate #438: we need to fix the panic that occurs sometimes on the Signers and we should be able to merge the PR then. Once the PR is merged, we will be able to bootstrap a brand new preprod GCP Aggregator as in issue Deploy mithril demo infra on 'preprod' network #457

2022-08-30

Mithril session

  • We have paired on the issue Bootstrap Certificate Chain w/ Genesis Certificate #364. All the features have been implemented in the PR Implement Real Genesis Certificate #438. However, we have some flakiness issues that we need to fix prior to merging (that must have been in the previous code and that create some panics in the Signer)

  • We have reviewed and discussed the PoC for implementing a SQLite store adapter. A first version is close to being ready with an iterator that loads all the records in memory. Once this version is stabilized, we will work on optimizing the iterator

2022-08-29

Mithril session

  • We have paired on the issue Bootstrap Certificate Chain w/ Genesis Certificate #364. We are close to being ready to merge the PR Implement Real Genesis Certificate #438

  • We also had discussions about:

    • Issue use SQL store #444 for which the implementation of an iterator on top of SQLite is pretty complicated. We will try a simpler implementation of the iterator at first
    • Issue Simplify the Multi Signer in Aggregator #398 for which we will dedicate a session this week

2022-08-26

Mithril session

  • We have sliced and created the tickets for the new iteration

  • We have cleaned up the stale branches of the repository

  • We have merged the PR Flaky tests #374 🥳 We now use blst as the crypto backend (with portable feature activated in the CI). We have also reset the stores of the GCP Aggregator (as the previous keys were not compatible with blst)

  • As we will start working on the Mithril Keys Certification we had some discussions about this feature (and about cbor encodings for the keys)

  • Also, we have paired on the PR Implement Real Genesis Certificate #438, that we will merge shortly

2022-08-25

Mithril session

  • We have open sourced the repository!!! 🎉

  • We have reviewed the final version of the PR Flaky tests #374 and we have paired on optimizing the portable feature implementation

  • We also had discussions about the difficulty we face when trying to implement the SQLite store adapter. We will try a different approach by working with the underlying crate used by the crate we are trying to implement

  • We have prepared a path for the demo with the goal of Open Sourcing the GitHub repository 🥇:

    • Making the GitHub repository public live 🚀
    • Showcasing the final version of the documentation website (that we have already made public)
    • Showcasing the restoration of a testnet Cardano Node from a Mithril Snapshot hosted on GCP (and also showcasing the Mithril Explorer)
# Mithril End to End
# On devnet, with evolving non-deterministic verification keys, with evolving stake distribution, with real Certificate chain (without genesis)

# Resources

## Github
google-chrome https://github.com/input-output-hk/mithril

## Website
google-chrome https://mithril.network/doc

## Explorer
google-chrome https://mithril.network/explorer/

---

# Setup demo

## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git

## Checkout correct commit
cd mithril/
git checkout 2c286878d070b842cd40f63ae580456cc50c00f7
cd mithril-client && make build && cp mithril-client ../../ && cd ..
cd ..

---
# Demo: Restore a snapshot from testnet

## Prepare vars
NETWORK=testnet
AGGREGATOR_ENDPOINT=https://aggregator.api.mithril.network/aggregator
LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest') && echo $LATEST_DIGEST

## List snapshots
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client list

## Show snapshot details
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client show $LATEST_DIGEST

## Download snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client download $LATEST_DIGEST

## Restore snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client restore $LATEST_DIGEST

2022-08-24

Mithril session

  • We have reviewed the PR Flaky tests #374 that corrects the CI flakiness of mithril-core 🥳. There is still a question regarding the implementation of the portable feature of blst that we need to investigate, as we are using the artifacts built by the CI to create Docker images (and in the future released binaries). Also, when merging this PR we will have to reset/recreate the stores on the GCP Aggregator (as the keys currently generated with zcash are not compatible with the blst keys). We should merge at the end of the iteration. After some discussions, we have decided to use a portable feature in the mithril-core library and not to re-expose mithril-core from mithril-common. This feature will be used in the CI (tests and artifacts released) at first. We still need to understand what is different between portable and non-portable blst (apparently related to ISA extensions that may cause the SIGILL), and we will also work on adapting the CI and the production of artifacts (Docker, executable) with the idea that we must test the artifacts that we release.

  • We have reviewed the latest commits of the PR Implement Real Genesis Certificate #438. We will continue to work on it and expect to merge it shortly

  • Also, we have paired on the use SQL store #444

2022-08-23

Mithril session

  • We have reviewed and merged the Repository is missing a CONTRIBUTING document #446. We also had discussions about the final steps before open sourcing, such as setting up branch protection rules that apply before merging a PR

  • We have paired on:

    • The issue use SQL store #444
    • The issue Bootstrap Certificate Chain w/ Genesis Certificate #364 that requires an update of the Aggregator runtime state machine
    • The issue Less signatures when phi_f is increased #448 which was apparently due to the usage of a wrong value of the Protocol Parameters
  • We have activated the Require approvals feature on the repository before merging new PRs (this will be needed when open sourcing the repository)

2022-08-22

Mithril session

  • We have paired on numerous bug fixes and enhancements related to the flakiness of the CI:

    • Remove quorum check in aggregator multi-signer #441
    • Make fake Beacon static #442
    • Ask for a Protocol parameters for test_setup::setup_signers #445
    • Fix Multi Signature Determinism #447
  • We have reviewed and merged:

    • The PR Aggregator check existing certificate #435 which closes the issue Aggregator is stuck in "Signing" state when epoch changes #431 🥳
    • The PR Move Certificate Verifier to Common #436. It prepares the work to be done in the issue Bootstrap Certificate Chain w/ Genesis Certificate #364 for which we have been talking about the steps that need to be completed
    • The PR add code doc & factor service initialization #440 that relates to issue Prepare open-sourcing of repository #92
  • We had discussions about the need to handle data structure update and to have debug tools. A way to work on these two issues is to use SQLite and implement a store adapter on top of it. We will run a small PoC on this implementation

2022-08-19

Mithril session

  • We have merged the Add signer integration test #430 🥳

  • We have also reviewed the first PR of the issue Aggregator is stuck in "Signing" state when epoch changes #431 that will be merged shortly. We will pair on the second part of the issue which requires some modifications of the Snapshots store

  • We also had discussions about the Mithril Keys Certification:

    • We have reviewed the PR New STM registration procedure #433
    • We still need to find out how to retrieve all the information needed (KES Key period with Cardano Cli and Cold Verification Key maybe from the Core Cardano Node)
    • We were wondering if the KES Keys are renewed by overwriting the files. If this is the case, it means that we would need to reconfigure the Signer node after renewal of the keys
    • The Signer does 2 new things during key registration:
      • Sign the Mithril Verification Key with the KES Secret Key to produce a KES Signature
      • Send the Operational Certificate, the Cold Verification Key, the KES Period and the KES Signature to the Aggregator during the registration process
    • The Aggregator will verify the authenticity of the Pool Id and the associated Mithril Verification Key during the registration of the Signer. It will allow the Aggregator to match the Pool Id with the Stake Share retrieved from the Cardano Node. We still need to check if the Operational Certificate, the Cold Verification Key, the KES Period and the KES Signature need to be stored on the Aggregator
    • For now, the Core library will keep computing the Merkle trees the same way and use only the Stakes from the registered Signers (and not from the whole Cardano Network)
    • Before we merge this PR, we will need to have a running SPO node on GCP (that needs to be configured) so that we don't miss epochs in the Certificate Chain
  • We also had talks about the Genesis Keys:

    • We will probably store the Genesis Keys with the same codec as the other keys used in Mithril (by using serde (de)serialization and base64 encoding) in the first place
    • However, the Genesis Keys used by the Cardano Node seem to be using a cbor format. We will try to handle this encoding instead
    • Another question that was raised is where can we find the mainnet Genesis Verification Key?

2022-08-18

Mithril session

  • We have reviewed and will merge shortly the latest modifications of the issue Add signer integration test #430

  • We have paired on understanding and fixing a bug on the Aggregator Aggregator is stuck in "Signing" state when epoch changes #431. Some PRs that fix the problem are in progress and will be merged shortly

  • Following the occurrence of this bug, we have thought that it would be a good idea to implement a Max Error feature for a runtime cycle: if the runtime is in error M times in a row for the same state, the Aggregator runtime would panic. This would also help us spot early problems in state transitions

  • We also had discussions about the Mithril Keys Certification:

    • In order to verify the SPO that is running a Mithril Signer, we will sign the Mithril Verification Key with the Cardano Hot Secret Key aka KES.skey and we will verify it with the Cardano Hot Verification Key aka KES.vkey that is stored inside the Operational Certificate of the Cardano Node of the SPO
    • Every 6 epochs, the KES Keys are rotated and a new Operational Certificate will be issued. This means that we need to retrieve the current Operational Certificate at each epoch (before the Signer registers its keys with the Aggregator)
    • We will try to stay on the Cardano Relay Node and avoid if possible to work with the Block Producing Node. It means that the PoolId which is the hash of the Cold Verification Key should be declared by the SPO (and also verify that it matches with the one included in the Operational Certificate)
    • The Mithril Verification Key Signature must be verified on the Signer at startup and also on the Aggregator during registration
    • We will include the KES.skey signing of the Operational Certificate in the core library
    • We will maybe use the Cardano Cli to verify the signatures as it will require less work at first. This code should be incorporated into the core library when we go to mainnet
    • We also need to find a way to retrieve the Operational Certificate from the Cardano Cli
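
The Max Error idea raised earlier in this entry (stop the runtime after M consecutive errors in the same state) could look roughly like the sketch below. The type and method names are hypothetical, and the tracker only reports that the threshold has been reached; whether the caller then panics, as discussed, is left to the runtime.

```rust
// Hypothetical runtime states, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RuntimeState {
    Idle,
    Ready,
    Signing,
}

/// Counts consecutive errors that happen while the runtime stays in one state.
#[derive(Debug)]
struct ErrorTracker {
    max_errors: u32,
    failing_state: Option<RuntimeState>,
    consecutive_errors: u32,
}

impl ErrorTracker {
    fn new(max_errors: u32) -> Self {
        Self {
            max_errors,
            failing_state: None,
            consecutive_errors: 0,
        }
    }

    /// A successful cycle clears the error streak.
    fn record_success(&mut self) {
        self.failing_state = None;
        self.consecutive_errors = 0;
    }

    /// Records an error for `state` and returns true once the streak
    /// for that same state has reached `max_errors`.
    fn record_error(&mut self, state: RuntimeState) -> bool {
        if self.failing_state == Some(state) {
            self.consecutive_errors += 1;
        } else {
            // An error in a different state starts a new streak.
            self.failing_state = Some(state);
            self.consecutive_errors = 1;
        }
        self.consecutive_errors >= self.max_errors
    }
}
```

A runtime loop would call `record_success` after each cycle that completes, and would stop (for example with `panic!`) when `record_error` returns true.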

2022-08-17

Mithril session

  • We have reviewed and merged the PR Certificate chain integration test for Aggregator #424. It should fix some bugs related to issue Produce valid certificate chain for several epochs on Devnet #396

  • We have also reviewed and paired on the Greg/317/signer integration test #426. It should be merged shortly

  • We have also discussed the Certificate Chain:

    • Epoch Gap: We will work in the first place on handling the Epoch Gap with using the latest "certified" stake distribution to sign the current epoch as defined in the previous Research/Engineering session. This will be done when the devnet is working smoothly. The mechanism needs to:
      • Detect a gap in the Certificate Chain in the Aggregator
      • Modify the Beacon of the Pending Certificate to use the previous Epoch in the Aggregator
      • Make the Signers use the Epoch from the Beacon of the Pending Certificate in order to select the Protocol Initializer and Stake Distribution to use to produce Single Signatures
    • Multiple Protocol Parameters: the Aggregator can try multiple sets of parameters (with equivalent security level) on the gathered Single Signatures in order to produce the most efficient Multi Signature. It will try the hardest-to-reach parameters first. The only constraints on the parameters are:
      • They must share the same parameter phi_f value that is used to create Protocol Initializer
      • The Signers must use the worst case parameters (the one with the highest number of lottery attempts m)
    • Genesis Certificate: We will try to put in place a process in the testnet that is as close as possible to what we will deploy on the mainnet. The genesis mechanism would be as follows:
      • The Aggregator must wait until a Genesis Certificate is available before appending any Certificate to the chain
      • In the meantime, the Signers will be able to proceed to the key registration
      • At a manually selected epoch (preferably at the beginning of the epoch), the Genesis Certificate Bootstrap will happen
      • Once the Genesis Certificate is saved in the Aggregator store, it will be able to produce valid Certificates and to append them to the chain. This should start occurring at the next epoch.
      • The Genesis Certificate Bootstrap will be done as follows:
        • Export the payload/message to be signed in the Genesis Certificate from the Aggregator (via cli) and store a Proto Genesis Certificate (unsigned)
        • Use the Genesis Private Key to sign this message and create a Genesis Signature (cold process, done out of Mithril Network on the mainnet, can be done via Mithril cli on the testnet and devnet)
        • Import the Genesis Signature back in the Aggregator and update the Proto Genesis Certificate and convert it to a definitive Genesis Certificate (metadata will be updated and hash needs to be recomputed, done via cli)
    • Mithril Keys Certification: This subject is still under definition, but some questions arose:
      • Do we need to run a Mithril Signer on the Block Producing Node just for this certification (the one that holds the cold keys required to sign and that is closed to the outside)? Or is this operation done by the Cardano Node itself?
      • The Mithril Signer will be running on the Relay Node, the one that is opened to the outside world (and does not have access to the hot keys)
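
The "Multiple Protocol Parameters" idea above can be sketched as follows. This is a simplified model under stated assumptions, not the mithril-core algorithm: the struct is a hypothetical stand-in, candidates are supplied hardest-to-reach first, and a won lottery index `i` is assumed to be usable under a candidate set when `i < m`, with the quorum reached at `k` usable indices.

```rust
/// Hypothetical stand-in for a set of protocol parameters.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ProtocolParameters {
    k: u64,     // quorum: number of winning lottery indices required
    m: u64,     // number of lottery attempts
    phi_f: f64, // must be identical across all candidate sets
}

/// Tries the candidates in the given order (hardest to reach first) and
/// returns the first one for which the quorum is met.
/// Assumption: `won_indices` holds distinct lottery indices won by Signers.
fn select_parameters(
    candidates: &[ProtocolParameters],
    won_indices: &[u64],
) -> Option<ProtocolParameters> {
    // Constraint from the discussion: every candidate shares the same phi_f.
    let phi_f = candidates.first()?.phi_f;
    if candidates.iter().any(|p| p.phi_f != phi_f) {
        return None;
    }
    candidates.iter().copied().find(|p| {
        let usable = won_indices.iter().filter(|&&index| index < p.m).count() as u64;
        usable >= p.k
    })
}

/// Signers use the worst case set: the one with the highest `m`.
fn signer_parameters(candidates: &[ProtocolParameters]) -> Option<ProtocolParameters> {
    candidates.iter().copied().max_by_key(|p| p.m)
}
```

As soon as one candidate succeeds, the remaining ones are not tried, which matches the "no other tries" rule from the Research/Engineering discussion. Any numeric values fed to this sketch are arbitrary and do not represent sets with a real equivalent security level.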

Daedalus/Lace Session

  • This was the first meeting with the Daedalus/Lace team. The goal was to understand each other's needs and to set up short term goals and a working environment

  • Daedalus end of life will happen soon and Lace will replace it (with an Open Source approach). Lace will also handle a light client wallet

  • We showcased the restoration of a Cardano Node on the testnet thanks to a Mithril Snapshot

  • Questions discussed:

    • Is it possible to restore not the full immutable database, but instead work with the range of missing files? (Answer is yes, but not on the first version as the feature is not implemented yet)
    • How secure are Mithril and the downloaded snapshot? (Answer is fully secured by design, depending on the %age of SPOs participating and the protocol parameters selection)
    • Who pays for the bandwidth? (Answer is IOG for the Aggregator that it currently hosts, and each Aggregator provider when multiple are available. Also we have plans for using peer to peer networks for hosting the archives)
    • What about Utxo set? (Answer is not implemented yet, but will allow Mithril to handle light wallets)
    • What about the new testnet? (Answer is we need to work on that issue, but the new testnet is not stable enough at the moment)
    • Do the Mithril client binaries exist for Linux, macOS and Windows? (Answer is not yet but easy to do, will be part of the work)
    • How to communicate with a Mithril Client? (Answer is stdout or text file in a first version, then IPC later. It will provide a percentage of completion and error/log messages. Will work the same whether running on Daedalus or Lace)
    • How to integrate Mithril snapshot restoration in the wallet? (Answer is by being a part of the Cardano Launcher module of the wallet. Once the archive is extracted, Mithril is not used/needed anymore)
  • Next steps for the PoC:

    • Set up another meeting to create technical tasks in Jira/Github Projects with engineers
    • Create a dedicated private Slack channel with members from the 2 teams

2022-08-11

Mithril session

  • We have reviewed the PRs about:

    • Integration tests on the Signer (incoming)
    • Add Store Protocol Parameters in Aggregator #385 that is ready to be merged
  • All our efforts have paid off and we now have the GCP Aggregator working smoothly, see issue Produce valid certificate chain for several epochs on Testnet #397. However, we will monitor it closely to be sure that there are no other snapshot producing blockers

  • Also we have noticed that the refresh rate of the runtime interval of the Mithril nodes (especially the Aggregator) seems to have a high impact on the flakiness of the CI/devnet. We are still actively investigating this issue Produce valid certificate chain for several epochs on Devnet #396, however the flakiness is now considerably mitigated

  • We have also prepared the demo path:

# Mithril Certificate Chain
# On devnet, with evolving non-deterministic verification keys, with evolving stake distribution

# Resources

## Github
google-chrome https://github.com/input-output-hk/mithril

## Architecture
google-chrome https://mithril.network/doc/mithril/mithril-network/architecture

## Certificate Chain
google-chrome https://mithril.network/doc/mithril/mithril-protocol/certificates

## Explorer
google-chrome https://mithril.network/showcase/

---

# Setup demo

## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git

## Checkout correct commit
cd mithril/
git checkout 4325260ec657b4cde0d4be5c6ff2a23241f2d886
cd mithril-client && make build && cp mithril-client ../../ && cd ..

---
# Demo: Download & Restore Latest Snapshot All In One (~20 min)
NETWORK=testnet && AGGREGATOR_ENDPOINT=https://aggregator.api.mithril.network/aggregator && LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest') && echo $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client list -vvv && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client show $LATEST_DIGEST -vvv && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client download $LATEST_DIGEST -vvv && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client restore $LATEST_DIGEST -vvv

NETWORK=testnet && AGGREGATOR_ENDPOINT=https://aggregator.api.mithril.network/aggregator && LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest') && echo $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client list && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client show $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client download $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client restore $LATEST_DIGEST

---
# Demo: Launch a Mithril Network explorer

## Change directory
cd mithril-showcase

## Build website
make dev

## Open explorer
google-chrome http://localhost:3000/showcase

---
# Demo: Bootstrap and start a Mithril/Cardano devnet

## Change directory
cd mithril-test-lab/mithril-devnet

## Run devnet with 1 BFT and 2 SPO Cardano nodes
MITHRIL_IMAGE_ID=main-4325260 NUM_BFT_NODES=1 NUM_POOL_NODES=2 EPOCH_LENGTH=45 SLOT_LENGTH=1.0 DELEGATE_PERIOD=90 ./devnet-run.sh

## Watch devnet logs
watch -n 1 LINES=5 ./devnet-log.sh

## Watch devnet queries
watch -n 1 NODES=cardano ./devnet-query.sh

## Visualize devnet topology
./devnet-visualize.sh

## Stop devnet
./devnet-stop.sh

# Client
## Get Latest Snapshot Digest
NETWORK=devnet
AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator
LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest')
echo $LATEST_DIGEST

## List Snapshots
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client list

## Show Latest Snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client show $LATEST_DIGEST

## Download Latest Snapshot (Optional)
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client download $LATEST_DIGEST

## Restore Latest Snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client restore $LATEST_DIGEST

## All at once
NETWORK=devnet && AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator && LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest') && echo $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client list && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client show $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client download $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client restore $LATEST_DIGEST

2022-08-10

Mithril session

  • We have reviewed and merged the issue Add state machine runtime Signer #317 🥳 it apparently solves the problem that prevented the creation of certificates because signer registration was not done properly at each epoch

  • We have also reviewed and merged the issue Add/Use Protocol Initializer Store in Signer #362. The non deterministic verification keys have been rolled back and a bug has been fixed in the Clerk computation. With invariant Stake Distribution, the network is able to generate a valid Certificate Chain 💪

  • We still have some flakiness occurring when the stake distribution changes and we are actively investigating them

Mithril Research/Engineering sync

  • This was the first official meeting to synchronize Research and Engineering teams. This meeting will take place every 2 weeks

  • We have mainly discussed how to handle an Epoch gap in the Certificate Chain (see Mithril Client fail to validate certificate chain if the previous certificate is more than one epoch older #377):

    • Having no epoch gap in the Certificate Chain is mandatory to guarantee the security of the protocol and avoid "long range" attacks
    • Re-genesis of the Certificate Chain is always possible, and is the "nuclear" option used if nothing else works
    • In case of multiple Aggregators, downloading a valid chain from another Aggregator is possible
    • Also an Aggregator should be able to try different protocol parameters in order to produce the multi signature:
      • They would provide the same security level
      • But the first tried would produce lighter signatures (whereas the quorum would be harder to be reached)
      • If a multi signature is produced, no other tries
      • If not, a different set of parameters is tried
    • If an Aggregator is not able to produce a valid certificate at epoch n, and is now at epoch n+1:
      • It should use the previously valid stake distribution (next AVK) in certificate at epoch n-1
      • Instead of the stake distribution at epoch n which is not validated
      • And produce a certificate for epoch n+1
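
The epoch gap rule above (fall back to the stake distribution committed in the last valid certificate when an epoch was missed) can be sketched as a small decision function. The enum and function names are hypothetical, and the set of certified epochs stands in for a lookup in the Certificate Chain.

```rust
use std::collections::BTreeSet;

/// Which certified epoch the next certificate should build upon.
#[derive(Debug, PartialEq, Eq)]
enum SigningBasis {
    /// Normal case: the previous epoch has a valid certificate.
    PreviousEpoch(u64),
    /// Gap case: reuse the stake distribution (next AVK) committed in the
    /// latest certified epoch before the gap.
    Fallback { certified_epoch: u64, missed_epochs: u64 },
    /// No earlier certificate at all: a (re-)genesis is needed.
    GenesisRequired,
}

/// Picks the latest certified epoch strictly before `current_epoch`.
fn signing_basis(certified_epochs: &BTreeSet<u64>, current_epoch: u64) -> SigningBasis {
    match certified_epochs.range(..current_epoch).next_back() {
        None => SigningBasis::GenesisRequired,
        Some(&epoch) if epoch + 1 == current_epoch => SigningBasis::PreviousEpoch(epoch),
        Some(&epoch) => SigningBasis::Fallback {
            certified_epoch: epoch,
            missed_epochs: current_epoch - epoch - 1,
        },
    }
}
```

With certificates at epochs 8 and 9 and the Aggregator now at epoch 11 (epoch n = 10 was missed), the function returns the fallback on epoch 9, which is the n-1 certificate from the discussion.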

2022-08-09

Mithril session

  • We have paired on the Add state machine runtime Signer #317 and Add/Use Protocol Initializer Store in Signer #362 issues all day long. We hope to merge very shortly 💪

  • We have also had discussions on the Add Store Protocol Parameters in Aggregator #385: this implies that the Next Protocol Parameters are broadcast in the Pending Certificate of the Aggregator

2022-08-08

Mithril session

  • We have reviewed and merged all the PRs that relate to issue Configure SSL certificate for Mithril Aggregator GCP #324. The showcase is now working correctly on the production documentation website and it will be activated in the navbar shortly 🥳

  • We have reviewed and paired on the issue Add state machine runtime Signer #317 that is a blocker for 3 other issues so that we can complete it asap and not jeopardize the demo of the iteration. There is still much work to do and some questions are still open (in particular regarding the epoch that should be used: from the Cardano node or the Pending Certificate). This is our main focus for the following days

2022-08-05

Mithril session

  • We have reviewed some work that has been done yesterday on the Add state machine runtime Signer #317

  • We have also created new issues (with high priority) related to fixes/optimizations that need to be implemented to:

    • Produce valid certificate chain for several epochs on Testnet #397
    • Produce valid certificate chain for several epochs on Devnet #396
  • Following our conversations from the previous days, we created an issue Simplify the Multi Signer in Aggregator #398 that will conduct a study on what is the best strategy to enhance the Multi Signer

2022-08-04

Mithril session

  • We had discussions about how we can handle missing certificates for some epochs in the Certificate Chain. The problem is tricky and could be solved by:

    • Using a higher epoch offset and embedding in the signed message multiple Next AVKs. This could work, but would be cumbersome (as the Signers would have to wait more epochs before being able to sign)
    • Use the Aggregator beacon to handle certificate creation for an epoch at a later epoch when network is back up. This means that the Aggregator is in charge of broadcasting the epoch to be used by the Signers to individually sign. This solution is likely to be the most simple to deploy, but it might not cover all of the cases that would be responsible for an epoch drop in the chain (for example if the Signers were not able to gather previous Stake Distribution on their end)
    • In a multiple Aggregator network, if an Aggregator misses an epoch (due to networking or operations trouble), it should be able to recover the chain by retrieving it from any other up to date Aggregator
    • A last option to cover such an epoch drop would be to re genesis the chain (will always work, but hard to operate)
  • We have also talked about the Multi Signer of the Aggregator and the issue Reunite Beacon Store/Provider Aggregator #363. We have decided to replace the Beacon Store dependency with a Beacon that is fed by the runtime. Also, we have agreed that this module could be simplified and we will work on that step by step. Maybe we can split the module in sub modules, and we should wait for the Certificate Chain to be fully functional before making modifications with too much impact. In the meantime, we agreed that whenever breaking modifications are applied, we should do them in pair

  • We have paired intensively on the issue Add state machine runtime Signer #317

  • A last point we have discussed is that we should define a dedicated type for handling serialized keys from the Core library

2022-08-03

Mithril session

  • We have reviewed and merged a PR Improve aggregator dependencies management #382 regarding some optimization on the dependency management in the Aggregator

  • We have discussed the issue Add state machine runtime Signer #317 and we have stated that:

    • We will use the Beacon Provider from the Aggregator in the Signer, which implies that the module will be moved to the mithril-common folder
    • The Immutable Digester will be fed with a Beacon at which it will compute the digest
    • The Signer will not rely any more on the Beacon retrieved from the Pending Certificate of the Aggregator
    • We will also pair on this issue after these adjustments have been done

2022-08-02

Mithril session

  • We have reviewed the PRs that have been done last week and took some time to talk about the epoch offset used to implement the Certificate Chain

  • We have discussed several topics:

    • The flakiness of the CI that was partially fixed, but sometimes another error occurs which is apparently related to a gap in the certificate chain (one epoch is not signed). We will investigate that issue and also work on the possibility of verifying AVK signed certificates up to N previous epochs to avoid breaking the chain (currently N is 1, it could be a parameter of the Client). Also the code to verify a certificate could maybe be optimized for clarity (too many intricate match expressions)
    • Implementing a Service Builder in the Aggregator to simplify usage of dependencies
    • Removing the Beacon Store (see issue Reunite Beacon Store/Provider Aggregator #363) and using only the Beacon Provider instead. This also means that we need to create a store for the States of the state machine of the Aggregator. This will allow the Aggregator to restart gracefully (and not sign the same Immutable File Number multiple times)
    • Improving the source of the Immutable File Number that should be only the responsibility of the Chain Observer and use this source to feed to the Immutable Digester (who should only be responsible for computation of the digest)
    • Also, the computation of the digest takes too long. An optimization would be to cache the digest of each immutable files and compute the digest as a root of a Merkle tree for example. This would require to compute almost only the hash of the latest Immutable File Number and would drastically reduce the time and CPU resources needed for computation
    • We could simplify state stores parameters by using only one Store Directory and use it as a prefix for all the stores data path. This would greatly reduce the complexity of the setup of the nodes and would avoid impacting other resources each time a new to store is added (GCP, test lab, ...)
    • Also in order to simplify querying and debugging of the stores we could:
      • Implement a SQLite adapter
      • Provide specific tools for retrieving/gathering the data from the stores
  • We also agreed that some efforts are still needed to stabilize the system so that:

    • Snapshots and certificates are produced consistently (there are many hiccups on GCP)
    • The Signer seems to be mainly responsible for these hiccups, and the ongoing refactoring and improvements should fix them shortly
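
The digest caching idea discussed above can be sketched as follows. This is only an illustration under stated assumptions: `DigestCache`, `toy_hash` and `hash_pair` are hypothetical names, and the FNV-1a hash is a non-cryptographic stand-in used to keep the example free of dependencies (a real digester would use a cryptographic hash). The point is that adding a new immutable file only hashes that file, while the root is recombined from cached leaves.

```rust
use std::collections::BTreeMap;

/// Stand-in 64-bit FNV-1a hash. NOT cryptographic: it only keeps the sketch
/// dependency-free. A real digester would use a cryptographic hash.
fn toy_hash(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in data {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Combines two child digests into their parent digest.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut buf = [0u8; 16];
    buf[..8].copy_from_slice(&left.to_be_bytes());
    buf[8..].copy_from_slice(&right.to_be_bytes());
    toy_hash(&buf)
}

/// Hypothetical cache: immutable file number -> digest of the file content.
#[derive(Default)]
struct DigestCache {
    leaves: BTreeMap<u64, u64>,
    /// Number of leaf digests actually computed (kept for illustration).
    computed: usize,
}

impl DigestCache {
    /// Returns the leaf digest, hashing the content only on a cache miss.
    fn leaf(&mut self, number: u64, content: &[u8]) -> u64 {
        if let Some(d) = self.leaves.get(&number) {
            return *d;
        }
        let d = toy_hash(content);
        self.leaves.insert(number, d);
        self.computed += 1;
        d
    }

    /// Merkle root over the cached leaves, ordered by immutable file number.
    fn root(&self) -> Option<u64> {
        let mut level: Vec<u64> = self.leaves.values().copied().collect();
        if level.is_empty() {
            return None;
        }
        while level.len() > 1 {
            level = level
                .chunks(2)
                .map(|pair| match pair {
                    [l, r] => hash_pair(*l, *r),
                    [l] => *l, // odd node is promoted unchanged
                    _ => unreachable!(),
                })
                .collect();
        }
        Some(level[0])
    }
}
```

With such a cache, recomputing the root after a new immutable file appears costs one file hash plus a number of cheap pair hashes, instead of rehashing every file.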

July 2022

2022-07-22

Mithril session

  • We have reviewed the latest developments for the issue Implement certificate chain Aggregator/Signer/Client #316. The PR has been merged 🥳

  • The PR Set indices to be represented as vectors instead of unique #351 has been merged and thus closes the issue Optimize single signature in Mithril Aggregator/Signer/Core #296 🎉

  • We have reviewed and talked about the issue Add integration tests in Mithril Aggregator #284 which should be ready to be merged shortly

  • We have also reviewed the developments in progress of the website Showcase section of issue Showcase snapshots/certificate pending on doc website #315. The first results look very good and we are keen on seeing it live on the website! As there is not always a Pending Certificate available, we were asking ourselves if maybe we could add a /beacon route on the Aggregator API that would display the current Beacon 🤔

2022-07-21

Mithril session

  • We have reviewed the showcase interface in its first version Showcase snapshots/certificate pending on doc website #315. It is working and displays the first information retrieved from the Aggregator. Some more work needs to be done in order to complete the issue

  • We have reviewed and talked about the Implement certificate chain Aggregator/Signer/Client #316: there seems to be a problem with the stake distribution update that prevents the Aggregator from producing multi signatures. Some investigations are in progress. If the fix is not obvious, a feature flag will be activated to allow the merging of the PR

  • We have discussed and contributed to the issue Optimize single signature in Mithril Aggregator/Signer/Core #296, specifically about the deduplication of the won lottery indices. The PR should be merged shortly
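
As a rough illustration of what deduplicating won lottery indices can look like, here is a minimal sketch. The `SingleSignature` struct and the "first signer wins" policy are assumptions made for this example. The selection rule actually used in the core library may differ.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified single signature: a signer and the lottery
/// indices it won (the real type also carries the signature itself).
#[derive(Debug, Clone, PartialEq)]
struct SingleSignature {
    party_id: String,
    won_indexes: Vec<u64>,
}

/// Keeps each lottery index at most once across all signatures
/// (the first signer listed keeps the index) and drops signatures
/// that have no index left.
fn deduplicate_indexes(signatures: Vec<SingleSignature>) -> Vec<SingleSignature> {
    let mut seen = BTreeSet::new();
    signatures
        .into_iter()
        .filter_map(|mut sig| {
            // `insert` returns false when the index was already claimed.
            sig.won_indexes.retain(|i| seen.insert(*i));
            if sig.won_indexes.is_empty() {
                None
            } else {
                Some(sig)
            }
        })
        .collect()
}
```

For example, with signers `a: [1, 3, 3]`, `b: [3, 4]` and `c: [1, 4]`, this policy keeps `a: [1, 3]` and `b: [4]`, and drops `c` entirely.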

2022-07-20

Mithril session

  • We have reviewed and paired on the Add integration tests in Mithril Aggregator #284. The implementation of the Happy Path is still in progress, but it will be ready to merge shortly

  • We have reviewed the Implement certificate chain Aggregator/Signer/Client #316. Some enhancements will be done in the End to End Tests Runner and the PR should be merged shortly.

  • We have discussed the short term fix for the issue Signer can not sign after restart (UnregisteredVerificationKey) #361. We agreed to switch temporarily to a deterministic Verification Key generator. The fix has been merged and works as expected on GCP 🥳 The long term fix will be implemented in Add/Use Protocol Initializer Store in Signer #362

  • We also had discussions about the Showcase snapshots/certificate pending on doc website #315 issue and listed some nice to have features:

    • Use it for the demo with the devnet, with the website running locally
    • Have a refresh every 30s on the first page
    • Implement responsive design pages

2022-07-19

Mithril session

  • The tickets of the current iteration have been sliced and created in the board

  • We have reviewed and paired on the issue Add integration tests in Mithril Aggregator #284. The AggregatorConfig struct was wrongly holding a reference to the DependencyManager, which prevented us from using the full features of the DumbImmutableFileObserver (that will power the newly added tests).

  • We have also talked about the Showcase section of the documentation website and the type of information that would be displayed. A first version could showcase:

    • The Pending Certificate if it exists, and the list of the latest Snapshots on a first page
    • Each Snapshot provides a link to the associated Certificate details on a new page
    • Each Certificate provides a link to the Previous Certificate in the chain if it exists

2022-07-18

Mithril session

  • We have made a review of the PRs that have been merged during the previous iterations and of the technical debt that we have accumulated so far. We have decided to take some time to lower this debt during the current iteration

  • Here is the list of the issues that have been identified as technical debt:

    • Add and use a Verification Key Store in the Signer
    • The previous issue should fix a bug that prevents the Signer from recognizing its Verification Key in the Signers list retrieved from the Pending Certificate (and triggers an UnregisteredVerificationKey error) after a restart (due to the randomness of the Verification Keys)
    • Reunite the BeaconStore and the BeaconProvider in the Aggregator (we need to check if we want to remove completely the BeaconStore)
    • The previous issue should fix a bug that makes the Aggregator create a new Pending Certificate for a Beacon that already has a Certificate
    • A bug that makes the Aggregator disk saturate (because the temp snapshot archive file is not deleted after upload)
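
For the disk saturation bug listed above, one possible way to make sure the temporary archive is always removed is a guard type that deletes the file when it goes out of scope. This is a sketch, not the fix that was applied: `TempArchive`, `upload` and `snapshot_and_upload` are hypothetical names, and `upload` is a placeholder that only reads the file size.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical guard: removes the temporary archive when dropped,
/// whether the upload succeeded or failed.
struct TempArchive {
    path: PathBuf,
}

impl TempArchive {
    fn create(path: impl Into<PathBuf>, content: &[u8]) -> io::Result<Self> {
        let path = path.into();
        fs::write(&path, content)?;
        Ok(Self { path })
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempArchive {
    fn drop(&mut self) {
        // Best effort: errors are ignored because `drop` cannot return them.
        let _ = fs::remove_file(&self.path);
    }
}

/// Placeholder for the real upload step (an assumption of this sketch):
/// it only returns the size of the archive.
fn upload(archive: &Path) -> io::Result<u64> {
    Ok(fs::metadata(archive)?.len())
}

/// Creates the archive, uploads it, and always cleans it up afterwards.
fn snapshot_and_upload(path: PathBuf, content: &[u8]) -> io::Result<u64> {
    let archive = TempArchive::create(path, content)?;
    let result = upload(archive.path());
    drop(archive); // explicit for clarity: the file is deleted here
    result
}
```

The design choice here is to tie the cleanup to the lifetime of a value rather than to a code path, so an early return or an upload error cannot leave the archive behind.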

2022-07-13

Mithril session

  • We have reviewed the PR Add certificate chain Aggregator/Signer/Client #355 in relation with Implement certificate chain Aggregator/Signer/Client #316 and discussed some small adjustments that will be done shortly

  • We have also reviewed and merged the Enhance documentation website #356 with:

    • The enhanced Glossary section of the website
    • The enhanced Mithril Certificate Chain in depth page

2022-07-12

Mithril session

  • We have paired on the bug of the issue Fix test lab CI flakiness #352:

    • A fix to the single signer of the Mithril Signer was applied (concerning the late instantiation of the protocol initializer)
    • We fine tuned the runtime intervals of the Signer and Aggregator nodes (which were running with the same cadence, which was a source of flakiness)
    • We made some tests with 2 signers and an epoch offset of -1 and the execution time of the test lab is still very good (~2m 30s)
    • We will merge with 2 signers and an epoch offset of 0 at first (as there are still some unexplained delays in signer registration with a non 0 epoch offset)
    • We have also identified an optimization when producing the CI run attempts artifacts (to separate them clearly). It will be included in this PR
  • We also discussed the ongoing issues:

    • Optimize single signature in Mithril Aggregator/Signer/Core #296
    • Implement certificate chain Aggregator/Signer/Client #316
  • We have paired on the issue Optimize single signature in Mithril Aggregator/Signer/Core #296, on the PR Set indices to be represented as vectors instead of unique #351 in order to find the best way to deduplicate indices of the single signatures before generating a multi signature. We will continue pairing on this tomorrow.

2022-07-11

Mithril session

  • We have talked about solving the flakiness of the test lab in the CI. The solution is under development and the new version of the end to end test runner along with the activation of the epoch offset should work. At the same time, the parameters of the devnet are fine tuned in order to keep the fast test execution time. A PR Lessen test lab flakyness #350 has been pushed and will be merged shortly

  • The website documentation enhancements have been reviewed in the PR Enhance documentation website #349. It will be merged shortly and will deploy the following changes:

    • Enhanced Getting Started pages
    • Enhanced Developer Docs > Mithril Network pages
    • Reorganized About Mithril section with clear Mithril Protocol and Mithril Network menus

2022-07-08

Mithril session

  • We have reviewed the work in progress regarding the integration tests of the Aggregator runtime of this issue Add integration tests in Mithril Aggregator #284. We had discussions about the purpose of the tests and decided to use the runtime tests as unit tests and work on a happy path scenario with the full node for the integration test.

  • We have reviewed the issue Cannot sync a cardano-node using latest snapshot on GCP #344. After investigations, it appears that the issue is linked to the 1.35.0 version of the Cardano node and is fixed in version 1.35.1

  • We also had discussions about the use of nightly/pre-release/release tags (and packages & environments). We will start with the nightly one

  • Also, the CI is very flaky at this time (mainly because the test lab is failing due to using the same epoch for registration and signing). We have decided to activate an epoch offset of -1 and to work on fine tuning the devnet to accelerate the production of immutable files and epochs. This should fix the problem and should be available shortly.
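
To make the epoch offset concrete, here is a minimal sketch of how a signed offset could be applied to an epoch number without underflowing at the start of the chain. The `Epoch` wrapper, the `EPOCH_OFFSET` constant name and the `offset_by` method are assumptions for illustration. Only the value -1 comes from the notes above.

```rust
/// Hypothetical wrapper around a Cardano epoch number.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Epoch(u64);

/// Offset applied to the current epoch (assumed name, value taken from the notes).
const EPOCH_OFFSET: i64 = -1;

impl Epoch {
    /// Applies a signed offset. Returns `None` when the result would be
    /// negative or would overflow, instead of wrapping around.
    fn offset_by(self, offset: i64) -> Option<Epoch> {
        if offset >= 0 {
            self.0.checked_add(offset as u64).map(Epoch)
        } else {
            self.0.checked_sub(offset.unsigned_abs()).map(Epoch)
        }
    }
}
```

With this sketch, `Epoch(10).offset_by(EPOCH_OFFSET)` gives `Some(Epoch(9))`, while `Epoch(0).offset_by(EPOCH_OFFSET)` gives `None`, which forces the caller to decide what to do during the very first epoch.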

2022-07-07

Mithril session

  • We have reviewed and closed the Enhance runtime state machine Aggregator #323 issue, which will prevent the Aggregator from updating the stake distribution too often

  • We have also merged some bug fixes and enhancements:

    • fix a bug in Beacon PartialOrd #330
    • Add 'latest' tag to Docker images #334
    • Update 'cardano-cli' to version 1.35.0 on GCP #335
    • Clean 'CertificatePending' entity #331
    • Enhance documentation website #332
  • We have paired on the Optimize single signature in Mithril Aggregator/Signer/Core #296 that should be merged shortly

2022-07-06

Mithril session

  • We have paired on getting the project one step further toward open sourcing:

    • Creating a service account so that we are autonomous in managing the cloud operations (Aggregator hosting and Terraform on the CI)
    • Activating the Discussions feature on the repository
    • Finding how to correctly handle the latest tagging of Docker images (such as what has been done on hydra)
    • Finding a way to add an automatically renewing SSL certificate to the Aggregator API (with Let's Encrypt)
    • Reviewing the new documentation tutorial pages (that need a second pair of eyes and beta testers to verify that they are functional and easy to use)
  • We had discussions about:

    • Upgradable protocol parameters: the Aggregator will keep on broadcasting the Protocol Parameters used for the current epoch and they will be stored by the Signers (along with the Verification Keys for easy retrieval and usage)
    • Epoch offsetting strategy: the -1 and -2 that are used to work with the Stake Distribution and the Verification Keys are well defined constants that will probably never change (as they provide sufficient security). It is therefore better to use them as hard-coded constants provided at compilation time for the Signer and the Aggregator, rather than as information provided at runtime by the Aggregator
    • Certificate Chain Verification Requirements: The multi signatures embedded in the Certificates must be verifiable even though the cryptographic library has evolved along the way
      • The message signed needs to be switched to a map format where we are free to add new entries without breaking the chain validation (today only with an immutable_digest entry and later with others such as utxo_set for example)
      • We could maintain a set of verifier functions in the core library for each earlier version (could be cumbersome)
      • We could add a verifier function compiled in WASM that is stored in the certificate
      • We could add a format migration feature to the certificate chain
      • We could add milestone genesis certificates that would provide a complementary signature to certificates (produced with the genesis keys in the certificate) from time to time (e.g. every N epochs or as soon as a break in backward compatibility is introduced in the code)
      • We could also implement such a mechanism automatically by using the Cardano chain (but that would involve posting a transaction on it)
    • Releases packaging: In order to facilitate the distribution of the nodes (particularly to the SPOs) and to have a broad adoption of the protocol, we will need to work on deploying packages for each release (.deb, .rpm, ...) with the CI
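
The map format for the signed message mentioned above could look like the following sketch. It is an assumption-laden illustration: `ProtocolMessage`, `set_part` and `signable_bytes` are hypothetical names, and the length-prefixed encoding is only one possible choice. Keeping the entries in a sorted map makes the bytes to sign independent of insertion order, and a message that only contains `immutable_digest` keeps the same encoding after new entry names are introduced in the code.

```rust
use std::collections::BTreeMap;

/// Hypothetical message type: named parts kept in a sorted map so that the
/// encoding is deterministic regardless of insertion order.
#[derive(Debug, Default, Clone, PartialEq)]
struct ProtocolMessage {
    parts: BTreeMap<String, String>,
}

impl ProtocolMessage {
    /// Sets (or replaces) a named part of the message.
    fn set_part(&mut self, key: &str, value: &str) {
        self.parts.insert(key.to_string(), value.to_string());
    }

    /// Bytes to sign: length-prefixed keys and values, in key order.
    /// The length prefix avoids ambiguity between adjacent fields.
    fn signable_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        for (key, value) in &self.parts {
            for field in [key, value] {
                out.extend_from_slice(&(field.len() as u64).to_be_bytes());
                out.extend_from_slice(field.as_bytes());
            }
        }
        out
    }
}
```

A real implementation would hash these bytes with a cryptographic hash before signing. That step is left out here to keep the sketch dependency-free.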

2022-07-05

Mithril session

  • We have reviewed and merged the following PRs:

    • User Manual & Getting Started documentation #321
    • Enhance Signer re-registration #322
  • We have paired on updating the state machine of the Aggregator runtime so that it computes the stake distribution only once for an epoch:

[Diagram: state machine of the Aggregator runtime]

  • We have also paired on creating the state machine of the Signer runtime:

[Diagram: state machine of the Signer runtime]

  • During this pairing session we had many discussions about:
    • The usefulness of the Beacon used in the certificate pending
    • The forthcoming work that will be done regarding the Certificate Chain implementation
    • And some long term implications of the multiple Aggregators running and what it means on how we compute the multi signatures
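
The "stake distribution only once for an epoch" rule of the Aggregator runtime can be pictured with a small guard like the one below. It is not the actual state machine (whose states are shown in the diagram), only a sketch of the once-per-epoch check. `AggregatorRuntime`, `cycle` and the field names are hypothetical.

```rust
/// Hypothetical runtime state: remembers the last epoch for which the
/// stake distribution was computed.
#[derive(Debug, Default)]
struct AggregatorRuntime {
    last_epoch_with_stake_distribution: Option<u64>,
    /// Counts the updates performed (kept for illustration).
    stake_distribution_updates: usize,
}

impl AggregatorRuntime {
    /// Called on every runtime cycle with the epoch of the current beacon.
    /// Returns true when the stake distribution was recomputed.
    fn cycle(&mut self, current_epoch: u64) -> bool {
        if self.last_epoch_with_stake_distribution == Some(current_epoch) {
            return false; // already done for this epoch
        }
        // Placeholder for the real (expensive) stake distribution update.
        self.stake_distribution_updates += 1;
        self.last_epoch_with_stake_distribution = Some(current_epoch);
        true
    }
}
```

Calling `cycle(5)` twice and then `cycle(6)` performs two updates in total: one for epoch 5 and one for epoch 6.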

2022-07-04

Mithril session

  • The tickets of the current iteration have been sliced and created in the board

  • We have reviewed and merged the PR Improve UI/UX documentation website #309. The vast majority of the UI/UX review comments have been taken into account. The website content is still being written and this work will continue during the iteration

  • We had a session related to the Certificate Chain whose goal was to:

    • Specify which information to embed in the Genesis Certificate
    • Specify which information to embed in the other certificates of the chain
    • Define how to link the certificates to each other
    • Define how to verify a certificate
    • Some questions remain such as:
      • Is the Mithril Epoch 0 an empty epoch (which means no other certificate than the Genesis one will be produced)?
      • What is the exhaustive list of information that we need to embed in the Metadata(p,n) group? (Among Certificate Version, Protocol Parameters, Dates, list of the Signers who included their single signature in the multi signature)
    • Here is a diagram that summarizes the structure of the chain: (see on miro)

[Diagram: structure of the certificate chain]

  • We have paired and merged the last step of retrieving the real Stake Distribution from the Cardano node Use SD from cardano-cli in Aggregator/Signer #314 🥳
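
Regarding the Certificate Chain session above, the linking of certificates can be illustrated with a minimal walk from a certificate back to the genesis one. This sketch rests on assumptions: the `Certificate` fields, the `ChainError` variants and `verify_chain` are hypothetical, and the signature checks are deliberately left out.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified certificate: only the links of the chain.
#[derive(Debug, Clone)]
struct Certificate {
    hash: String,
    /// `None` only for the genesis certificate (assumption of this sketch).
    previous_hash: Option<String>,
}

#[derive(Debug, PartialEq)]
enum ChainError {
    MissingCertificate(String),
    Cycle(String),
}

/// Walks from `start` back to the genesis certificate and returns the
/// number of certificates traversed.
fn verify_chain(
    store: &HashMap<String, Certificate>,
    start: &str,
) -> Result<usize, ChainError> {
    let mut visited = HashSet::new();
    let mut current = start.to_string();
    loop {
        let cert = store
            .get(&current)
            .ok_or_else(|| ChainError::MissingCertificate(current.clone()))?;
        if !visited.insert(cert.hash.clone()) {
            return Err(ChainError::Cycle(current));
        }
        // A real verifier would also check the multi signature of the
        // certificate (or the genesis signature) at this point.
        match &cert.previous_hash {
            None => return Ok(visited.len()),
            Some(previous) => current = previous.clone(),
        }
    }
}
```

With a store containing `genesis`, `c1` (linked to `genesis`) and `c2` (linked to `c1`), walking from `c2` traverses three certificates, and starting from an unknown hash returns `MissingCertificate`.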