
Merge pull request #101 from 0xPolygonMiden/dominik_docs_state
docs: adding architecture/state
Dominik1999 authored May 2, 2023
2 parents e739d77 + 99a5ccf commit 63549f0
Showing 4 changed files with 46 additions and 57 deletions.
79 changes: 26 additions & 53 deletions docs/src/architecture/state.md
@@ -1,6 +1,6 @@
# State

The Miden Nodes maintain three databases to describe the state:
1. A database of accounts.
2. A database of notes.
3. A database of nullifiers for already consumed notes.
@@ -11,89 +11,62 @@

## State components

These databases are represented by authenticated data structures (e.g., Merkle trees), so that we can easily prove that items were added to or removed from a database, and so that a commitment to the entire database remains very small.

### Account database
Account states are recorded in a Sparse Merkle tree that maps account IDs to account hashes, where the account hash is computed as

`hash([account ID, 0, 0, nonce], [vault root], [storage root], [code root])`.

<p align="center">
<img src="../diagrams/architecture/state/Account_DB.png">
</p>

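For intuition, the following minimal Rust sketch shows the shape of the mapping described above. It is illustrative only: the real implementation hashes field elements with a ZK-friendly hash function and stores the results in a Sparse Merkle tree, whereas this sketch substitutes a standard-library hasher and a plain map.

```rust
// Illustrative sketch only -- NOT Miden's real code. A std-library hasher and a
// BTreeMap stand in for the ZK-friendly hash and the Sparse Merkle tree.
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

type Digest = u64; // placeholder for a 32-byte hash digest

// Toy hash over a sequence of field-element-like values.
fn toy_hash(parts: &[u64]) -> Digest {
    let mut h = DefaultHasher::new();
    for part in parts {
        part.hash(&mut h);
    }
    h.finish()
}

/// hash([account ID, 0, 0, nonce], [vault root], [storage root], [code root])
fn account_hash(id: u64, nonce: u64, vault_root: Digest, storage_root: Digest, code_root: Digest) -> Digest {
    toy_hash(&[id, 0, 0, nonce, vault_root, storage_root, code_root])
}

fn main() {
    // The account database maps account IDs to account hashes. A private account
    // therefore adds only its ID and its hash (64 bytes in the real system) to the state.
    let mut account_db: BTreeMap<u64, Digest> = BTreeMap::new();
    let id = 0xACC0_0001_u64;
    account_db.insert(id, account_hash(id, 1, 0xAA, 0xBB, 0xCC));
    println!("{account_db:x?}");
}
```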
As described in [accounts](https://0xpolygonmiden.github.io/miden-base/architecture/accounts.html), there are two types of accounts:
* **Public accounts** where all account data is stored on-chain.
* **Private accounts** where only the hashes of accounts are stored on-chain.

> Losing the state of a private account means losing the funds in it (the user can no longer execute transactions against the account), much like losing a private key. This can be easily mitigated by storing an encrypted backup of the account state in the cloud or elsewhere; unlike backing up private keys, this does not compromise the privacy or security of the account.

Note: Having many (or even most) accounts be private is very beneficial for the network, as a private account contributes only 64 bytes to the global state (32 bytes for the account ID + 32 bytes for the account hash). Put another way, 1 billion private accounts take up only $60$ GB of state.

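As a quick back-of-the-envelope check of that figure (an estimate, not part of any specification):

$$10^{9} \text{ accounts} \times 64 \text{ bytes per account} = 64 \times 10^{9} \text{ bytes} \approx 60 \text{ GB}$$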
### Notes database

Notes are recorded in an append-only accumulator, a [Merkle Mountain Range](https://github.com/opentimestamps/opentimestamps-server/blob/master/doc/merkle-mountain-range.md). This is important for two reasons:

1. Membership witnesses against such an accumulator need to be updated very infrequently.
2. Old membership witnesses can be extended to be used with a new accumulator value, but this extension does not need to be done by the original witness holder.

Both of these properties are needed for supporting local transactions and private accounts.

There are two types of [notes](https://0xpolygonmiden.github.io/miden-base/architecture/notes.html):
* **Public notes** where the entire note content is recorded in the state.
* **Private notes** where only a note's hash is recorded in the state.

As with accounts, there is a strong incentive to use private notes as they result in lower fees. This is also beneficial to the network as a private note adds only 64 bytes to the state (32 bytes when it is produced, and 32 bytes when it is consumed).

The notes database looks as shown in the diagram below. Each leaf is a block header which contains the commitment to all notes created in that block. The commitment to this database consists of the roots of the individual trees, so its size grows logarithmically with the number of items in the database.

<p align="center">
<img src="../diagrams/architecture/state/Notes_DB.png">
</p>

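For intuition, here is a minimal, illustrative Rust sketch of the append-only behavior (it is not Miden's actual data structure and uses a toy hash): the accumulator keeps only the peaks of perfect subtrees, so appending a new block's note commitment never requires the previously stored leaves, the commitment grows roughly logarithmically with the number of leaves, and old leaves can therefore be discarded later without losing the ability to append.

```rust
// Illustrative sketch only -- NOT the actual Miden notes database.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type Digest = u64; // placeholder for a 32-byte hash

fn merge(left: Digest, right: Digest) -> Digest {
    let mut h = DefaultHasher::new();
    left.hash(&mut h);
    right.hash(&mut h);
    h.finish()
}

/// A toy append-only accumulator in the spirit of a Merkle Mountain Range: it
/// keeps one peak per perfect subtree (one per set bit of the leaf count), so
/// the commitment stays small and appending never touches previously stored leaves.
#[derive(Default)]
struct Accumulator {
    peaks: Vec<(u32, Digest)>, // (subtree height, root of that perfect subtree)
    num_leaves: u64,
}

impl Accumulator {
    /// Append one leaf (e.g., the note commitment of a new block header).
    fn append(&mut self, leaf: Digest) {
        let mut height = 0;
        let mut node = leaf;
        // Merge equal-height peaks, exactly like carrying in binary addition.
        while matches!(self.peaks.last(), Some(&(h, _)) if h == height) {
            let (_, left) = self.peaks.pop().unwrap();
            node = merge(left, node);
            height += 1;
        }
        self.peaks.push((height, node));
        self.num_leaves += 1;
    }
}

fn main() {
    let mut acc = Accumulator::default();
    for block in 0u64..1_000 {
        acc.append(block); // pretend `block` is the block's note commitment
    }
    // 1000 leaves, yet only a handful of peaks form the commitment.
    println!("leaves = {}, peaks = {}", acc.num_leaves, acc.peaks.len());
}
```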
Using a Merkle Mountain Range (an append-only accumulator) means that we can't remove individual elements from it. This seemingly means that the size of the notes database would grow indefinitely. Moreover, at high TPS it would grow very quickly: at 1K TPS we'd be adding about 1 TB per year to the database.

However, we need to explicitly store only the unconsumed public notes and enough info to construct membership proofs against them. Private notes, as well as public notes which have already been consumed, can be safely discarded. Such notes would still remain in the accumulator, but there is no need to store them explicitly as the append-only accumulator can be updated without knowing all items stored in it. This reduces actual storage requirements to a fraction of the database's nominal size.


ToDo: Describe and specify the lifetime restrictions for notes in the database.

### Nullifier database

Nullifiers indicate whether a specific note has already been consumed. They are stored in a Sparse Merkle tree which maps nullifiers to the block heights at which they were created. Using this tree, one can check and prove that a given nullifier is not in the database. There is one tree per epoch (~3 months), and Miden Nodes always store at least the last two trees; the roots of older trees are retained as well. For example, in the diagram below, the tree contains two nullifiers: nullifier `01` was inserted into the database at block height $4$, while nullifier `10` was inserted at block height $5$.

<p align="center">
<img src="../diagrams/architecture/state/Nullifier_DB.png">
</p>

To prove that nullifier `11` is not in the database, we need to provide a Merkle path to its node and show that the value in that node is $0$. In our case, nullifiers are 32 bytes each, and thus the height of the Sparse Merkle tree needs to be 256.

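For intuition, here is a compact, illustrative Rust sketch of such a non-membership check. It is not Miden's actual nullifier tree: the depth is cut from 256 to 8 and a toy hash is used, but the mechanics are the same - absent keys implicitly hold the value $0$, and proving that a nullifier is not in the database amounts to opening its leaf and showing that the value there is $0$.

```rust
// Illustrative sketch only -- NOT Miden's nullifier tree. Depth is shortened to 8
// (instead of 256) and a toy hash replaces the real one.
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const DEPTH: usize = 8; // the real tree is 256 levels deep (32-byte nullifiers)
type Digest = u64;

fn hash2(left: Digest, right: Digest) -> Digest {
    let mut h = DefaultHasher::new();
    left.hash(&mut h);
    right.hash(&mut h);
    h.finish()
}

/// Sparse Merkle tree mapping a small key (the "nullifier") to a leaf value
/// (the block height at which it was inserted). Absent keys have value 0.
struct NullifierTree {
    leaves: BTreeMap<usize, Digest>,
    empty: [Digest; DEPTH + 1], // root of an all-empty subtree at each level
}

impl NullifierTree {
    fn new() -> Self {
        let mut empty = [0u64; DEPTH + 1];
        for lvl in 1..=DEPTH {
            empty[lvl] = hash2(empty[lvl - 1], empty[lvl - 1]);
        }
        Self { leaves: BTreeMap::new(), empty }
    }

    fn insert(&mut self, nullifier: usize, block_height: Digest) {
        self.leaves.insert(nullifier, block_height);
    }

    /// Root of the subtree at `level` covering leaves [index * 2^level, (index + 1) * 2^level).
    fn node(&self, level: usize, index: usize) -> Digest {
        let lo = index << level;
        if self.leaves.range(lo..lo + (1 << level)).next().is_none() {
            return self.empty[level]; // the whole subtree is empty
        }
        if level == 0 {
            return *self.leaves.get(&index).unwrap_or(&0);
        }
        hash2(self.node(level - 1, 2 * index), self.node(level - 1, 2 * index + 1))
    }

    fn root(&self) -> Digest {
        self.node(DEPTH, 0)
    }

    /// Merkle path for a leaf: its sibling at every level, bottom to top.
    fn open(&self, nullifier: usize) -> Vec<Digest> {
        (0..DEPTH).map(|level| self.node(level, (nullifier >> level) ^ 1)).collect()
    }
}

/// Recompute the root from a claimed leaf value and a Merkle path.
fn verify(mut index: usize, leaf: Digest, path: &[Digest], root: Digest) -> bool {
    let mut acc = leaf;
    for sibling in path {
        acc = if index & 1 == 0 { hash2(acc, *sibling) } else { hash2(*sibling, acc) };
        index >>= 1;
    }
    acc == root
}

fn main() {
    let mut tree = NullifierTree::new();
    tree.insert(0b01, 4); // nullifier `01` inserted at block height 4
    tree.insert(0b10, 5); // nullifier `10` inserted at block height 5

    // Non-membership proof for nullifier `11`: a Merkle path to its leaf plus
    // the fact that the value stored at that leaf is 0.
    let nullifier = 0b11;
    let path = tree.open(nullifier);
    assert!(verify(nullifier, 0, &path, tree.root()));
    println!("nullifier {nullifier:02b} is provably NOT in the database");
}
```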
To be able to add new nullifiers to the database, operators need to maintain the entire nullifier set; otherwise, they would not be able to compute the new root of the tree.

If a user wants to consume a note that is more than 6 months old, a Merkle path to that note must be provided to the Miden Node for verification.
4 changes: 2 additions & 2 deletions docs/src/introduction.md
@@ -1,8 +1,8 @@
# Polygon Miden Intro

> *This documentation is still work in progress. Some topics have been discussed in greater depth, while others require additional clarification. Sections of this documentation might later be reorganized to achieve a better flow.*

## Welcome to the Polygon Miden Documentation
Polygon Miden is a zk-optimized rollup with client-side proving. It is expected to launch a public testnet in Q3.

Unlike most other rollups, Polygon Miden prioritizes zk-friendliness over EVM compatibility. It also uses a novel, actor-based state model to exploit the full power of a zk-centric design. These design choices allow Polygon Miden to extend Ethereum's feature set and enable applications that are currently difficult or impractical to build on EVM-like systems.
2 changes: 1 addition & 1 deletion docs/src/introduction/getting-started.md
@@ -1 +1 @@
> [WIP]
18 changes: 17 additions & 1 deletion docs/src/introduction/overview.md
@@ -1 +1,17 @@
# Overview
Polygon Miden, a ZK-optimized rollup with client-side proving, will complement Polygon's set of zero-knowledge solutions, which aim to become the internet's value layer.

With Polygon Miden, we aim to extend Ethereum's feature set. Ethereum is designed to be a base layer that evolves slowly and provides stability. Rollups allow the creation of new design spaces while retaining the security of Ethereum. This makes a rollup the perfect place to innovate and enable new functionality.

Unlike many other rollups, Polygon Miden prioritizes ZK-friendliness over EVM compatibility. It also uses a novel state model to exploit the full power of a ZK-centric design. These design decisions allow developers to create applications that are currently difficult or impractical to build on account-based systems.

We extend Ethereum on three core dimensions to attract billions of users: scalability, safety, and privacy.

From an architectural perspective, Polygon Miden uses a bi-directional token bridge and a verifier contract to ensure computational integrity. Miden Nodes act as operators that maintain the state and recursively compress state transitions into STARK proofs. The token bridge on Ethereum verifies these proofs. Users can run Miden Clients to send RPC requests to the Miden Nodes to update the state.

The major components of Polygon Miden are:

* Miden Clients - represent Miden users
* Miden Nodes - manage the Miden rollup and compress proofs
* Verifier Contract - keeps and verifies state on Ethereum [Not specified yet]
* Bridge Contract - entry and exit point for users [Not specified yet]
