Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternatives to TSMT #1136

Closed
bobbinth opened this issue Nov 3, 2023 · 6 comments
Closed

Alternatives to TSMT #1136

bobbinth opened this issue Nov 3, 2023 · 6 comments
Assignees
Milestone

Comments

@bobbinth
Copy link
Contributor

bobbinth commented Nov 3, 2023

Our Tiered Sparse Merkle tree is very efficient but also is very complex - much more complex than I would like it to be (both the Rust and the MASM implementations). So, it might be good to consider alternatives, which may be not as efficient but are dramatically simpler.

First, to set the context about efficiency, inside the VM we care about two types of cycles: (1) cycles on the stack, and (2) cycles in the hasher chiplet. TSMT has roughly the following metrics (depth 64 is TBD because it is not yet implemented in MASM):

Number of items Stack cycles (read) Hasher cycles (read) Stack cycles (write) Hasher cycles (write)
< $2^{14}$ ~100 ~128 100 - 240 ~256
< $2^{30}$ ~100 ~256 135 - 265 ~512
< $2^{46}$ ~100 ~384 140 - 255 ~768
> $2^{46}$ TBD TBD TBD TBD

As can be seen from the above, TSMT reads take up just around 100 stack cycles, while writes are between 1.3x and 2.5x more expensive (the variability is due to the type of update). The number of hasher cycles grows with the tree depth since we need to verify longer Merkle paths.

Second, the main purpose of TSMT is to provide a way to simulate a key-value map with 4-element keys. So, whatever new construction we come up with, needs to support this.

The simplest alternative

Perhaps the simplest alternative is to reduce the number of tiers in TSMT to just one (at depth 64). This would make it almost equivalent to the SimpleSmt. The tree would work as follows:

At depth 64, we would have 3 types of leaves:

  • Empty leaf (i.e., ZERO).
  • A leaf with a single key-value pair.
  • A leaf with multiple key-value pairs sorted by key.

Both Rust and MASM implementations of these should be very straight-forward. In terms of efficiency in the VM, it would look something like this:

Number of items Stack cycles (read) Hasher cycles (read) Stack cycles (write) Hasher cycles (write)
< $2^{14}$ <50 ~512 <100 ~1024
< $2^{30}$ <50 ~512 <100 ~1024
< $2^{46}$ <50 ~512 <100 ~1024
> $2^{46}$ TBD TBD TBD TBD

As can be seen from the above, number of stack cycles would actually be smaller as compared to our current TSMT, but the number of hasher cycles would increase.

In the context of Miden rollup, we care about TSMT in several contexts:

  • Account / nullifier DBs.
  • Account storage.
  • Account vault.

For account nullifier DBs, we expect that the trees would be loaded with many values (i.e., > 100M) and so this simpler approach would roughly double the number of cycles in the hasher.

For account storage, it is difficult to tell since there are other options which will probably be more widely used. But overall, worse performance here may not be too big of a deal.

The biggest issue though is the account vault. The main reason is that account vaults will likely contain relatively few items (e.g., less than 10K), and so the simpler approach is close to 4x worse than the current approach. To make matters worse, we update the vault 3 times during transaction execution - once in prologue, once during note/script processing, and finally during the epilogue.

So, if a transaction was consuming notes with about 100 assets, account vault manipulations alone would consume over 300K cycles (vs. about 75K cycles with the current approach). This is a substantial difference - but maybe it is still worth the simplicity.

Alternative with 2 tiers

We could also reduce the number of tiers to 2 - at depth 32 and 64 (or 16 and 64). This would not be as simple as the previous approach, but also, would be much more efficient for trees containing a small number of values.

@bobbinth bobbinth changed the title Alternative to TSMT Alternatives to TSMT Nov 3, 2023
@Dominik1999
Copy link
Contributor

Do you think it makes sense to wait for that until we have a first testnet?

@plafer
Copy link
Contributor

plafer commented Nov 6, 2023

My 2 cents re: complexity. I will not touch on whether applying these suggestions would make using TSMT still useful.

I don't think the data structure is fundamentally too complex; i.e. it is relatively simple to explain, and understand how it works at a high level. I believe there are 2 low-hanging fruits to reduce the accidental complexity of the current implementations:

  1. Refactor the Rust code
  • There are certain patterns that are unintuitive and could lead to bugs, that could simply be refactor out. For example, the ValueStore::get_first(prefix) method is typically used in conjunction with the leaf_exists output of NodeStore::get_proof(), such as here. get_first(prefix) in theory could return an entry from another node, but if we get information from elsewhere that a leaf exists at index, then both methods in conjunction end up giving the key/value stored at some leaf. I would vote for a refactor where there's a single method along the lines of get_key_value_at_leaf(leaf_index) to get that information.
  • Define more newtypes (across the codebase) instead of using RpoDigest for key, account hash, and many more that I currently can't think of off the top of my head.
  • Better naming, as captured in Naming change for NodeIndex crypto#208
  1. Properly document the behavior and non-deterministic inputs of the MASM stdlib functions
  • Specifically, it would be good to not only know how the advice map and Merkle store need to be loaded, but provide an explanation of how these are used internally.

@frisitano
Copy link
Contributor

frisitano commented Nov 7, 2023

The simplest alternative

Perhaps the simplest alternative is to reduce the number of tiers in TSMT to just one (at depth 64). This would make it almost equivalent to the [SimpleSmt]

Perhaps the depth of the single tier TSMT could be parameterised. The depth of the TSMT would be use case specific i.e. for account vaults we would use a single tier TSMT of depth 16 (or maybe even less), for account and nullifier db's we would have a TSMT of depth 64. For account storage it would be the responsibility of the user to specify the depth of the TSMT as part of the type information.

Alternative with 2 tiers

We could also reduce the number of tiers to 2 - at depth 32 and 64 (or 16 and 64). This would not be as simple as the previous approach, but also, would be much more efficient for trees containing a small number of values.

Similarly we could paramaterise the depth of the upper tier in the 2 tier construction.

I'm in favour of a single tier TSMT with parameterisable depth as through the depth configurability it gives a good balance between performance and simplicity for many use cases.

@bobbinth
Copy link
Contributor Author

bobbinth commented Nov 8, 2023

Perhaps the depth of the single tier TSMT could be parameterised.

Setting the depth much smaller than 64 could become a significant attack vector (at least if we keep colliding keys in a sorted list). The main factor is the cost of generating key collisions. I estimate that with RPO, finding a 64-bit pre-image probably costs many million dollars (probably north of $100M) and so the cost of generating more than a few hundred keys with the same 64-bits prefix as some existing key is prohibitively expensive - even in the medium term.

16-bit (and even 32-bit) pre-images are pretty easy to find though. So, this opens up various attack vectors (e.g., someone could make accessing a specific key in account storage prohibitively expensive). I'm sure these can be worked around, but it does require users to be aware of these edge-case behaviors and is probably undesirable.

@bobbinth
Copy link
Contributor Author

bobbinth commented Jan 11, 2024

Once 0xPolygonMiden/crypto#243 is implemented, we should update stdlib to replace current std::collections::smt with this new implementation. We can also remove std::collections::smt64 as we can just use VM instructions directly.

This will also close #1050 and #1057.

Updating std::collections::smt64 will involve a few things:

  1. We need to replace the SMT code in stdlib with the one suited for the new implementation of the SMT. The new code should be dramatically simpler. We probably want to split this into two parts (and maybe even have different PRs to address each):
    a. Handling single-value leaves: this shouldn't much more complicated than using our existing mtree_get and mtree_set instructions together with hmerge (or hperm instruction) to compute leaf node values from key-value paris.
    b. Handling multi-value leaves: this would be similar to above, but we'd also need to use a sorting algorithm during insertions. We could optimize sorting using non-determinism - but I'm not sure it is worth doing yet.
  2. We'll need to update handling of decorators related to the SMT module. These are smt_get, smt_set, and smt_peek. The current code for them is here. The new code should be dramatically simpler. I haven't thought this through yet, but they may be as simple as pushing a few flags onto the advice stack to specify whether we are dealing with single-value or multi-value leaf case.

@bobbinth
Copy link
Contributor Author

bobbinth commented Feb 7, 2024

Closed by #1215.

@bobbinth bobbinth closed this as completed Feb 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

4 participants