Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SMT should use Option<T> #289

Open
hackaugusto opened this issue Mar 14, 2024 · 8 comments
Open

SMT should use Option<T> #289

hackaugusto opened this issue Mar 14, 2024 · 8 comments

Comments

@hackaugusto
Copy link
Contributor

The EMPTY_VALUE is leaking in the API. And the users of the API now have to special case that, sometimes converting to Option<T> themselves. It would make for a nicer API if Smt handled that directly.

@plafer
Copy link
Contributor

plafer commented Mar 14, 2024

We specifically chose not to return an Option<T> because semantically, an SMT has all its leaves populated (with EMPTY_VALUE unless updated). This is consistent with the root that we return for an empty tree. In other words, a return value of None would suggest that there is no value at a key, which is inconsistent with the tree root (which uses EMPTY_VALUE as a value associated with that key).

@hackaugusto
Copy link
Contributor Author

hackaugusto commented Mar 14, 2024

I disagree this is the semantics of the tree. It is true the sparse tree behaves as if the leaves existed, but that has to do with how the tree is encoded in memory. How the tree is encoded does not affect the interpretation of the leaf value. To my interpretation the value EMPTY_WORD is how we encode the absence of a value, i.e. None, and this would be true for a sparse and fully materialized tree.

The above works because of hash collision resistance. We can assume it is extremely unlikely the result of a hash be same as EMPTY_WORD, so we can reserve that value to mean something, in this case it would mean the absence of a value / None.

@bobbinth
Copy link
Contributor

It is true the sparse tree behaves as if the leaves existed, but that has to do with how the tree is encoded in memory. How the tree is encoded does not affect the interpretation of the leaf value. To my interpretation the value EMPTY_WORD is how we encode the absence of a value, i.e. None, and this would be true for a sparse and fully materialized tree.

I'm not sure I agree. Semantically, an empty SMT is identically to a fully balanced Merkle tree where every leaf is a "default value". In our case, the default value is EMPTY_WORD but we could have chosen something else as well. Basically, it is not that there is nothing in a given leaf: each leaf has a value. It is up to the users of the tree to define how they interpret a given value (i.e., they may decide that the default value means None or something else).

@hackaugusto
Copy link
Contributor Author

hackaugusto commented Mar 14, 2024

I agree that EMPTY_WORD can be interpreted differently, including a sparse representation of the default value, we are not disagreeing on that. I just didn't argue for this interpretation because AFAIK we have no need for it, but we do for the None case.

It is up to the users of the tree to define how they interpret a given value (i.e., they may decide that the default value means None or something else).

This is what I'm pointing out, "up to the user" means:

let smt = SimpleSmt::new();
let value = smt.get_leaf(leaf(4));

if value == SimpleSmt::EMPTY_VALUE { // <-- this is leaking
   // treat it as default or none
}

A better API would be to have:

let smt = SimpleSmt::new();
let value: Option<T> = smt.get_leaf(leaf(4)); // thumbstone represents `None`
                                              // `None` is constructed internally

and if we need, we can also have:

let smt = SimpleSmt::new();
let value: T = smt.get_leaf(leaf(4)); // thumbstone represents the default value
                                      // `T::default()` is called internally

we can probably support both use cases by making SimpleSmt generic over T: Default. None is the default value for Option<T> and would behave correctly.

@plafer
Copy link
Contributor

plafer commented Mar 14, 2024

How I'm reconciling both points of view is:

  • The current state of the API treats the internal BTreeMap of which leaves have non-default values as an implementation detail. The current Smt represents an abstract (very large) Merkle tree where every leaf's value is set to Smt::EMPTY_VALUE.
  • What Augusto is proposing is that the Smt would actually represent a sparse Merkle tree, such that the internal BTreeMap is not an implementation detail. We then return Option<T> to differentiate between which leaves have been set, and which haven't.

I agree with Augusto that returning Option<T> would be better, since indeed the miden-node does need the BTreeMap reified (i.e. it takes into account which leaves it has set). Also from a practical POV, returning Option<T> gives strictly more information: you not only know that the default leaf value will be used in the computation of the root hash, but you also know that that leaf wasn't explicitly set.

IMO then for this API to be clean, we'd need to:

  • insert(key, EMPTY_WORD) would set an entry in the BTreeMap with the value EMPTY_WORD (such that a subsequent insert(key, x) would return Some(EMPTY_WORD), indicating that the leaf was explicitly set in the internal BTreeMap)
  • Add an explicit remove(key) method that removes an entry from the BTreeMap

FWIW, Penumbra's Jellyfish merkle tree's get() also returns an Option.

@bobbinth
Copy link
Contributor

I think the complication here is that we need work with Smt in other contexts as well (e.g., in MASM). So, we always need to leak these details. For example, if we return Option, we still need to indicate somewhere what None maps to (i.e., does it map to an EMPTY_WORD or something else?). If we don't have this info, we can't compute the root of the Merkle tree, perform deletions etc. (I'm talking here specifically about MASM).

@plafer
Copy link
Contributor

plafer commented Mar 14, 2024

For example, if we return Option, we still need to indicate somewhere what None maps to (i.e., does it map to an EMPTY_WORD or something else?)

I would think of Smt and the MASM version as implementing a common specification (which we have only implicitly). That specification mandates that empty leaves map need to map EMPTY_WORD.

Hence, we can have Smt::insert() return Option, and document that a value of None maps to EMPTY_WORD when computing roots (as per the specification). Any implementation (Rust, MASM, or anything else) which doesn't do this violates the specification, and hence is incorrect.

@hackaugusto
Copy link
Contributor Author

hackaugusto commented Mar 15, 2024

@plafer I didn't want to add explicit presence tracking, I think we already have what we need. At least for how the sparse Merkle tree is used for the nullifiers, we have the following:

  1. a thumbstone value, called EMPTY_WORD
  2. a tree of fixed depth with the leaves initialized to the thumbstone value
  3. the sparse implementation doesn't encode the thumbstone value is memory, giving it sparsity
  4. the thumbstone has a natural interpretation, meaning "lack of a nullifier", i.e. the value None
  5. Given 2. and 4. the tree is initialized with None in its leaves

The point I'm trying to make is that the interpretation of the thumbstone value EMPTY_WORD for the nullifier tree has the meaning of lack of a value, and encoding that in the API will make it easier to use.

Edit: To clarify the above, because I'm not 100% if I'm misinterpreting what you said. The main difference is that the api would be remove(key) or insert(key, None) instead of insert(key, EMPTY_WORD), and the API would never return Some(EMPTY_WORD) but None instead.


This bit of code should satisfy the things brought up in this discussion:

  • Support a Option<T> interpretation of the thumbstone value as I wanted
  • Support a Default interpretation of the thumbstone value as bobbin wanted
  • Support hashing of the values to communicate with the VM, as needed and pointed by bobbin
  • Support the tracking as Phillippe pointed out
/// Represents objects that can be hashed
trait Hashable {
  fn hash(&self) -> Digest;
}

/// Interface of a Merklelized map.
trait MerkleMap<K: Hashable, V: Hashable>
{
  type Error;
  type MerklePath;

  fn root(&self) -> Digest;
  fn insert(&mut self, key: K, value: V) -> Option<V>;
  fn remove(&mut self, key: impl AsRef<K>) -> Result<V, Error>;
  fn lookup(&self, key: impl AsRef<K>) -> Result<&V, Error>;
  fn open(&self, key: impl AsRef<K>) -> Result<MerklePath, Error>;
}

/// A sparse implementation of a Merkle map.
///
/// This structure uses the default value of `hash(V::default())` as a thumbstone,
/// these entries are not stored in memory, saving space.
struct SparseMerkleMap { ... }

impl<K, V> MerkleTree<K, V> for SparseMerkleTree
  where
    K: Hashable,
    V: Default + Hashable,
{
   type Error = Infallible;
   ...
}

impl Hashable for Option<T: Hashable> {
  fn hash(&self) -> Digest {
    return EMPTY_WORD;
  }
}

#[test]
fn usage() {
  // support default values as mentioned by bobbin
  let mut tree = SparseMerkleTree<Felt, Felt>::new();
  assert_eq!(tree.get(ONE), Ok(ZERO));           // sparse tree prepopulated with the default
  assert_eq!(tree.insert(ONE, ONE), Some(ZERO)); // same

  // supports None as mentioned by me
  let mut tree = SparseMerkleTree<Felt, Option<Felt>>::new();
  assert_eq!(tree.get(ONE), Ok(None));       // the default is `None`
  assert_eq!(tree.insert(ONE), Some(None));  // double wrapping :(
}

The methods are fallible in the trait, but infallible in the example above by setting the Error to (). The fallible definition allows for another tree to implement the trait and have presence tracking, it would return an error NotPresent on lookup/remove/open, and None on insert.

I'm not sure if it is worth it though, unless we think this trait would be also used by users down the road. Otherwise I would use a simpler version, and if we really need add the tracking version later on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants