- This implementation is gas optimized according to a ethresear.ch post from Vitalik Buterin and others.
- It's dapptools-compatible and so all tests are written in Solidity: Maintenance is improved.
- You need to have dapptools installed.
dapp install attestate/indexed-sparse-merkle-tree
Once you've done that, you can use the StateTree.sol
contract class
StateTree
to initialize the merkle tree.
Here's an example of how StateTree.sol
could be used. Here,
StateTree.empty()
computes the default empty sparse merkle tree. That root is
then stored in a bytes32
storage variable when Dapp
is initialized.
pragma solidity ^0.8.6;
import "indexed-sparse-merkle-tree/StateTree.sol";
contract Dapp {
bytes32 public root;
constructor() {
root = StateTree.empty();
}
}
Sparse Merkle Trees (short: SMT) are regular merkle trees (or sometimes called "hash trees"). For brevity, we're not going to go into details about how merkle trees work, but only what the "sparse" attribute means.
Generally speaking, a merkle tree has, as a base layer, a number of leaves that are being hashed. But the core assumption of a merkle tree is that most or all leaves on the base layer have data in them. Here's a classical example of a binary hash tree.
Now for sparse merkle tree, the default assumption is that all leaves at the
tree's birth are empty. And this empty value is usually represented by some
kind of value. In programming languages terms like void
, null
, nil
are
often used to represent emptiness. However, since we want to hash all leaf
values using a hash function like e.g. sha3, we'll have to somehow represent
emptiness nominally. Hence, when calling StateTree.empty()
, the tree's
initializing function, we generate a list of leaves all empty and represented
by zeros:
leaves = [0, 0, ...,0];
And we then hash those leaves exactly the same way we'd hash a regular merkle tree.
A merkle tree is great for validating the integrity of an existing file. E.g. in Bittorrent, when someone posts a magnet link to a movie, then knowing the movie file's hash is useful as it helps to verify the integrity of the movie currently downloaded.
But e.g. in the case of storing data structures on-chain as is done with decentralized apps, a sparse merkle tree that is initially empty but can be updated to contain more and more values, is kinda neat.
E.g. after we initialized our tree with StateTree.empty()
, we can then write
values into any of the leaves by calling StateTree.write()
to receive a new
root. An example:
bytes32[] memory proofs = new bytes32[](0);
bytes32 OLD_LEAF_HASH = keccak256(
abi.encode(
0x0000000000000000000000000000000000000000000000000000000000000000
)
);
bytes32 NEW_LEAF_HASH = keccak256(
abi.encode(
0x0000000000000000000000000000000000000000000000000000000000000001
)
);
uint8 bits = 0;
uint256 index = 0;
bytes32 ROOT0 = StateTree.empty();
bytes32 ROOT1 = StateTree.write(
proofs,
bits,
index,
NEW_LEAF_HASH,
OLD_LEAF_HASH,
ROOT0
);
See, in the above example, we proposed overwriting the old value of the tree
OLD_LEAF_HASH
with a new value NEW_LEAF_HASH
. We said that in the tree
itself we want to write the leaf at index = 0
. There's also two variables not
being used yet: proofs
and bits
.
Normally, for each insertion to have the write
function succeed, we have to
provide a valid old merkle root hash. This is because internally, write
first
checks with validate
if by reinserting OLD_LEAF_HASH
we get the same old
root as we claimed with ROOT0
.
Only if that's the case, we're allowed to proceed to actually attempt
computation of the new root ROOT1
.
In smart contract programming, originally (sparse) merkle trees were used to validate integrity. But with the rise of the rollup infrastructures, it's become known that SMTs might also allow to build linearly scalable dapps by saving gas costs when storing words in the EVM's state.
Storing a non-zero word of 32 bytes for the first time in EVM storage currently costs 20k gas. However, including many 32bytes leaves into an SMT might cost less than 20k gas per 32bytes. A user or operator would still have to pay for the tree's root inclusion into the chain, e.g. by storing it as a storage variable - but even for large body of data, cost would decrease. Generally speaking, most cost would occur when having to compute the new root.
A special property of this sparse merkle tree is that it is gas-optimized according to a post from ethresear.ch. In general, a sparse merkle tree of binary structure has a mandatory proof size of its depth. In simpler terms, this means that for an SMT with depth eight, for any type of recomputation eight other nodes on various levels would have to be necessary. Actually, there's a really great article by Vitalik called "Merkling in Ethereum that may help to understand this property.
But in any case, at e.g. a depth of 256 and having 2^256 leaves, requiring 256 proofs of 32 bytes ends up being a lot of data; that doesn't sound gas efficient at all. And that were Vitalik's post on "Optimizing sparse Merkle trees" comes in handy. Because, if we assume that most of our tree will mostly remain relatively empty, then we can precompute the leaves that just represent zero values on all sorts of levels.
For that, e.g. imagine a sparse merkle tree with just a single non-zero leaf at the left hand side. If we looked at it, we'd immediately see that all other leaf nodes are zero hashes. We'd also understand that even for the most top right node, closest to the root node, it could be precomputed too and would simply be the accumulation of all zero node hashes on the last seven levels.
Indeed, the StateTree
has all these values available precomputed in the
get(uint256 level)
function. And that's how we can safe gas costs, by
reducing the number of proofs required to update the tree. Going back to our
example earlier:
bytes32[] memory proofs = new bytes32[](0);
bytes32 OLD_LEAF_HASH = keccak256(
abi.encode(
0x0000000000000000000000000000000000000000000000000000000000000000
)
);
bytes32 NEW_LEAF_HASH = keccak256(
abi.encode(
0x0000000000000000000000000000000000000000000000000000000000000001
)
);
uint8 bits = 0;
uint256 index = 0;
bytes32 ROOT0 = StateTree.empty();
bytes32 ROOT1 = StateTree.write(
proofs,
bits,
index,
NEW_LEAF_HASH,
OLD_LEAF_HASH,
ROOT0
);
We can see that proofs
and bits
is actually empty, as all zero-value hashes
can be used from within the actual implementation. So not a single proof has to
be put into calldata.
If, however, we were to now enter a second value at another index into the tree
we'd have to start getting creative. See, bits
indicates at which level we
have to send in a custom proof and can't use a pre-computed zero proof.
// continued from code snippet above...
uint256 proofIndex = 0;
uint8 bytePos = (bufLength - 1) - (proofIndex / 8);
uint8 bits2 = bytePos + 1 << (proofIndex % 8);
bytes32[] memory proofs2 = new bytes32[](1);
proofs2[0] = NEW_LEAF_HASH;
uint256 index2 = 1;
bytes32 ROOT2 = StateTree.write(
proofs2,
bits2,
index2,
NEW_LEAF_HASH,
OLD_LEAF_HASH,
ROOT1
);
So in this case, we're hence making the write
function understand that at
proof2[0]
, there's a non-zero proof for the computation. We've done so by
computing a bit map on a uint8 bits2
value. This looks like some rather
intimidating math, but what it really does is it flips the first zero value of
an integer represented in binary to a one. E.g. 0000
to 1000
.
For the user's convenience however, we're shipping this bitmap function in the
code too. Just call bitmap(uint256 index)
.
You can download this repository and test the implementation using the following command:
dapp test -vv
See LICENSE file.