New crate for compact trie proofs. #30

Closed · wants to merge 1 commit
Conversation

@jimpo (Contributor) commented Oct 16, 2019

The Merkle-Patricia trie is designed to support compact proofs of key-value lookups (otherwise it would not be merklized). Currently, proof generation and verification are implemented downstream in Substrate. However, the proofs are inefficient for the reasons described in paritytech/substrate#3782.

This PR creates a new crate for generating and verifying compact proofs compatible with the existing trie structure. These proofs are more space-efficient than those currently used in Substrate: they omit the value data that the verifier may already have access to, and they omit hashes of other trie nodes in the proof that the verifier must compute anyway. In practical terms, one hash is omitted for each node in the proof, and every value the verifier is checking is omitted. The overhead for signaling that a value or child hash is omitted is 0-1 bytes per omission, depending on context.
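For illustration, here is a minimal sketch of the API shape this enables; the names and signatures below are assumptions for illustration, not the exact ones in the diff (only `Proof { nodes }` appears there), with database plumbing and hasher generics elided:

```rust
// Hypothetical sketch of a compact-proof API; a 32-byte root is assumed.

/// A compact proof: encoded trie nodes in traversal order, with omitted
/// values and child hashes signalled inside each node's encoding.
pub struct Proof {
    pub nodes: Vec<Vec<u8>>,
}

/// Prove `keys` against the trie rooted at `root`, omitting the looked-up
/// values and every child hash the verifier can recompute.
pub fn generate_proof(
    _root: &[u8; 32],
    _keys: &[Vec<u8>],
) -> Result<Proof, &'static str> {
    unimplemented!("walks the trie, recording and compacting visited nodes")
}

/// Verify a proof. Because values are omitted, the caller supplies the
/// expected items (`None` marks a key expected to be absent); verification
/// recomputes the omitted hashes bottom-up and compares against `root`.
pub fn verify_proof(
    _root: &[u8; 32],
    _proof: &Proof,
    _items: &[(Vec<u8>, Option<Vec<u8>>)],
) -> Result<(), &'static str> {
    unimplemented!("reconstructs omitted hashes and checks the root")
}
```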

This PR is a draft because:

  • I could use some general feedback on the API and amount of duplication.
  • We should decide whether this is worth the additional complexity. (I think so, but would like others to weigh in).
  • This needs MANY more tests and better documentation.
  • The errors are all &'static str. There should be real error types.
  • I'm not sure the reference ProofNodeCodec is in the right place. It is not in the reference-trie crate due to issues with circular dependencies. However, there is considerable replication of logic, or outright duplication, between the reference ProofNodeCodec impl and the reference codec impl for regular NodeCodecs. I noticed that much of this code is duplicated between reference-trie and substrate-trie as well.

UPDATE: After looking further into the Substrate integration, it really does make more sense for proofs to carry values for light clients. It would take significant refactoring for them to be compatible with this sort of design. Execution proofs would be especially tricky, as they are just provided a trie storage built from the proof.

It might make sense, however, to present an API for a value-carrying proof that still omits internal hashes. The savings would then come only from the omitted hashes, not from omitted values. The crate could present both types of proofs.

This sort of proof could be useful in more specialized cases like the bridge where proof size is even more important.
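A rough sketch of how both flavours might sit side by side (hypothetical shapes, not from the diff):

```rust
// Hypothetical illustration of the two proof flavours discussed above.
pub enum TrieProof {
    /// Internal hashes and values both omitted: the verifier must already
    /// know the expected key/value pairs and supply them at check time.
    ValuesOmitted { nodes: Vec<Vec<u8>> },
    /// Internal hashes omitted, but values carried inline in the leaf
    /// nodes: suitable for light clients and execution proofs, where the
    /// verifier learns the values from the proof itself.
    ValuesCarried { nodes: Vec<Vec<u8>> },
}
```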

@cheme (Contributor) left a comment

I did a first pass over this new crate; I couldn't really dig into the algorithm implementation yet.
General remarks/questions from this first reading:

  • Is the separation of proof construction/proof removal from proof reconstruction/proof check needed?
    Proof reconstruction (or removal) looks a lot like what an inline check/construction for multiple values would be: if you manage to reconstruct the missing hashes, you have basically proven your query (you just need to check the values); see the sketch after this list.
    But if we consider other kinds of proof (like calling a Substrate function), we do not have the key/values as input, and at the same time this algorithm could not apply (we do not know which keys/values are accessed), so it would be a better fit to do things internally in the recorder. Reconstruction is not simple there either; we could probably overload triedb to check that the hash on the next query is not a parent/child: given the unordered nature, we would need to put everything in memory (which we already have) and, at the end, check the number of reconstructed hashes. It would also need a stored index for omitted nodes (we cannot tell whether it is the next value or, as in the current implementation, the previous value).
    So having this specialized prover is probably a good idea (it can be fastest, as it only needs a stack in memory). Still, it would be a bit cooler if we ran in O(d).

  • Do we really need a new codec trait? Same question for the structures; couldn't we use composition somehow?

  • I am not sure why a specialized sort is needed here.
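To illustrate the first point, the "reconstruction is the check" idea could look roughly like this (both helpers are hypothetical stand-ins):

```rust
// Hypothetical sketch: verifying by reconstruction. If the omitted hashes
// can be rebuilt bottom-up from the proof nodes (with the caller's
// expected values filled in), then matching the known root is the whole
// proof check; no separate verification pass is needed.
fn check_by_reconstruction(
    expected_root: &[u8; 32],
    proof_nodes: &[Vec<u8>],
    items: &[(Vec<u8>, Option<Vec<u8>>)],
) -> Result<(), &'static str> {
    let reconstructed = reconstruct_root(proof_nodes, items)?;
    if &reconstructed == expected_root {
        Ok(())
    } else {
        Err("root mismatch")
    }
}

/// Stand-in: fills omitted values and child hashes bottom-up, hashing each
/// node as it completes, and returns the resulting root hash.
fn reconstruct_root(
    _nodes: &[Vec<u8>],
    _items: &[(Vec<u8>, Option<Vec<u8>>)],
) -> Result<[u8; 32], &'static str> {
    unimplemented!("depends on the node codec and hasher")
}
```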

Regarding testing, it is probably very easy to fuzz this against the existing non-compact implementation (there is already fuzzing for trie building; it just needs to be extended to prove all elements in the trie).
Otherwise, the tests to extend are clearly 'triedbmut.rs' and 'playpen'; those are the ones most likely to find bugs.
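For example, a differential fuzz target could look roughly like this (all helpers are stand-ins for the reference-trie builders and this crate's API, named hypothetically):

```rust
use std::collections::BTreeMap;

// Sketch of a differential fuzz target: build a trie with the existing
// (non-compact) code path, prove every key compactly, and require the
// proof to verify against the resulting root.
fn fuzz_compact_proof(data: &[u8]) {
    // Derive deduplicated (key, value) pairs from the raw fuzz input.
    let entries: BTreeMap<Vec<u8>, Vec<u8>> = data
        .chunks(8)
        .map(|c| (c[..c.len() / 2].to_vec(), c[c.len() / 2..].to_vec()))
        .collect();
    // The existing, trusted trie-building path computes the root.
    let root = build_trie(&entries);
    let keys: Vec<Vec<u8>> = entries.keys().cloned().collect();
    // Prove all elements in the trie, then check the proof against the root.
    let proof = generate_proof(&root, &keys).expect("all keys are in the trie");
    let items: Vec<(Vec<u8>, Option<Vec<u8>>)> =
        entries.into_iter().map(|(k, v)| (k, Some(v))).collect();
    verify_proof(&root, &proof, &items).expect("proof over in-trie keys must verify");
}

// Stand-ins so the sketch is self-contained; see the API sketch above.
struct Proof { nodes: Vec<Vec<u8>> }
fn build_trie(_entries: &BTreeMap<Vec<u8>, Vec<u8>>) -> [u8; 32] { unimplemented!() }
fn generate_proof(_root: &[u8; 32], _keys: &[Vec<u8>]) -> Result<Proof, &'static str> {
    unimplemented!()
}
fn verify_proof(
    _root: &[u8; 32],
    _proof: &Proof,
    _items: &[(Vec<u8>, Option<Vec<u8>>)],
) -> Result<(), &'static str> {
    unimplemented!()
}
```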

// Look up all keys in order and record the nodes traversed during lookups.
//
// Ideally, we would only store the recorded nodes for one key at a time in memory, making the
// memory requirements O(d) where d is the depth of the tree. However, borrowck makes this
Suggested change:
// memory requirements O(d) where d is the depth of the tree. However, borrowck makes this
// memory requirements O(d) where d is the depth of the tree. However, borrowing makes this

Or does 'borrowck' mean the borrow checker?

// difficult, so instead we store all recorded nodes for all keys in memory, making the memory
// requirements O(d * k), where k is the number of keys.
let mut recorder = Recorder::new();
let values = keys.iter()

Note that multiple gets could be greatly enhanced with a dedicated API (especially since we sorted the keys).
But the comment above seems to say that, and it is just an optimization.

use crate::std::{fmt, cmp::{self, Ordering}};

/// A representation of a nibble slice which is left-aligned. The regular `trie_db::NibbleSlice` is
/// right-aligned meaning it does not support efficient truncation from the right side.

I see this being used on the key iterator, but the key iterator is only used after a sort where the keys are cloned (I think we could use keys.into_iter() and run on owned values; it could cost a bit more on truncate, but I guess with those small vecs they would keep their starting capacity).
So maybe NibbleVec would be fine to use here (it is left-aligned; I remember writing the slice version too, but dropping it after seeing I only ever used it on a newly allocated vec).
If this type is required, I would suggest putting it in the trie-db crate, in the nibble module (making it more obvious that the implementation exists).
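For reference, a minimal illustration of why left alignment makes truncation from the right cheap (not the PR's actual type):

```rust
/// Minimal illustration of a left-aligned nibble slice: nibble i lives in
/// the high or low half of byte i / 2, counting from the front of the key,
/// so dropping nibbles from the right is just shrinking `len`.
#[derive(Clone, Copy)]
struct LeftNibbles<'a> {
    bytes: &'a [u8],
    len: usize, // number of nibbles
}

impl<'a> LeftNibbles<'a> {
    fn new(bytes: &'a [u8]) -> Self {
        LeftNibbles { bytes, len: bytes.len() * 2 }
    }

    /// Nibble `i`, counting from the left (the start of the key).
    fn at(&self, i: usize) -> u8 {
        assert!(i < self.len);
        let byte = self.bytes[i / 2];
        if i % 2 == 0 { byte >> 4 } else { byte & 0x0f }
    }

    /// O(1) truncation from the right: no re-alignment of the underlying
    /// bytes is needed because the slice stays anchored at the left.
    fn truncate(&self, new_len: usize) -> Self {
        assert!(new_len <= self.len);
        LeftNibbles { bytes: self.bytes, len: new_len }
    }

    /// True if the first `prefix.len` nibbles of `self` match `prefix`
    /// (the kind of `starts_with` check used on the stack entries below).
    fn starts_with(&self, prefix: &LeftNibbles) -> bool {
        prefix.len <= self.len && (0..prefix.len).all(|i| self.at(i) == prefix.at(i))
    }
}
```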

// guaranteed that until the end of the loop body that all stack entries will have a key
// that is a prefix of the current key.
while let Some(entry) = node_stack.pop() {
if key_nibbles.starts_with(&entry.key) {

Are we stacking the parent trie nodes? Wouldn't it be possible to use the standard key sort and just get those values during the iteration, stacking the to-be-built entries?


Reading a bit further, it seems we use it for the logic of going back up toward the root of the trie, so we can switch branches right where the next key starts (in fact, after skipping redundant nodes). This could also be calculated by comparing the two keys.

} else {
Step::FoundValue(None)
}
}
Suggested change:
}
},

I am a fan of using the comma here; I don't think it is in the coding guidelines, though.

proof_nodes.push(encode_proof_node::<C, L::Hash>(&proof_node));
}

Ok(Proof { nodes: proof_nodes })

Wrong indent.

K: 'a + AsRef<[u8]>,
V: 'a + AsRef<[u8]>,
{
/// Sort items in post-order traversal order by key.

Wrong indent.
Edit: there seem to be a few more afterward.

ProofNode::Empty => L::Codec::empty_node().to_vec(),
ProofNode::Leaf { partial_key } => {
let (_, value) = items_iter
.find(|(_key, value)| value.is_some())

I kind of stopped following the algorithm here (I will go into it later; for now I am just doing a simple reading), but is it expected to use a find here? The proof check could be driven by the item keys and run without a read-ahead.


OK, that is the reconstruction step, so I understand a bit more. Still, there should be a way to use the reconstruction as a proof check; it feels like we do the job twice here: reconstruction, then proof check.
I still feel a read-ahead is not needed.

}

/// Trait for proof node encoding/decoding.
pub trait ProofNodeCodec<H>: Sized

Should we limit this trait to the functions whose prototypes change from the existing codec trait?
(The implementation could be rather straightforward for those if you go the hacky way of storing only a 2-byte index of omitted values and appending it to the existing codec output; the internal node representation would then need to encapsulate the former one with a simple fixed-size array index of omitted nodes.)
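Roughly, I mean something like this (purely illustrative; the field layout is an assumption):

```rust
/// Illustrative only: a proof node as an envelope around the existing
/// `NodeCodec` output, rather than a whole new codec trait.
struct ProofNodeEnvelope {
    /// Node bytes from the regular codec, with omitted values replaced by
    /// empty placeholders.
    inner: Vec<u8>,
    /// 16-bit bitmap (one bit per child slot) marking the child hashes
    /// that were omitted and must be recomputed by the verifier.
    omitted_children: u16,
    /// Whether the node's own value was omitted from `inner`.
    value_omitted: bool,
}

impl ProofNodeEnvelope {
    /// Append the omission metadata to the regular encoding, leaving the
    /// existing codec code untouched.
    fn encode(&self) -> Vec<u8> {
        let mut out = self.inner.clone();
        out.extend_from_slice(&self.omitted_children.to_le_bytes());
        out.push(self.value_omitted as u8);
        out
    }
}
```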


Generally we need to avoid codec code redundancy (in Substrate at least; for trie, the infamous circular dependency of the reference implementation makes it awkward).
That is a TODO, but unrelated to this PR (the redundancy already exists a bit within the trie repo, and between trie and Substrate too).

has_children.iter().cloned(),
&mut output[offset..(offset + BITMAP_LENGTH)]
);
Bitmap::encode(

OK, moving this to the end would look a bit hacky, but since the proof is all in memory, it is doable if we want to reuse the other codec code.

@rphmeier (Contributor)

I won't have time to review this, sorry.

@cheme (Contributor) commented Oct 24, 2019

As a side note, another useful primitive would be compaction from a memory-db/hashmap, for proofs that are not just about key queries:

  • first, collect the proof in a memorydb/map;
  • then, iterate from the root to remove the 'internal hashes' and order the nodes correctly;
  • removing terminal values should be optional (not the default in this use case).

In this case the processing probably needs two steps, because we cannot change the order in which content is accessed during execution.

I am thinking of cases like Cumulus, where the proof/witness corresponds to processing a full block and can be rather big (light-client call proofs as well).
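A rough sketch of that two-step flow, with the node-codec details stubbed out (all names hypothetical):

```rust
use std::collections::HashMap;

type Hash = [u8; 32];

/// Step 1 output: every node touched during execution, keyed by its hash.
type MemoryDb = HashMap<Hash, Vec<u8>>;

/// Step 2: walk from `root`, emitting nodes in a deterministic traversal
/// order and stripping the child hashes the verifier can recompute.
/// Terminal values stay in place by default; dropping them would be an
/// opt-in for pure key-query proofs.
fn compact(db: &MemoryDb, root: &Hash, out: &mut Vec<Vec<u8>>) {
    let node = match db.get(root) {
        Some(n) => n,
        // Nodes never touched during execution are absent from the db;
        // their hashes stay inline in the parent and are not descended.
        None => return,
    };
    let (stripped, children) = strip_child_hashes(node, db);
    out.push(stripped);
    for child in children {
        compact(db, &child, out);
    }
}

/// Stand-in: decode `node`, replace each child hash whose node is present
/// in `db` with an "omitted" marker, and return the rewritten node plus
/// the hashes of the children to descend into.
fn strip_child_hashes(_node: &[u8], _db: &MemoryDb) -> (Vec<u8>, Vec<Hash>) {
    unimplemented!("depends on the concrete node codec")
}
```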

@jimpo (Contributor, Author) commented Dec 13, 2019

Superseded by #45.

@jimpo closed this Dec 13, 2019