New crate for compact trie proofs. #30
Conversation
I did a first reading of this new crate; I couldn't really dig into the algorithm implementation yet.
General remarks/questions from this first reading:

- Is the separation of proof construction/proof removal and proof reconstruction/proof check needed? Proof reconstruction (or removal) looks a lot like what an inline check/construction for multiple values would be: if you manage to reconstruct the missing hashes, you have basically proven your query (you just need to check the values). But if we consider other kinds of proofs (like calling a Substrate function), we do not have the key-values as input, and at the same time this algorithm could not apply (we do not know which keys/values are accessed), so it would be a better fit to do things internally in the recorder. Reconstruction is not simple there either: we could probably overload TrieDB to check the hash on the next query not being a parent-child, but the unordered nature suggests we would need to put everything in memory (which we already have) and check the number of reconstructed hashes at the end. Plus it would need a stored index for omitted nodes (we cannot say whether it is the next value or, as in the current implementation, the previous value). So having this specialized prover is probably a good idea (it can be fastest, as it only needs a stack in memory). Still, it would be even nicer if it ran in O(d).
- Do we really need a new codec trait? Same question for the structure: couldn't we use composition somehow?
- I am not sure why a specialized sort is needed here?

Regarding testing, it is probably quite easy to fuzz this against the existing non-compact implementation (there is already fuzzing for trie building; it just needs to be extended to a proof of all elements in the trie); a sketch follows below. Otherwise the test to extend is clearly the 'playpen' test in 'triedbmut.rs'; that is the one most likely to find bugs.
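Roughly the differential fuzz target I have in mind; `build_reference_trie`, `generate_proof`, and `verify_proof` are hypothetical names standing in for whatever this crate ends up exposing, not its actual API:

```rust
// Hypothetical differential fuzz target: the compact proof must agree with
// plain lookups against the existing non-compact trie on every key.
// `build_reference_trie`, `generate_proof`, and `verify_proof` are assumed
// names, not this crate's actual API.
fn fuzz_compact_proof(data: &[u8]) {
    // Derive arbitrary key/value pairs from the raw fuzzer input.
    let entries: Vec<(Vec<u8>, Vec<u8>)> = data
        .chunks(8)
        .map(|c| {
            let mid = c.len() / 2;
            (c[..mid].to_vec(), c[mid..].to_vec())
        })
        .filter(|(k, _)| !k.is_empty())
        .collect();

    let (db, root) = build_reference_trie(&entries);
    let keys: Vec<Vec<u8>> = entries.iter().map(|(k, _)| k.clone()).collect();

    // Prove all elements of the trie, then check that the proof round-trips.
    let proof = generate_proof(&db, &root, &keys).expect("proof over known keys");
    let proved = verify_proof(&root, &proof, &keys).expect("valid proof must verify");
    for ((key, value), proved_value) in entries.iter().zip(proved) {
        assert_eq!(Some(value.clone()), proved_value, "mismatch for key {:?}", key);
    }
}
```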
```rust
// Look up all keys in order and record the nodes traversed during lookups.
//
// Ideally, we would only store the recorded nodes for one key at a time in memory, making the
// memory requirements O(d) where d is the depth of the tree. However, borrowck makes this
```
Suggested change:
```diff
- // memory requirements O(d) where d is the depth of the tree. However, borrowck makes this
+ // memory requirements O(d) where d is the depth of the tree. However, borrowing makes this
```
Or does 'borrowck' mean the borrow checker?
```rust
// difficult, so instead we store all recorded nodes for all keys in memory, making the memory
// requirements O(d * k), where k is the number of keys.
let mut recorder = Recorder::new();
let values = keys.iter()
```
Note that multiple gets could be enhanced greatly with a dedicated API (especially since we sorted the keys).
But the previous comment seems to say that, and it is just an optimization.
```rust
use crate::std::{fmt, cmp::{self, Ordering}};

/// A representation of a nibble slice which is left-aligned. The regular `trie_db::NibbleSlice` is
/// right-aligned meaning it does not support efficient truncation from the right side.
```
I see this being used on the key iterator, but the key iterator is only used after a sort where the keys are cloned (I think we could `keys.into_iter()` and run on owned values; truncate could cost a bit more, but I guess with those small vecs it would keep its starting capacity).
So maybe `NibbleVec` would be fine to use (it is left-aligned; I remember writing the slice version too, but dropping it after seeing I only ever used it on a newly allocated vec).
If this type is required, I would suggest putting it in the trie-db crate in the nibble module (making it more obvious that the implementation exists). For reference, a sketch of what left alignment buys follows.
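A minimal sketch of a left-aligned layout (the field layout is an assumption, not necessarily the PR's exact type):

```rust
use std::cmp;

/// Sketch of a left-aligned nibble slice (assumed layout, not the PR's exact
/// type). Nibble `i` lives in the high or low half of `data[i / 2]`, so the
/// first nibble is always at a fixed position and truncating from the right
/// is just shrinking `len`; no re-shifting of the underlying bytes is needed.
struct LeftNibbleSlice<'a> {
    data: &'a [u8],
    len: usize, // length in nibbles
}

impl<'a> LeftNibbleSlice<'a> {
    fn new(data: &'a [u8]) -> Self {
        LeftNibbleSlice { data, len: data.len() * 2 }
    }

    fn at(&self, i: usize) -> u8 {
        debug_assert!(i < self.len);
        if i % 2 == 0 { self.data[i / 2] >> 4 } else { self.data[i / 2] & 0x0f }
    }

    /// O(1) truncation from the right, which a right-aligned slice cannot do.
    fn truncate(&self, len: usize) -> Self {
        LeftNibbleSlice { data: self.data, len: cmp::min(len, self.len) }
    }
}
```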
```rust
// guaranteed that until the end of the loop body all stack entries will have a key
// that is a prefix of the current key.
while let Some(entry) = node_stack.pop() {
    if key_nibbles.starts_with(&entry.key) {
```
Are we stacking the parent trie nodes? Wouldn't it be possible to use the standard key sort and just get those values during the iteration, stacking the entries to be built?
Reading a bit further, it seems we use it to implement the logic of going back up toward the root of the trie, being able to switch branches right where the next key starts (in fact, after skipping redundant nodes). This could also be calculated by comparing both keys, as in the sketch below.
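Something like this (a sketch; the nibble convention assumed here matches the left-aligned layout above):

```rust
/// Sketch: how far to unwind the node stack when moving from one sorted key
/// to the next can be derived from their shared nibble prefix, instead of
/// popping entries and testing `starts_with` on each one.
fn common_prefix_nibbles(a: &[u8], b: &[u8]) -> usize {
    let mut n = 0;
    for (&x, &y) in a.iter().zip(b.iter()) {
        if x == y {
            n += 2;
        } else {
            // The bytes differ, but the high nibbles may still match.
            if x >> 4 == y >> 4 {
                n += 1;
            }
            break;
        }
    }
    n
}
```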
```rust
        } else {
            Step::FoundValue(None)
        }
    }
```
Suggested change:
```diff
-     }
+     },
```
I am a fan of using the comma here; I think it is not in the coding guidelines though.
```rust
        proof_nodes.push(encode_proof_node::<C, L::Hash>(&proof_node));
    }

    Ok(Proof { nodes: proof_nodes })
```
wrong indent.
```rust
    K: 'a + AsRef<[u8]>,
    V: 'a + AsRef<[u8]>,
{
    /// Sort items in post-order traversal order by key.
```
wrong indent.
Edit: there seem to be a few more afterward.
```rust
    ProofNode::Empty => L::Codec::empty_node().to_vec(),
    ProofNode::Leaf { partial_key } => {
        let (_, value) = items_iter
            .find(|(_key, value)| value.is_some())
```
I kind of stopped following the algorithm (I will go into it later; just doing a simple reading for now), but is it expected to use a `find` here? The proof check could be driven by the item keys and run without a read-ahead.
OK, that is the reconstruction step, so I understand a bit more. Still, there should be a way to use the reconstruction as a proof check; it feels like we do the job twice here: reconstruction, then proof check.
It still feels like a read-ahead is not needed.
```rust
}

/// Trait for proof node encoding/decoding.
pub trait ProofNodeCodec<H>: Sized
```
Should we limit this trait to the functions whose prototypes change from the existing codec trait? (The implementation could be rather straightforward for those if you go the hacky way: store only a 2-byte index of omitted values and append it to the existing codec output; the internal node representation would then need to encapsulate the former one with a simple fixed-size array index of omitted nodes. A sketch follows below.)
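Roughly this, for the hacky variant (a sketch under the assumptions above; the 2-byte width and little-endian layout are just this comment's suggestion, not anything the PR implements):

```rust
/// Sketch of the "hacky" encoding suggested above: reuse the unmodified
/// output of the existing `NodeCodec` and append a 2-byte bitmap marking
/// which of the 16 children had their hashes omitted from the proof.
fn encode_with_omitted_children(existing_encoding: &[u8], omitted: &[bool; 16]) -> Vec<u8> {
    let mut bitmap: u16 = 0;
    for (i, &is_omitted) in omitted.iter().enumerate() {
        if is_omitted {
            bitmap |= 1 << i;
        }
    }
    let mut out = existing_encoding.to_vec();
    out.extend_from_slice(&bitmap.to_le_bytes());
    out
}
```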
Generally we need to avoid codec code redundancy (in Substrate at least; for trie, the reference crate's infamous circular dependency makes it awkward).
That is a TODO, but unrelated to this PR (the redundancy already exists a bit within the trie repo, and between trie and Substrate too).
```rust
        has_children.iter().cloned(),
        &mut output[offset..(offset + BITMAP_LENGTH)]
    );
    Bitmap::encode(
```
OK, moving this to the end would look a bit hacky, but since the proof is all in memory it is doable if we want to reuse the other codec code.
I won't have time to review this, sorry.
As a side note, another useful primitive would be compaction from a memory-db/hashmap: in that case the processing probably needs two steps, because we cannot change the order of accessed content during execution. I am thinking of cases like Cumulus, where the proof/witness corresponds to processing a full block and can be rather big (light-client call proofs also).
Superseded by #45.
The Merkle-Patricia trie is designed to support compact proofs of key-value lookups (otherwise it would not be merklized). Currently, proof generation and verification are implemented downstream in Substrate. However, the proofs are inefficient for reasons described in paritytech/substrate#3782.

This PR creates a new crate for generating and verifying compact proofs compatible with the existing trie structure. These proofs are more space-efficient than those currently used in Substrate: they omit value data that the verifier may already have access to, and they omit hashes of other trie nodes in the proof which the verifier must compute anyway. In practical terms, one hash is omitted for each node in the proof, and all values that the verifier is checking are omitted. The overhead for signaling that values and child hashes are omitted is 0-1 bytes per omitted hash or value, depending on context.
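To make the omission concrete, here is a toy model of why internal hashes never need to be shipped (hypothetical types and a stand-in hasher, not this crate's API): the verifier re-derives every child hash bottom-up while decoding the proof.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy proof node (hypothetical; the real crate's `ProofNode` differs).
/// A branch stores its children inline rather than by hash, so the parent's
/// hash can be recomputed from the children during verification.
enum ToyProofNode {
    Leaf { key: Vec<u8>, value: Vec<u8> },
    Branch { children: Vec<ToyProofNode> },
}

/// Recompute a node's hash bottom-up. `DefaultHasher` stands in for the
/// trie's real 32-byte hash function.
fn recompute_hash(node: &ToyProofNode) -> u64 {
    let mut h = DefaultHasher::new();
    match node {
        ToyProofNode::Leaf { key, value } => {
            key.hash(&mut h);
            value.hash(&mut h);
        }
        ToyProofNode::Branch { children } => {
            // Each child hash is derived here, never read from the proof,
            // which is exactly what lets the proof omit them.
            for child in children {
                recompute_hash(child).hash(&mut h);
            }
        }
    }
    h.finish()
}
```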
This PR is a draft because:

- Errors are currently just `&'static str`. There should be real error types.
- The reference codec implementation lives in the `reference-trie` crate due to issues with circular dependencies. However, there is pretty considerable replication of logic, or outright duplication, between the reference proof codec impl and the reference codec impl for regular `NodeCodec`s. I noticed that much of this code is duplicated between `reference-trie` and `substrate-trie` as well.

UPDATE: After looking further into integration in Substrate, it really does make more sense for proofs to carry values for light clients. It would take significant refactoring to be compatible with this sort of design. Execution proofs would be especially tricky, as they are just provided a trie storage built from the proof.
It might make sense, however, to present an API for a value-carrying proof which still omits internal hashes. The savings would then come only from the omitted hashes, not from omitted values. The crate could present both types of proofs.
This sort of proof could be useful in more specialized cases like the bridge, where proof size is even more important.