-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Discussion: Batched Anchor Data Structure #69
Comments
@oed, can you elaborate on what exactly your question is about building the merkle tree when there are different numbers of leaf nodes? Beyond that, I'd like to see a bit more about what exactly the entries in the bloom filter look like/how they are built. Your code example actually has that part commented out and unspecified. Are you imagining that the tags, schema, controllers, and docId all get encoded into one entry somehow? Or does one ceramic document create like 4-10 different entries into the bloom filter? If the latter, do you plan to add anything into how these entries are encoded to allow distinguishing between entries from the different input types (distinguishing an entry that is a CID based on whether it was a main docID, a schema reference docId, or a tag, for example). Lastly, I'm a little confused as to how one would actually take advantage of the sortedness of the merkle tree, given that when you're at an intermediate node in the tree, you don't actually have any information encoded in that node about what portion of the sorted keyspace it represents, so you'd have no way to actually decide which branch of the tree to follow to get closer to the set of documents you are looking for, unless I'm missing something? |
Basically if the number of nodes is not a power of 2 then the tree will not be symmetrical. We should however always form the tree in a deterministic way, so which algorithm should we use for that? cc @simonovic86
It's an array of strings, updating the first post. Each document will create multiple entries in the filter.
Good point! We need a way to distinguish between DocIDs and Schema-DocIDs at the very least!
Yeah since there are more than one tag this is not optimal I think. Basically if you know that a tag is the first tag of a document then you can do a binary search to effectively find the update. However, if you only know some other info you can't really take much advantage of this. Might be better options here so open to suggestions! |
Yeah, the algorithm for building a balanced binary search tree is pretty standard and straightforward (I used to ask it as an interview question :P), I believe the only part of it that could be different implementation by implementation is when you have an even number of nodes whether you take the rightmost node in the left half as the root, or the leftmost node of the right half. Example code here. As long as we're consistent though it should be fine either way I'd think. @oed I see you updated the description above to include prepending "tag-" and "schema-" to those bloom filter records - I'd suggest we do the same for the other entries too ("controller-did-" and "DocId-") to future-proof ourselves in case we want to add additional DIDs or DocIds into the bloom filter in the future. Adding characters to the size of the entries in the bloom filter is free, since the actual entry strings never make it into the filter, just the hashes of them do. My concern about doing binary search on the tree is about more than having access to the right information of the document you're looking up. Even if you have all the fields of the document you are looking for, I'm still not sure how you could actually use that to search in the tree. The problem is that at the non-leaf nodes of the tree, you only have the CIDs of your left and right sub-tree nodes, the tree itself has no information about the contents of the documents covered by each subtree. To use the sortedness property of the tree, the non-leaf nodes would need to also contain information about which range of the overall sorted keyspace they cover |
Created a PR for this CIP, updated the spec and made this into the discussion thread. See #74 |
Discussion for CIP-69
See PR for latest spec: #74
The text was updated successfully, but these errors were encountered: