Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merging two MAST forests together #1529

Closed
Dominik1999 opened this issue Oct 7, 2024 · 5 comments · Fixed by #1534
Closed

Merging two MAST forests together #1529

Dominik1999 opened this issue Oct 7, 2024 · 5 comments · Fixed by #1534
Assignees
Milestone

Comments

@Dominik1999
Copy link
Contributor

@PhilippGackstatter to create the issue. @bobbinth can help to define the issue

@bobbinth
Copy link
Contributor

bobbinth commented Oct 8, 2024

Here is what I think we want to do here:

Given two MastForest structs, we should merge them together (we can also generalize this beyond just 2 structs). The resulting MastForest should contain all the ndoes/roots/decorators from the merged forests, with duplicates removed.

I suspect that handling of duplicates will probably be one of the more complicated parts (because we need to compare based not just on MAST root but also on "eq hash"). We could try to use a similar strategy as what we use in MastForestBulder - but maybe there is a simpler way as well.

@bobbinth bobbinth added this to the v0.11.0 milestone Oct 8, 2024
@PhilippGackstatter
Copy link

PhilippGackstatter commented Oct 8, 2024

Thanks! I wrote down my understanding and approach and some questions.

To merge two MastForests A and B we need to combine the nodes and roots from both.

Nodes

To combine the nodes we can keep the nodes from A as they are. Then:

  • we first check for duplicates by comparing the EqHashes of nodes in B and see if they exist in A. To do this efficiently I think we need to create an index of A's node EqHashes first (similar to how MastForestBuilder keeps such an index). If a node in B is already in A, we can add that node's original index to a map. Later, when we add all nodes from B to A and we encounter that index, we do not add it and make all nodes that pointed to that original index point to the existing index instead.
  • Node Ids that do not need to be remapped need to be offset by A.nodes.len(), so that they still point to the correct nodes in the new merged forest.

To give an example for both points: Suppose we have some forest A with these nodes:

[Block(foo), Block(bar)]

and merge some forest B into it:

[Block(other), Block(bar), Join(0, 1)]

then we should get:

[Block(foo), Block(bar), Block(other), Join(2, 1)]
  • The Join node was remapped to point to the existing bar block and the duplicate bar procedure was not added again.
  • The Join node's index that points to other was offset by the length of A.

Roots

  • The roots can be merged similarly to nodes: We can reuse the mapping we created during duplicate checking to ensure we do not add the same root twice.

Implementation

Using the MastForestBuilder for merging seems like a good idea in principle, but I think it requires some information that is no longer available in a MastForest itself, like the GlobalProcedureIndex when adding procedures. So it probably means we cannot reuse it in the implementation completely. Parts of it like the eq_hash_for_node can be reused.

Open Questions

  • It's not clear to me whether we have to deal with ExternalNodes in a special way. What if A contains an ExternalNode(foo) and B contains the root for foo? This would happen, afaict, if we were to merge the Mast forest of a program which uses the standard library together with the Mast forest of the standard library itself. Do we have to a) do nothing, b) return an error or c) rewrite the external node to a "call" (exec, call, ...) to the node in B?
  • A general question that came up, which is not specific to merging, was if we use EqHash which takes the decorators into account, then we can end up with nodes in the forest that have the same MAST root, right? Or is there some invariant somewhere that prevents this still? If not, when we call a node by hash (through dyncall for example), then couldn't we end up calling the "wrong" node with/without decorators?

@bobbinth
Copy link
Contributor

bobbinth commented Oct 9, 2024

Yes, this pretty much exactly correct. A few comments:

To merge two MastForests A and B we need to combine the nodes and roots from both.

Yes, but I think we also need to merge the decorators.

Using the MastForestBuilder for merging seems like a good idea in principle, but I think it requires some information that is no longer available in a MastForest itself, like the GlobalProcedureIndex when adding procedures. So it probably means we cannot reuse it in the implementation completely. Parts of it like the eq_hash_for_node can be reused.

Agreed! I don't think we can use MastForestBuilder here - but we can re-use some of the code/concepts. Ideally, we'd also re-use a big part of the code (w/e can be moved to miden-core) but not sure how much we can do here.

It's not clear to me whether we have to deal with ExternalNodes in a special way. What if A contains an ExternalNode(foo) and B contains the root for foo? This would happen, afaict, if we were to merge the Mast forest of a program which uses the standard library together with the Mast forest of the standard library itself. Do we have to a) do nothing, b) return an error or c) rewrite the external node to a "call" (exec, call, ...) to the node in B?

If A has external node with MAST root x and B has a non-external node with MAST root x, we should remove the external node and update any nodes that relied on it to now point to the corresponding node from B.

The main point of external nodes is to say "the definition of this MAST node is not available within this MAST forest". And so, since the union of A and B would contain such a definition, there is no longer a need to keep the external node.

A general question that came up, which is not specific to merging, was if we use EqHash which takes the decorators into account, then we can end up with nodes in the forest that have the same MAST root, right? Or is there some invariant somewhere that prevents this still? If not, when we call a node by hash (through dyncall for example), then couldn't we end up calling the "wrong" node with/without decorators?

This is one of the tricky aspect of the MAST forest and I'll tag @plafer here as well in case my explanation is missing something. But basically, yes, it is possible to have two or more nodes with the same MAST root in the MAST forest (but I don't think that's possible for roots). Semantically, such nodes should be identical - i.e., executing either of them should have exactly the same effect as decorators should not affect how a program is executed (decorators should mostly be used for debug purposes). This is not entirely true now - but should be true once #1457 is completed.

@PhilippGackstatter PhilippGackstatter linked a pull request Oct 16, 2024 that will close this issue
@plafer
Copy link
Contributor

plafer commented Oct 17, 2024

A general question that came up, which is not specific to merging, was if we use EqHash which takes the decorators into account, then we can end up with nodes in the forest that have the same MAST root, right? Or is there some invariant somewhere that prevents this still? If not, when we call a node by hash (through dyncall for example), then couldn't we end up calling the "wrong" node with/without decorators?

Indeed, that is a current quirk of the design, and has been the focus of many discussions. From program verification's standpoint, the MAST root fully identifies a program. Decorators were introduced to direct the prover that do things that are really an implementation detail from proof verification standpoint (such as outputting a debug message, or writing something to the advice stack). The problem that this introduces then, especially when using external libraries, is that programs identify procedures to execute by their MAST root, but that turns out to be insufficient information for the prover, as 2 procedures with the same MAST root can differ e.g. in the debug statements they output.

Our almost-perfect solution (with #1457) is to turn the decorators that do non-debug things to instructions (i.e. they will now modify the MAST root). Hence, after #1457, theoretically the only quirk in program execution that occurs is that the prover ends up executing the correct procedures but with the wrong debug instructions (in the case where 2 procedures are identical except for e.g. their print statements). As discussed at length, we currently evaluate the probability of this case to happen to be quite low - and we'll be forced to revisit this if it turns out to occur more than we expected.

@plafer
Copy link
Contributor

plafer commented Oct 30, 2024

Closed by #1534

@plafer plafer closed this as completed Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

4 participants