
NEP-481: Synchronous wasm submodules #481

Closed

Conversation

@mooori commented May 12, 2023

This NEP proposes to enable the synchronous execution of wasm submodules in the NEAR runtime.

Context

Acknowledgements

This is a team effort of Aurora and we thank Pagoda and community members for their input in previous conversations.

NEP Status (Updated by NEP moderators)

SME reviews:

Protocol WG voting indications:

Voting hasn't started yet.

@mooori changed the title from "NEP-0000: Synchronous wasm submodules" to "NEP-481: Synchronous wasm submodules" on May 12, 2023
@mooori marked this pull request as ready for review on May 12, 2023 11:27
@mooori requested a review from a team as a code owner on May 12, 2023 11:27
@ori-near added the WG-protocol (Protocol Standards Work Group should be accountable), A-NEP (A NEAR Enhancement Proposal), and S-review/needs-wg-to-assign-sme (A NEP that needs the working group to assign two SMEs) labels on May 15, 2023
@ori-near (Contributor)

Thank you @mooori for submitting this NEP. As a moderator, I reviewed this NEP and it meets the proposed template guidelines. I am moving this NEP to the REVIEW stage and would like to ask the @near/wg-protocol members to assign two Technical Reviewers to complete a technical review (see expectations below). For clarity: Technical Reviewers play a crucial role in scaling the NEAR ecosystem, as they provide in-depth expertise on the niche topic while working group members stay on guard for the NEAR ecosystem as a whole. The discussions may get deep, and it would be inefficient for each WG member to dive into every single comment, so NEAR Developer Governance designed this process so that subject matter experts help us scale by writing a summary of the raised concerns and how they were addressed.

Technical Review Guidelines
  • First, review the proposal within one week. If you have any suggestions that could be fixed, leave them as comments to the author. It may take a couple of iterations to resolve any open comments.
  • Second, once all the suggestions are addressed, produce a Technical Summary, which helps the working group members make a weighted decision faster. Without the summary, the working group will have to read the whole discussion and potentially miss some details.
    Technical Summary guidelines:
    • A recommendation for the working group on whether the NEP is ready for voting (it could be an approval or a rejection recommendation). Please note that this is the reviewer's personal recommendation.
    • A summary of benefits that surfaced in previous discussions. This should include a concise list of all the benefits that others raised, not just the ones that the reviewer personally agrees with.
    • A summary of concerns or blockers, along with their current status and resolution. Again, this should reflect the collective view of all commenters, not just the reviewer's perspective.
      Here is a nice example and a template for your convenience:
### Recommendation
Add recommendation
### Benefits
* Benefit
* Benefit
### Concerns
| # | Concern | Resolution | Status |
| - | - | - | - |
| 1 | Concern | Resolution | Status |
| 2 | Concern | Resolution | Status |

Please tag the @near/nep-moderators once you are done, so we can move this NEP to the voting stage. Thanks again.

@bowenwang1996 (Collaborator)

As a working group member, I nominate @akhi3030 and @nagisa as SMEs to review this NEP.

@bowenwang1996 (Collaborator)

@mooori thanks for the submission! Do you mind explaining the permission model in more detail? For example, how would a contract decide who can deploy submodules to the contract?

@mooori (Author) commented May 19, 2023

Do you mind explaining the permission model in more detail? For example, how would a contract decide who can deploy submodules to the contract?

By default, there is only the DeploySubmodule action, which requires the same permissions as DeployContract. To allow actors without a full access key to deploy submodules, a contract could expose a public method which checks permissions. Something like:

impl Contract {
    pub fn deploy_submodule(&mut self, key: Vec<u8>, wasm: Vec<u8>) {
        // Check permissions and if they are satisfied, then trigger the
        // `DeploySubmodule` action to store submodule `wasm` under `key`.
    }
}

With this approach, each contract can (and must) implement custom logic to check permissions.

If a contract operates under the assumption that submodules are untrusted code, it might even allow anyone to deploy submodules. This should be possible without introducing vulnerabilities as the set of host functions available to submodules is very limited (ref. section Trustless).

For contracts that want to restrict who is permitted to deploy submodules, the AccessControllable contract plugin might be a helpful tool. It makes it easy to restrict public methods so that they can be invoked successfully only by accounts that have been granted user-defined roles.
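To illustrate, here is a minimal sketch of such a custom permission check, assuming near-sdk's env::predecessor_account_id(); the owner field and the policy are purely illustrative, and the trigger for the proposed DeploySubmodule action is left as a placeholder since its API is not yet defined:

use near_sdk::{env, AccountId};

pub struct Contract {
    /// Account allowed to deploy submodules in this example policy.
    owner: AccountId,
}

impl Contract {
    pub fn deploy_submodule(&mut self, key: Vec<u8>, wasm: Vec<u8>) {
        // Example policy: only the stored owner account may deploy submodules.
        assert_eq!(
            env::predecessor_account_id(),
            self.owner,
            "only the owner may deploy submodules"
        );
        // If the check passes, trigger the proposed `DeploySubmodule` action to
        // store `wasm` under `key`; the exact API would be defined by this NEP,
        // so it is left as a placeholder here.
        let _ = (key, wasm);
    }
}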

@bowenwang1996 (Collaborator)

@mooori another question: does Aurora plan to store only wasm submodules compiled from EVM bytecode on-chain, or does it plan to store EVM bytecode in the engine as well?

@birchmd (Contributor) commented May 19, 2023

@mooori another question: does Aurora plan to store only wasm submodules compiled from EVM bytecode on-chain, or does it plan to store EVM bytecode in the engine as well?

@bowenwang1996 this question is a little tangential to the NEP itself since Aurora's usage only motivated the proposal, but the proposal is more general than only our usage. That said, our idea is to have a sort of "upgrade" process where initially new EVM contracts are interpreted via SputnikVM (EVM bytecode on chain as it is today), but if the EVM contract gets a lot of usage then we will compile it to a submodule. This will make the Aurora Engine more efficient on Near overall because compiled EVM contracts will use less Near gas when they execute.

@ori-near added the S-review/needs-sme-review label (A NEP in the REVIEW stage is waiting for Subject Matter Expert review) and removed the S-review/needs-wg-to-assign-sme label (A NEP that needs the working group to assign two SMEs) on May 19, 2023
@akhi3030 (Contributor)

@mooori, @encody: would it be possible for one or both of you to write down a few sentences comparing this NEP and https://github.com/near/NEPs/pull/480/files please? There seems to be quite a bit of overlap.

@mooori (Author) commented May 22, 2023

From my perspective, being more familiar with NEP-481, some of the differences are:

Visibility and privacy

A submodule can be executed only if the parent contract has a function which triggers the execution of the submodule. A function of wasm deployed to a namespace can be invoked directly by specifying the namespace in the FunctionCall action.

State

Submodules do not have their own state and cannot access the state of the parent contract. However, a contract may implement a custom protocol for providing submodules access to state on top of the data that can be exchanged between parent and submodule. Each namespace has its own state which is isolated from the state of other namespaces on the same account.

Host functions

The set of host functions available to submodules is limited to allow yielding back to the parent and exchanging data with the parent. Reading through NEP-480 I would assume that a contract deployed to a namespace has access to the same set of host functions as a regular contract (with state being separate as mentioned above).
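To make this concrete, the import surface of a submodule might look roughly like the sketch below; the function names are purely illustrative and not fixed by either NEP, and the gas import reflects the nearcore instrumentation requirement discussed further down in the thread:

// Illustrative only: hypothetical import names for the limited set of
// submodule host functions (read parent input, yield back, set return value),
// plus the `gas` hook required by nearcore's wasm instrumentation.
#[link(wasm_import_module = "env")]
extern "C" {
    /// Copy the bytes the parent passed in into submodule memory at `ptr`.
    fn submodule_read_input(ptr: u64, len: u64);
    /// Suspend execution and hand `len` bytes at `ptr` back to the parent.
    fn submodule_yield(ptr: u64, len: u64);
    /// Set the submodule's return value and terminate.
    fn submodule_return(ptr: u64, len: u64);
    /// Gas accounting hook injected by the wasm instrumentation.
    fn gas(ops: u32);
}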

Synchronous execution

Submodules are executed synchronously, which is a key feature of this proposal and supported in the PoC implementation. Also namespaced contracts should be executable synchronously, though I am not sure if it is a top priority there as well. For instance, maybe a first implementation of account namespaces enables only asynchronous execution and synchronous execution would be added later on.


@encody In case I misunderstood any details of the account namespaces proposal, please let me know.

@nagisa (Contributor) left a comment

One thing I’m missing in this NEP is a limit on submodule call depth. It seems like the limit is implicitly 2 because the submodules are specified to have limited access to host functions, but then there's a fair bit of text about integrating with account extensions and/or expanding the set of host functions available to call.

I wonder if it would be better to make the presence, and configuration, of such a limit explicit somehow. It could be configurable by the contract that's using submodules, or in the runtime, or both.

neps/nep-0481.md Outdated

### Submodule host functions

The submodule can import host functions to read data received from the parent, yield back to the parent and set a return value on termination. A `gas` host function must be available for gas accounting and to meet the [requirements](https://github.com/near/nearcore/blob/83b1f80313ec982a6e16a1941d07701e46b7fc35/runtime/near-vm-runner/src/instrument/gas/mod.rs#L396-L402) of nearcore wasm instrumentation.
Contributor

The use of the gas function is entirely internal to the runtime and its exposure to contracts is accidental. If we're setting up distinct lists of exported host functions for parent contract modules and submodules as proposed here, there doesn't seem to be much value in exposing that function either (unless there's value in allowing these submodules to waste some gas ^^)

(The way code is structured today may require gas still, but I think the preferable option would still be to refactor so that it isn't, rather than entrenching this mistake any further)

Author

Yes, currently the PoC implementation requires a gas host function. Without it, preprocessing of the submodule’s wasm fails with an error of:

Link(Import("env", "gas", UnknownImport(Function(FunctionType { params: [I32], results: [] }))))'

Given that this is accidental, would the refactoring of nearcore such that the gas function is no longer required be a change separate from the implementation of this NEP? Then we could leave a note in the NEP to remove gas once that becomes possible. For this PoC implementation it should be a rather small change; presumably it only requires reverting these changes.

Contributor

I think the only ask from my end is to not accidentally expose this to contracts again for no good reason. And yeah, it would mean modifying the way the contract runtime works in this respect if the implementation of this NEP happens to precede the landing of the finite-wasm work.

neps/nep-0481.md Outdated

The proposed wasm submodules can be interpreted as account extensions with tight restrictions. A wasm submodule is an account extension that can be executed only by the parent contract itself. It can access only submodule host functions which allow it to yield back to the parent and exchange data with it. Host functions available to regular contracts cannot be invoked by the submodule. A submodule cannot read or write the parent’s storage and has no storage associated with itself.

Due to its complexity, account extension functionality might be implemented in multiple stages. Synchronous wasm submodules could be an initial stage, which progresses towards account extensions as the restrictions mentioned above are lifted.
Contributor

I would rather look at these two different proposals not as two features that might potentially compose some day, but rather as proposals that must compose. I think keeping this in mind we will end up with something that's more generally useful and might even find new uses in future improvements to the protocol.

In fact, I believe there is very little that needs to be done to make sure this holds true. Today the two sets of proposals seem quite analogous to me in their attempt to introduce two distinct (OOP) objects with their own slightly differing semantics. If we instead separate data and methods we might quite easily end up in a place where we don't even need to draw any comparisons between the NEPs, because each one is useful on their own. In particular one way forward I could see is focusing entirely on introducing a mechanism to make a synchronous function call to an isolated wasm-core-1 module with a communication channel of some sort. Even without changes to the data model this feature can be used in some way by the contract making self-calls (and for Aurora specifically, they could temporarily bake contracts into their contract while the data model changes are underway; this would likely make this proposal much more straightforward to think about and implement too)

Author

If we instead separate data and methods we might quite easily end up in a place where we don’t even need to draw any comparisons between the NEPs, because each one is useful on their own.

Intuitively I’m thinking changes to the data model, such as separate state, are useful only once there exist some kind of separate entities (like submodules or contracts deployed to a namespace) which can be called. This makes me wonder how changes to the data model along the lines of NEP-480 could be useful on their own? Probably I’m missing or misunderstanding something here.

Contributor

When I was saying "data model changes" I meant specifically extending the protocol with a concept of an account extension or a submodule as described here. But I also think you misunderstood the order of steps I was suggesting to be taken. My thinking was that the data model changes would be the very last step once the actions and APIs necessary for both this and account extension are in place. But this is just my brainstorming. Ultimately the substance of the earlier message is:

I would rather look at these two different proposals not as two features that might potentially compose some day, but as proposals that must compose. I think keeping this in mind we will end up with something that's more generally useful and might even find new uses in future improvements to the protocol.

I don't have a particularly strong opinion on how this is achieved, but I think it is important that we don't end up with both submodules and account extensions in the protocol, because unifying them is likely to be infeasible.

neps/nep-0481.md Outdated

Limiting the interface between a parent and its submodules to passing bytes introduces overhead in cases that require interaction. For instance, instead of directly persisting data in storage a submodule has to serialize data and send it back to the parent. The parent then needs to deserialize the data, write it to storage and resume the submodule. Besides requiring extra logic this pattern also increases the number of host function calls.

That overhead could be reduced by extending the interface, for instance by making more host functions available to submodules. The trade-off with giving submodules direct access to more host functions is complexity in permissions.
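As an illustrative aside (not part of the NEP text), the round trip described above might look roughly like the following on the parent side, with the proposed host interface abstracted behind a trait so the sketch stays self-contained; all names and the byte layout are hypothetical:

/// Submodule outcomes as seen by the parent: either a yielded request or a
/// final return value. Names are hypothetical.
enum SubmoduleOutcome {
    Yield(Vec<u8>),
    Return(Vec<u8>),
}

/// Stand-in for the proposed host interface (start, resume, storage access).
trait SubmoduleHost {
    fn call(&mut self, key: &[u8], input: &[u8]) -> SubmoduleOutcome;
    fn resume(&mut self, response: &[u8]) -> SubmoduleOutcome;
    fn storage_write(&mut self, key: &[u8], value: &[u8]);
}

fn run_with_storage(host: &mut impl SubmoduleHost, key: &[u8], input: &[u8]) -> Vec<u8> {
    let mut outcome = host.call(key, input);
    loop {
        match outcome {
            SubmoduleOutcome::Yield(request) => {
                // The submodule serialized a storage request; here the layout
                // is assumed to be `key_len (4 bytes LE) || key || value`.
                let key_len = u32::from_le_bytes(request[..4].try_into().unwrap()) as usize;
                let (k, v) = request[4..].split_at(key_len);
                host.storage_write(k, v);
                // Resume the submodule after performing the write on its behalf.
                outcome = host.resume(&[]);
            }
            SubmoduleOutcome::Return(value) => return value,
        }
    }
}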
Contributor

This seems like something that could be implemented by introducing a mechanism for the "start submodule" to specify which host functions are available to the callee.

@mooori (Author) commented May 24, 2023

The scope of the implementation of this NEP would increase significantly by:

  • Making more or all host functions available to the new kind of module/contract.
  • Adding permissions to enable restricting host function access.
  • Allowing the new modules/contracts to call each other and introducing a configurable limit on call depth.

How would this affect the requirements for the reference implementation and PoC, respectively? Landing all these features in nearcore will probably be a multi-step process. Would a reference implementation for something that could be landed in a first step suffice or is the reference implementation required to support all features?

@nagisa (Contributor) commented May 26, 2023

Would a reference implementation for something that could be landed in a first step suffice or is the reference implementation required to support all features?

The only concern to be aware of is that any contract-facing breaking changes are infeasible in NEAR, unfortunately.

With that in mind it is important that the end result that's striven for is clear, so that a plan to implement the feature the right way can be laid out. Outside of that, the introduction and implementation of the feature may take as many steps as necessary or convenient.

@bowenwang1996 (Collaborator) commented May 29, 2023

@mooori I agree with what Simonas said. If we can agree on the end goal (on the spec level) and there is a PoC to demonstrate general feasibility, the implementation can take many steps. I think it is important, as Simonas pointed out, to align on how we want to reconcile this NEP with the account extension NEP. cc @encody @firatNEAR

@encody commented May 30, 2023

As far as I'm aware, there is some discussion that still needs to be had to hash this out between these two NEPs. I will be sure to report back with updates.

@encody commented Jun 2, 2023

@birchmd and I had a fruitful discussion about combining NEPs #480 and #481. Here are my notes from that meeting.


Namespaces/Submodules Priorities:

  • Synchronous execution.
  • Untrusted modules.
    • Owner doesn’t fully trust the module.
    • e.g. DAO proposals w/ attached submodule?
    • Must sandboxed interactions be sync-only?
  • Host functions.
    • Message passing / yield / callback.
    • Limiting host functions vs. permissions?

NEP layers:

  1. Multiple WASM blobs composing the logic that controls one account. (Currently, this is what NEP-480 does. However, NEP-480 currently allows submodules/namespaced contracts to have their own storage.)
  2. Modules may be public or private.
    1. Both public and private modules have names (no overlap). Hence "namespaces."
    2. Entire module is public and async, XOR private and sync. This eliminates the in-between access case otherwise required: "All functions exported from WASM can be FUNCTION_CALLed. Synchronous functions can never be FUNCTION_CALLed. However, the WASM VM cannot invoke unexported functions."
    3. The default namespace can never be sync (=never be private).
    4. All public namespaces are directly invokable via a FUNCTION_CALL.
  3. Sync-only functions allow message passing (different set of host functions). Sync implies private and vice-versa.
  4. Permissions. What are things a submodule can do?
    1. I/O to storage (costs staking). Private modules never have state.
    2. Send async interactions (regular promises).
    3. Do a lot of compute (gas limits).

Edge cases:

  • Code hash from RPC. (adding module->code_hash map)
  • Cannot delete state (at the moment, private modules have no state, so it is not a huge issue).
  • Keep all modifications constrained to DEPLOY_CONTRACT (no new actions). This maintains the invariant: “The only time the logic controlling an account can change is when the account receives a DEPLOY_CONTRACT action.”

@birchmd (Contributor) commented Jun 2, 2023

Thanks for the comments @nagisa and @bowenwang1996

I agree that it is important to have a clear vision on how this NEP and #480 align with each other. I just had a call with @encody where we discussed this. He has posted more detailed notes above, but here is my high level summary.

We will consider Account Namespaces to be a logical prerequisite for Synchronous execution (I say "logical" because we could work on getting the implementations done in parallel). Therefore this NEP will be an extension of the previous one. The additional functionality this NEP adds is to allow specifying another field in the DeployContract action which marks the namespace being deployed as synchronous-only (by default all namespaces will be asynchronous). Synchronous-only namespaces differ from asynchronous ones in the ways that the "submodules" of the current proposal differ from normal contracts (cannot be invoked by FunctionCall actions, have access to a limited set of host functions, etc.).
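As a rough sketch (not the final spec), the extended action might look something like this, assuming NEP-480's namespace field and the synchronous-only flag described above; field names are illustrative:

/// Hypothetical shape of the extended action.
pub struct DeployContractAction {
    /// Wasm bytecode to deploy.
    pub code: Vec<u8>,
    /// Namespace the code is deployed under (per NEP-480); `None` could mean
    /// the account's default namespace.
    pub namespace: Option<String>,
    /// Proposed addition: when true, the namespace is synchronous-only, i.e.
    /// not callable via `FunctionCall` and limited to the submodule host functions.
    pub synchronous_only: bool,
}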

There are still some details to work out, but we'll do that next week. I'll make another comment once we have finished updating this NEP to be an extension on top of #480.

@nagisa (Contributor) commented Jun 5, 2023

I’m glad to hear positive news here. I do have a question though: what is the reason to add a flag for whether the contract is synchronous-vs-asynchronous to DeployContract? As far as I can tell this distinction would already be present implicitly: contracts meant to be called in a synchronous manner cannot import (pretty much) any host functions and are useless as asynchronously called contracts; and whenever a synchronous contract gets called asynchronously it won't have anybody to communicate messages to (or yield to), which I imagine would make it at least panic (but likely fail to link too…)

I guess, personally, considering the developer UX, I would prefer synchronicity of the contract to be a property of the contract code, rather than the deploy action.

@birchmd (Contributor) commented Jun 6, 2023

Thanks for the comment @nagisa ! I'll keep this in mind as I am revising the NEP. Part of the purpose of the deploy-time flag was also to mark external visibility (is this namespace callable from other accounts using the FunctionCall action) because it might be useful to have "private namespaces". This is separate from the question of whether it is synchronous or not because a namespace which includes no calls to yield might still need to be private for some reason. As you said, all methods which use yield would already be private in the sense they would panic if you tried to call them asynchronously. So maybe this "private" namespaces idea is not needed in this NEP and could be introduced as a separate concept in the future instead.

@birchmd (Contributor) commented Jun 9, 2023

@nagisa @encody @bowenwang1996 I've revised the NEP to be based on the changes proposed in NEP-480. Please take another look and let me know what you think.

@DavidM-D (Contributor) commented Jun 14, 2023

Do we know how much this would actually save Aurora in gas per month?

I can see one benchmark, which is numeric work plus writing to memory, that has been hand-written in Rust rather than cross-compiled from EVM code. This seems like the most favorable compiled-vs-interpreted benchmark possible and is likely not representative of the speedup they're actually going to see.

How does this compare to WebAssembly Components? Since we're going to have to implement them eventually anyway, is it worth us holding off until then? We'd then get education, tooling and runtime support for free.

@birchmd (Contributor) commented Jun 14, 2023

Thanks for getting involved in the discussion @DavidM-D

Do we know how much this would actually save Aurora in gas per month?

You are right that the benchmark is not typical, but it gives us a ceiling for this approach. It turns out that ceiling is quite high, as that example had a 1500x speed-up. We have another example which sort of gives us the floor. In that example I manually translate the ERC-20 contract into Rust (keeping the Solidity ABI) and compile it to Wasm. Since ERC-20 is almost entirely IO (which will be identical in both the EVM interpreter and Wasm cases because IO is done via Near host functions), we expect there to be very little improvement from this approach. But still, in that case we saw a 15% improvement, which is a pretty good floor all things considered. Without doing a more comprehensive study (which is something we hope to do eventually) we can't say for sure how much gas Aurora will save. But with a floor of 15% and a ceiling of 99.9% savings, we could reasonably expect 40% or 50%.

But it's also worth mentioning that it's not just about current savings, but also about creating new possibilities. There are some use cases that are impossible on Aurora today because of gas limitations. Near's 300 Tgas limit translates into around a 3 million EVM gas limit for transactions on Aurora, which is lower than the block gas limit on Ethereum. There are real use cases that require more than 3 million EVM gas; for example, flash loans can use a lot of gas as they interact with multiple DeFi components. The only way for Aurora to enable these use cases is to make our contract more efficient at executing EVM contracts, and this synchronous Wasm approach is by far the most promising direction we have.

How does this compare to WebAssembly Components

This proposal intentionally does not interact with Wasm Components. Our proposal provides a way for the Near runtime to compose multiple independent Wasm VMs, as opposed to some kind of dynamic loading within a single VM. The reason we are avoiding Components is that it is unclear what the timeline would be for the standard to be finalized and then included in the production VM Near uses. A 2-year timeline, for example, is not acceptable given that Aurora wants to make use of this feature sooner than that.

Of course when the Components feature is eventually released it should be possible to implement the host functions described in the proposal using that feature instead of independent VMs. This will be a welcome performance improvement which should lower the gas overhead of making a synchronous call on Near.

@akhi3030 (Contributor)

@birchmd: some clarifying questions please.

So if I understand it properly, aurora would like to have this feature because currently they are executing EVM bytecode in an interpreter which is quite slow compared to compiling it down to wasm. Is this correct? If so, just a bunch of questions on the flow if you do not mind please.

  • What is the source of the EVM bytecode? Are there a lot of different EVM contracts that are being uploaded to aurora on a regular basis? Does the aurora contract itself need to be recompiled when new contracts are presented? Where do you envision the compilation from EVM to wasm to take place? How expensive is this compilation?

I suppose, in general, I am trying to understand what is the aurora flow today and how will it change after this NEP is accepted.

@birchmd (Contributor) commented Jun 14, 2023

@DavidM-D

How are you planning to translate these contracts to WASM ... do you have any benchmarks on code generated from this compiler?

We do have a compiler in progress and the plan would be to use this compiler to translate EVM contracts to Wasm. Using the same benchmark as where we saw the 1500x speedup using hand-written Rust, the compiler sees only a 15x speedup. But still getting an order of magnitude from the EVM bytecode directly (no manual intervention required) is promising. We currently have two engineers on staff at Aurora actively working on the compiler.

What prevents you from ... statically linking it to the Aurora contract then redeploying?

This is an approach we have considered before. It is technically feasible, but certainly not ideal. The main reasons are (1) deploying an update to Aurora Engine requires approval of the DAO, creating additional administrative overhead; (2) the Aurora Engine is a large contract (almost 1 MB), so it would be pretty inefficient to deploy the whole thing frequently.

Does an additional submodule not necessitate changes to the original module?

No. The idea is similar to how you do not need to re-install VS Code when you add a new extension (or think of any other example where plug-ins are used on the regular). The core app is designed to know how to communicate with any extension that follows the right interface. We would define the interface between the Aurora Engine and any module being used as an EVM contract, then to add a new module we would only need to call an admin-only function to dynamically redirect the Engine's control flow for the address we are adding the Wasm module for. This control flow redirect mechanism already exists in the Engine since it is how custom precompiles (like those that are involved in bridging) are implemented.

would the issues you're dealing with be solved if your gas limit was increased?

The issues would be mitigated by a gas limit increase. I have suggested this before and increasing the limit is also not a trivial protocol change. I say "mitigated" not "solved" because we want at least a factor of 10 increase in the amount of EVM gas we can process in a single Near transaction, but there is no way we can increase Near's transaction gas limit by a factor of 10 (the computation needed would exceed the block time).

@akhi3030

What is the source of the EVM bytecode?

Users deploying EVM contracts using EVM transactions sent to the Aurora network. Essentially it is the equivalent of the source of bytecode on Ethereum mainnet or even Near itself (of course Near contracts are Wasm bytecode not EVM bytecode).

Are there a lot of different EVM contracts that are being uploaded to aurora on a regular basis?

Yes, there are. Probably a few hundred per week. But we would not compile all the EVM contracts to Wasm. We would only compile the ones that are used frequently because those are the ones that matter for gas usage. Another case would be a specific partner who knows in advance the gas limit will be a problem, then we may compile their EVM contract to Wasm from day 1.

Does the aurora contract itself need to be recompiled when new contracts are presented?

No. As of today EVM bytecode is persisted in the Aurora Engine contract storage. In a world with the proposed synchronous Wasm feature new contracts would be deployed under new namespaces so the core Engine contract does not need to change.

Where do you envision the compilation from EVM to wasm to take place?

We will do the compilation off-chain and deploy the result. Obviously there are security considerations here, but we would make the compilation process deterministic so that users could verify the Wasm module matches its source EVM bytecode. The correctness of the compiler is also important of course, but we will have extensive testing (possibly including fuzzing or formal methods) to ensure this.

How expensive is this compilation?

The compiler today is quite slow (can take multiple minutes for a single contract), but as an off-chain process that is not a problem. Getting the compiler to the point where it could happen on-chain is not feasible in my opinion.

@akhi3030 (Contributor)

@birchmd: thanks for the detailed response. Just to make sure I understand, the flow is the following:

  • Aurora stores various user projects' EVM contracts in its storage.
  • normally it is using the interpreter to run these contracts.
  • from time to time, you may compile some of the contracts offline to wasm and store the result also in Aurora's storage.
  • then you plan on using this feature to run the wasm contracts.

Is the above correct?

Another naïve question please. Would it not be possible to deploy the compiled wasm contracts into a separate account and call out to these accounts to execute them? I suppose implementation-wise it should not be so complicated, but the problem there would be the latency of cross-contract calls?

@birchmd (Contributor) commented Jun 15, 2023

and store the result also in Aurora's storage.

This isn't quite right. We will not store the Wasm modules in the Aurora Engine contract storage because there is no (proposed) way to use them from there. The modules will need to be deployed to namespaces (following the specification of NEP-480) of the Aurora Engine account.

Will it not be possible to deploy the compiled wasm contracts into a separate account

No, this does not work with synchronous execution because separate accounts may be on separate shards. Aurora's use case requires synchronous execution because the EVM is synchronous and therefore we need synchronous calls for these Wasm modules to be used as part of an EVM execution.

I also maintain that synchronous calls are important for other use cases as well. The main thing they provide is atomicity (all actions are committed or none are). Composing programs into an atomic result is easier (safer) than trying to compose programs in a non-atomic way because you do not need to worry if all intermediate states maintain whatever invariants are important to the security of the overall interaction.

@akhi3030 (Contributor)

@birchmd: thanks for the response. It seems to me that a more general problem to solve here might be how to do atomic cross-contract calls between contracts on a single shard. It seems like namespaces are being used to ensure that the contracts are in the same shard. So I would imagine that if we had a mechanism for doing atomic cross-contract transactions on a single shard, then that might cover Aurora's use case.

Having said that, I do believe that implementing that would be quite a big task and would also require a significant change to the user experience, i.e. exposing the concept of shards.

WDYT?

@birchmd (Contributor) commented Jun 16, 2023

Exactly, allowing multiple Wasm modules on a single account is specifically to ensure all data is available in the same shard. So, yes, a more general mechanism for synchronous cross-contract calls between any two accounts in the same shard would work for Aurora (and any other use case I can think of, since we would essentially replace the concept of namespaces with the concept of sub-accounts, which already exists).

However, I also agree that this is a departure from Near's original design philosophy and complicates the user experience. Shards have always been invisible to users both because it simplifies their experience and because it means that there are no restrictions on resharding (the shard boundaries can theoretically be moved at any time to optimize execution). If we expose shards on the user level we lose both of these benefits.

Allowing multiple Wasm modules on a single account also introduces complexity for users, so whether we expose shards or introduce namespaces is maybe equivalent from a UX complexity standpoint. But resharding is an important point to consider. Is it worth giving up potential future optimizations in exchange for having synchronous execution between some accounts? I am leaning towards 'no', but if resharding is not anywhere on the horizon then maybe I could be convinced otherwise.

@akhi3030 (Contributor)

Based on experience, I have my reservations about abstracting sharding away. Generally, power users can use a network/system more efficiently when it is possible to bypass such restrictions. Still, I think we are going off on a tangent. I think we should identify exposing sharding as an alternative solution to this problem and let the WG decide on the best path forward.

In that spirit, I would like to see a brief discussion under https://github.com/near/NEPs/pull/481/files#diff-c42ac558a9ca73f718e12332e4d86a94478776ee3fa8780dfef53bf8c13a4268R142 on the exposing sharding approach.

@akhi3030 (Contributor)

@birchmd, @mooori: thanks for adding the suggestion. I left a minor comment on it but otherwise looks good.

@bowenwang1996 (Collaborator)

@akhi3030 the protocol working group discussed the topic of whether we should allow synchronous execution within the same shard and the answer is no, mostly because it deviates from homogeneous sharding, which is a fundamental design philosophy of NEAR. While the opinion of the working group may change in the future, this is considered the final decision for now. I also think that if we allow synchronous execution within the same shard, then naturally we will shift towards a model where almost all the calls are synchronous because of the desire to take advantage of it. Then we essentially move towards an appchain-centric model like many other blockchains.

@akhi3030 (Contributor)

@akhi3030 the protocol working group discussed the topic of whether we should allow synchronous execution within the same shard and the answer is no, mostly because it deviates from homogeneous sharding, which is a fundamental design philosophy of NEAR. While the opinion of the working group may change in the future, this is considered the final decision for now. I also think that if we allow synchronous execution within the same shard, then naturally we will shift towards a model where almost all the calls are synchronous because of the desire to take advantage of it. Then we essentially move towards an appchain-centric model like many other blockchains.

Thank you for the clarifications @bowenwang1996. This helps me as an SME in figuring out how to make progress on this proposal.

@jakmeier (Contributor)

Have you also considered including a WASM interpreter inside the smart contract?
Maybe I'm missing something, but it seems equally or more powerful than the proposed sync_function_call and friends.
It could be a nice first iteration of sync WASM execution that doesn't require protocol changes, and it could still be swapped out easily if the protocol supports native sync WASM execution later on.

The main question that I see is performance. My first guess would be to try and benchmark WAMR in the "fast interpreter" setting.

Fast Interpreter (FI): Precompile the Wasm opcode to internal opcode and runs ~2X faster than the classic interpreter, but it consumes a bit more memory than CI.

source: https://bytecodealliance.github.io/wamr.dev/blog/introduction-to-wamr-running-modes/

(How they achieved the fast interpretation is an interesting read, too: https://www.intel.com/content/www/us/en/developer/articles/technical/webassembly-interpreter-design-wasm-micro-runtime.html)

@mooori (Author) commented Jun 30, 2023

@jakmeier thank you for the suggestion and references. We are looking into it and will try to come up with some benchmarks.

@mooori (Author) commented Jul 28, 2023

Interpreting wasm inside wasm

wasmi interpreter

In a first step we used the wasmi interpreter. It fits this use case nicely as it is a Rust library which can be compiled to wasm, and interpreting wasm inside wasm is actually one of its intended use cases. This makes embedding it in a Near smart contract relatively simple. Moreover, it is supposed to be “lightweight” and have “low overhead”.

Here is a contract which uses wasmi to interpret the wasm bytecode corresponding to another contract. It is exciting to see that this works and it is quite straightforward after understanding some quirks of interpreting one contract inside another.
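For illustration, driving wasmi from inside a contract might look roughly like the sketch below; it is not the code from the linked contract, it assumes wasmi's embedder API (which differs slightly between versions), and "main" is just a placeholder export name:

use wasmi::{Engine, Linker, Module, Store};

/// Interpret a guest wasm module and call one of its exported functions.
fn interpret(wasm_bytes: &[u8]) -> anyhow::Result<i32> {
    let engine = Engine::default();
    // Parse and validate the guest module from raw wasm bytes.
    let module = Module::new(&engine, wasm_bytes)?;
    let mut store = Store::new(&engine, ());
    // No host functions are linked here; a real embedding would register the
    // imports the guest expects (e.g. shims for NEAR host functions).
    let linker = <Linker<()>>::new(&engine);
    let instance = linker.instantiate(&mut store, &module)?.start(&mut store)?;
    // Look up and call an exported function; "main" is a placeholder name.
    let main = instance.get_typed_func::<(), i32>(&store, "main")?;
    Ok(main.call(&mut store, ())?)
}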

However, interpretation with wasmi is so expensive in terms of gas that it most likely cannot be helpful in Aurora’s use case. This repo contains code to compare the gas usage of executing a contract directly on Near and interpreting it inside another contract. Results are available here and the main takeaways are:

  • Setup of wasmi is expensive (very roughly 70 TGas)
  • Increasing loop-limit (the number of times calculations are repeated) leads to only a slight increase in gas costs when the contract is executed directly. In contrast, when interpreting the same contract the 300 TGas limit is approached very quickly.

wamr interpreter

Embedding wamr in a Near contract written in Rust is trickier, since wamr is written in C and wasm is not among its supported architectures and platforms. Rust bindings exist; however, they haven't been updated in more than 2.5 years. Using these bindings, trying to do the same thing as above with wasmi fails: out of the box the contract cannot be compiled with --target wasm32-unknown-unknown, even though on the same machine I can build wamr itself.

Since the gas usage of interpreting wasm with wasmi doesn’t look promising, it might not be worth trying to make the same thing work with wamr. Still, to assess the performance of wamr’s fast interpreter in comparison to wasmi, I’ll try to:

  • Measure the performance of the fast interpreter using iwasm (the wamr executable).
  • Compare it to the performance of wasmi’s CLI version.

cc @jakmeier

@mooori (Author) commented Aug 3, 2023

Assessing wamr fast interpreter performance

Executing a function off-chain, wamr in fast-interpreter mode provides a speedup of roughly 2x compared to wasmi (for long-running functions). The approach to comparing performance and more detailed results can be found here. There is a Makefile listing the commands executed to obtain the results, which should make it possible to reproduce the relative differences in performance.

Assuming that, when embedded in a Near contract, the speedup of wamr over wasmi remains at roughly 2x, “wasm in wasm” would likely not be a viable solution for Aurora’s use case. The reasons are the high gas costs of interpreter setup and of the interpretation itself.

@jakmeier (Contributor) commented Aug 3, 2023

Thanks @mooori for the detailed analysis! I'm glad you checked whether this is an option, but I can see this is not really viable, at least for Aurora. Other use cases that want sync execution might still be able to use it, I think.

Off-topic/FYI: your research regarding the performance of different interpreters is also useful when considering whether we should move to an interpreter for validators, which is listed as an option in near/nearcore#9379.

@walnut-the-cat (Contributor)

Hello, NEP moderator here.

@mooori, is this still an ongoing project?

@akhi3030 and @nagisa , could you share your conclusion after reviewing this NEP as SMEs?

@birchmd (Contributor) commented Nov 20, 2023

This work has been put on hold for now. Closing this NEP and we can open a new one if the work resumes.

@birchmd closed this on Nov 20, 2023
@walnut-the-cat added the S-retracted label (A NEP that was retracted by the author or had no activity for over two months) and removed the S-review/needs-sme-review label (A NEP in the REVIEW stage is waiting for Subject Matter Expert review) on Nov 20, 2023
Labels: A-NEP (A NEAR Enhancement Proposal), S-retracted (A NEP that was retracted by the author or had no activity for over two months), WG-protocol (Protocol Standards Work Group should be accountable)
Projects
Status: RETRACTED