# Interp doc for implementation of calls #2858

Open · wants to merge 3 commits into base `feature/CoreclrInterpreter`
`docs/design/interpreter/calls.md` (46 additions, 0 deletions)
### Interpreter call convention

Within the interpreter, every method will have a structure allocated that holds information about the managed method. Let's assume the name of this structure is `InterpMethod` (which is its name within the Mono runtime). This structure will contain information relevant during execution (for example, a pointer to the interpreter IR code, the size of the stack the method uses, and more).
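As a minimal sketch (in C, the language the Mono interpreter is implemented in), the descriptor could look roughly like this; every field name here is an illustrative assumption, not the actual Mono or CoreCLR definition:

```c
#include <stdint.h>

/* Hypothetical layout; all field names are assumptions for illustration. */
typedef struct InterpMethod {
    void           *method_handle;  /* runtime handle of the managed method */
    const uint16_t *ir_code;        /* interpreter IR opcode buffer */
    uint32_t        alloca_size;    /* interpreter stack space the method uses */
    uint32_t        flags;          /* e.g. "IR compiled", "is aot-compiled" */
    void           *interp_entry;   /* thunk letting compiled code enter here */
} InterpMethod;
```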

Interpreter opcodes operate on offsets from the base stack pointer of the current interpreter frame. The arguments for a method will be stored one after the other on the interpreter stack. Each argument will be stored at a stack offset aligned to 8 bytes (at least on a 64-bit arch). Every primitive type fits in an 8-byte stack slot, while valuetypes can occupy a larger stack space. With every call, the interpreter stack pointer will be bumped to a new location where the arguments are already residing. This means that the responsibility of a call instruction is to first resolve the actual method that needs to be called and then just initialize the state to prepare the execution of the new method (mainly initialize a new `InterpFrame`, set the current stack pointer to the location of the arguments on the interp stack and the `ip` to the start of the IR opcode buffer).
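A rough sketch of this convention, assuming a hypothetical `InterpFrame` with the fields below: the call copies nothing, it only repoints the callee's stack base at the arguments the caller already evaluated.

```c
#include <stdint.h>

#define INTERP_SLOT_SIZE 8  /* one 8-byte slot per primitive on a 64-bit arch */

typedef struct InterpMethod {  /* trimmed to the field used below; see sketch above */
    const uint16_t *ir_code;
} InterpMethod;

typedef struct InterpFrame {
    struct InterpFrame *parent;  /* caller's frame */
    InterpMethod *imethod;       /* descriptor of the executing method */
    uint8_t *stack;              /* base pointer of this frame's stack area */
    const uint16_t *ip;          /* current position in the IR buffer */
} InterpFrame;

/* The caller already evaluated the arguments contiguously at `args_offset`,
 * so the callee's stack base starts right on top of them: no copying. */
static void interp_setup_call(InterpFrame *caller, InterpMethod *target,
                              uint32_t args_offset, InterpFrame *callee)
{
    callee->parent  = caller;
    callee->imethod = target;
    callee->stack   = caller->stack + args_offset;  /* args become locals 0..n */
    callee->ip      = target->ir_code;              /* start of the IR buffer */
}
```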

If we need to call a method that has its code compiled, then we would instead call a thunk that does the calling convention translation, moving arguments from the interpreter stack to the native stack/regs, and then dispatches to the compiled code.

### Direct calls

A direct call is a call where the method to be called is known at method compilation time, for example static or non-virtual calls. In this situation, when the call is compiled, an `InterpMethod` is allocated for the target method and this pointer is embedded into the generated interpreter code. No additional work is needed at execution time: the `InterpMethod` is fetched from the opcode stream, on the first call the method is compiled, and call dispatch then continues as described above.
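A hedged sketch of the execution side, with `read_imethod_operand`, `is_ir_compiled` and `compile_to_ir` as hypothetical helpers and the types kept opaque:

```c
#include <stdint.h>

/* Opaque types and helpers; all names here are hypothetical. */
typedef struct InterpMethod InterpMethod;
typedef struct InterpFrame  InterpFrame;

extern InterpMethod *read_imethod_operand(const uint16_t *ip); /* embedded ptr */
extern int  is_ir_compiled(const InterpMethod *imethod);
extern void compile_to_ir(InterpMethod *imethod);
extern void interp_setup_call(InterpFrame *caller, InterpMethod *target,
                              uint32_t args_offset, InterpFrame *callee);

/* A direct call fetches the InterpMethod* embedded at compile time; no
 * method lookup is needed, only a one-time lazy compilation to IR. */
static void exec_direct_call(InterpFrame *caller, const uint16_t *ip,
                             uint32_t args_offset, InterpFrame *callee)
{
    InterpMethod *target = read_imethod_operand(ip + 1);
    if (!is_ir_compiled(target))
        compile_to_ir(target);   /* first invocation only */
    interp_setup_call(caller, target, args_offset, callee);
}
```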

In order to account for the scenario where the method to be called is aot compiled, when emitting code during compilation, we would first check if the method is present in an aot image. If that's the case, we would emit a different opcode instead and embed in the code the pointer to be called. During execution, this opcode would fetch the transition wrapper and proceed with the call.
**Member:**

It would be useful to enumerate the different types of transition wrappers that we expect to exist in the AOT image to support interaction of interpreted and AOT compiled code.

**Member Author:**

The description of transition wrappers is included in the compiled-code-interop.md document. There are mainly two types of wrappers, each handling a type of signature:

  • interp exit: for interp->aot or pinvoke calls from the interpreter. Conservatively, we should generate these in the aot image for all signatures of aot-compiled methods and for pinvoke signatures.
  • interp entry: for aot->interp or unmanaged-callers-only calls. Conservatively, we should generate these in the aot image for all indirect call signatures encountered in aot-ed code and for unmanaged-callers-only signatures. This wrapper will also need a dynamically generated thunk/fat-pointer mechanism for embedding an additional argument.

It's plausible that we would encounter scenarios where a required wrapper is missing, and I've described alternatives, but I don't think it is something that should be prioritized at this stage. Also, I imagine there could be minor differences between interoperating with aot code and interoperating with foreign code via pinvokes. I'm not convinced whether this would actually require separate wrappers.

**Member:**

CoreCLR has been moving towards a similar kind of transition wrapper to support MethodInfo.Invoke. They are dynamically generated and JITed today. It would be nice to share the transition wrappers between reflection and the interpreter.

**Member:**
Suggested change:

- In order to account for the scenario where the method to be called is aot compiled, when emitting code during compilation, we would first check if the method is present in an aot image. If that's the case, we would emit a different opcode instead and embed in the code the pointer to be called. During execution, this opcode would fetch the transition wrapper and proceed with the call.
+ In order to account for the scenario where the method to be called is AOT compiled, when emitting code during compilation, we would first check if the method is present in an AOT image. If that's the case, we would emit a different opcode instead and embed in the code the pointer to be called. During execution, this opcode would fetch the transition wrapper and proceed with the call.

**Member:**

Throughout.


### Virtual/Interface calls

When we need to do a virtual call, we would include the virtual `InterpMethod` in the call opcode, but this needs to be resolved at runtime against the object it gets called on. Calling into the runtime to resolve the target method is expected to be slow, so an alternative is required.

The Mono approach is based on the fact that each virtual method of a class has a designated slot in the vtable that is constant across all classes that implement it. This means that in the object header we can have a method table with each slot containing the target `InterpMethod` that needs to be called when calling through that slot. For virtual methods that have generic arguments, we could end up having the same slot occupied by multiple generic instantiations of the same method. Instead of having a single target method in this slot, we would have a linked list of key/value pairs, mapping the virtual method that needs to be called to the target method. When a call is made through this slot, we would iterate the list and, if we don't find the method to call, we would call into the runtime to resolve the virtual method and then add it to this collection.

For interface calls, we have a separate method table that has a fixed size. For each interface method we can compute a slot in this table that is fixed across all types implementing the interface. Calling an interface method means calling through this slot, where we can always have collisions with methods from other interfaces. We reuse the same mechanism of calling through a slot as for virtual generic methods, with each slot holding a list of interface method / target method pairs that are used to resolve the actual method that needs to be called.
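A sketch of the slot-with-collision-list scheme described above; the types and helper names are assumptions mirroring the Mono design, not existing CoreCLR structures:

```c
#include <stddef.h>

typedef struct InterpMethod InterpMethod;

/* One entry of a per-slot collision list (generic virtuals / interfaces). */
typedef struct SlotEntry {
    InterpMethod     *key;     /* the virtual/interface method being called */
    InterpMethod     *target;  /* resolved implementation for this type */
    struct SlotEntry *next;
} SlotEntry;

extern InterpMethod *runtime_resolve_virtual(void *type, InterpMethod *key);
extern void slot_list_prepend(SlotEntry **slot, InterpMethod *key,
                              InterpMethod *target);

/* Walk the slot's list; on a miss, ask the runtime once and cache the
 * result so the slow path runs at most once per (type, method) pair. */
static InterpMethod *resolve_through_slot(void *type, SlotEntry **slot,
                                          InterpMethod *key)
{
    for (SlotEntry *e = *slot; e != NULL; e = e->next)
        if (e->key == key)
            return e->target;

    InterpMethod *target = runtime_resolve_virtual(type, key);
    slot_list_prepend(slot, key, target);  /* cache for subsequent calls */
    return target;
}
```

A non-generic virtual slot degenerates to a single-entry list, so the common case stays one pointer compare.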

If we are to follow a similar approach, this would mean that in the `MethodTable` we would have at least one additional table where we would cache pairs of virtual method / target method. We could also have a one-entry cache per call site, following the idea of virtual stub dispatch. My current understanding is that if this call-site cache misses, falling back to calling into the runtime would be far too slow, so we would still need some sort of cache table in the `MethodTable` for virtual calls already resolved on the type. If this is the case, then call-site caching would provide limited benefit over `MethodTable` lookups, and can therefore be implemented later if considered useful.
**Member:**

CoreCLR uses a per-call-site cache to optimize monomorphic calls and a global singleton hashtable for polymorphic calls. The interpreter should use the same strategy as the rest of CoreCLR.

There are a number of different possible interface dispatch strategies with different tradeoffs. I do not think it makes sense to use different strategies in a single runtime. It would require paying twice for the supporting data structures, code and implementation complexity.

FWIW, .NET Framework 1.0 used yet another strategy: it was similar to the Mono strategy except that it used an extra indirection to avoid collisions.

**Member Author:**

I really dislike the idea of tweaking the MethodTable for interpreter-specific stuff and I'm all for reusing the same approach to virtual dispatch that the JIT has. I was just not sure how easy it is to efficiently reuse the same machinery, since I'm thinking it was designed for calls from compiled code, patching it, etc.


During compilation, we have no way to tell whether a virtual method will resolve to a method that was already compiled or one that needs to be interpreted. This means that once we resolve a virtual method we will have to do an additional check before actually executing the target method with the interpreter. We will have to look up the method's code in aot images (this lookup should happen only during the first invocation; a flag should be set on the `InterpMethod` once the lookup is completed). Following this check we would either continue with a normal interpreter call or dispatch via a transition wrapper. This means that we can have an `InterpMethod` structure allocated for an aot-compiled method, but a flag would be set that it is aot-ed and we would never attempt to execute it with the interpreter.
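A sketch of that one-time check, with hypothetical flag and lookup names:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flags and fields; trimmed to what is used here. */
#define IMETHOD_AOT_LOOKUP_DONE 0x1  /* aot image lookup already performed */
#define IMETHOD_IS_AOT_CODE     0x2  /* body lives in an aot image */

typedef struct InterpMethod {
    uint32_t flags;
    void    *aot_code;  /* compiled code address, when found */
} InterpMethod;

extern void *lookup_in_aot_images(InterpMethod *imethod);  /* hypothetical */

/* Returns compiled code if the target is aot-compiled, NULL if it must be
 * interpreted. The image lookup runs only on the first invocation. */
static void *get_aot_code(InterpMethod *imethod)
{
    if (!(imethod->flags & IMETHOD_AOT_LOOKUP_DONE)) {
        imethod->aot_code = lookup_in_aot_images(imethod);
        if (imethod->aot_code != NULL)
            imethod->flags |= IMETHOD_IS_AOT_CODE;
        imethod->flags |= IMETHOD_AOT_LOOKUP_DONE;
    }
    return (imethod->flags & IMETHOD_IS_AOT_CODE) ? imethod->aot_code : NULL;
}
```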

### Indirect calls

Indirect calls in the interpreter mean that we have no information about the method to be called at compile time; there is no `InterpMethod` descriptor embedded in the code. The method to be called would be loaded from the interpreter stack. If we knew for sure we were calling into the interpreter, the solution would be trivial: code that loads a function pointer would load an `InterpMethod`, and indirect calls would just execute it in the same way a normal call would. The problem arises from interoperating with compiled code. Compiled code would rather express a function pointer as a callable native pointer, whereas the interpreter would rather express it directly as an `InterpMethod` pointer.

Based on the assumption that a method can either be compiled (aka present in AOT images) or interpreted, we could impose the condition that a function pointer for a compiled method is an aligned native function pointer and that a function pointer for an interpreted method is a tagged `InterpMethod`. The `InterpMethod` descriptor will need to contain a field for the `interp_entry` thunk that can be invoked by compiled code in order to begin executing this method with the interpreter. Both representations would need to be handled by the JIT and interpreter compilation engines.

On the interpreter side of things, the `ldftn` and `ldvirtftn` opcodes would resolve the method to be loaded and then look it up in the AOT images. If the method is found, we would load the address of the method's code, otherwise we would load a tagged `InterpMethod` descriptor. `calli` would check if the function pointer is tagged or not. If it is tagged, it will untag it and proceed with the normal interpreter call invocation. If it is not tagged, then it will obtain the appropriate transition thunk (for the signature embedded in the `calli` instruction) and call it, passing the compiled code pointer together with the pointer to the arguments on the interpreter stack.
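A sketch of the tagging scheme and the `calli` check; the low-bit tag and all helper names are assumptions, relying only on native code pointers being at least 2-byte aligned:

```c
#include <stdint.h>

#define INTERP_FNPTR_TAG 0x1  /* low bit is free because code ptrs are aligned */

typedef struct InterpMethod InterpMethod;

static void *tag_imethod(InterpMethod *imethod) {
    return (void *)((uintptr_t)imethod | INTERP_FNPTR_TAG);
}

static int is_tagged(void *fnptr) {
    return ((uintptr_t)fnptr & INTERP_FNPTR_TAG) != 0;
}

static InterpMethod *untag_imethod(void *fnptr) {
    return (InterpMethod *)((uintptr_t)fnptr & ~(uintptr_t)INTERP_FNPTR_TAG);
}

/* Hypothetical helpers for the two dispatch paths. */
extern void interp_call(InterpMethod *target, uint8_t *args);
extern void *get_interp_exit_wrapper(void *signature);  /* transition thunk */
typedef void (*ExitWrapper)(void *native_code, uint8_t *args);

/* `calli`: tagged pointers stay in the interpreter, untagged ones go
 * through a signature-specific transition wrapper to compiled code. */
static void exec_calli(void *fnptr, void *signature, uint8_t *args)
{
    if (is_tagged(fnptr))
        interp_call(untag_imethod(fnptr), args);
    else
        ((ExitWrapper)get_interp_exit_wrapper(signature))(fnptr, args);
}
```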
**Member:**

Should the same tagged pointer approach be used to dispatch virtual methods? I.e., the virtual method gets resolved to a code pointer that may be a tagged pointer, and then we treat it as an indirect call.

**Member Author:**

Sounds like a good idea that could help with reusing the same virtual dispatch machinery between interp/aot.


In a similar fashion, on the JIT side of things, `ldftn` and `ldvirtftn` will either produce a callable function pointer or a tagged `InterpMethod` descriptor. `calli` would check whether the function pointer is tagged: if it is not, it does a normal native call; if it is, it instead does a normal native call through `InterpMethod->interp_entry`. This field will have to be initialized to an interp entry as part of `ldftn`, but it could also be lazily initialized at the cost of another check.

### PInvoke calls

PInvoke calls can be made either by a normal `call` to a pinvoke method or by doing a `calli` with an unmanaged signature. In both cases, the target function pointer available during code execution is a callable native function pointer. We just have to obtain the transition wrapper and call it, passing the native ftnptr.
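In sketch form (helper names hypothetical), this is essentially the untagged branch of the `calli` example above, with the pinvoke target resolved up front:

```c
#include <stdint.h>

extern void *resolve_pinvoke_target(void *method);      /* dlopen/dlsym etc. */
extern void *get_interp_exit_wrapper(void *signature);  /* transition thunk */
typedef void (*ExitWrapper)(void *native_code, uint8_t *args);

/* A pinvoke always yields a callable native pointer, so no tag check is
 * needed: just pair it with the wrapper for its unmanaged signature. */
static void exec_pinvoke(void *method, void *signature, uint8_t *args)
{
    void *native = resolve_pinvoke_target(method);
    ((ExitWrapper)get_interp_exit_wrapper(signature))(native, args);
}
```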

### Calls from compiled code

Whenever compiled code needs to call a method, unless the call is direct, it will at some point query the runtime for the address of the method's code. If at any point the runtime fails to resolve the method's code, it should fall back to generating/obtaining a transition wrapper for the method in question. The caller will have no need to know whether it is calling compiled code or calling into the interpreter; the calling convention will be identical.
**Member:**

Since this is the interpreter mode, I think I'd like to have a requirement imposed here and say the thunk must be precompiled. I don't see an obvious benefit to generating anything on the fly. I'd like to see some of these cases narrowly defined with asserts rather than functionality that we aren't testing or using regularly.

Do we have a scenario where we need an unmanaged entry point and don't have an AOT image?

**Member:**

Wasm?

**Member:**

Isn't that just a current implementation detail? Is it needed for CoreCLR's interpreter? For iOS we can utilize the remapping trick, so generating an actual callable native function pointer is relatively simple. For WASM, can we impose a requirement that there be some AOT step for reverse P/Invoke thunks, or we generate a finite set of entries and, when that is exhausted, fail?

It is possible I am missing something here, but I'd really like to avoid building out scenarios purely for completeness' sake. Inevitably we create solutions that bit rot. This is top of mind for me because we have so many features that aren't used and yet dictate how we currently design and build new features. If we need them and they have consistent testing, let's do it; otherwise we shouldn't be creating features that will inevitably atrophy because they aren't needed.

**Member Author:**

I've touched on this subject in the compiled-code-interop document. So ideally we would have precompiled wrappers for all necessary signatures. Let's say aot code does an indirect call. During compilation we can conservatively assume that this call might end up in the interpreter, so we would generate an interp_entry wrapper for this signature. However, I think there can be cases where we might not know exactly all the required signatures. Let's say this call has some generic arguments that maybe aren't easy to determine during app compilation (?? I don't fully understand the generic sharing story when valuetypes are included). Or rather we just observe that generating wrappers for every signature is incredibly wasteful in app size, and we try to generate fewer of them at the cost of sometimes not having an immediately available wrapper. I described a potential fallback in the document I mentioned, where we would emit some low-level IR to be able to handle any kind of transition. I loosely referred to this approach as "generating code", but it doesn't actually mean emitting executable code at runtime, since this is assumed impossible.

However, on a separate note, I strongly think it would be good to be able to compile these wrappers at run time with the JIT, in case it is not too difficult. If you have a bug that you are sure is interpreter-related, it might be a massive pain to have to run all sorts of build tools (assembly scanning, aot-ing all wrappers, etc.). Just writing a console sample and running the assembly with corerun would greatly speed up development on the interpreter.

**Member:**

I think it is reasonable to require some sort of AOT image for Wasm that does imply some sort of AOT pass for both build and publish.

**Member:**

@lewing wouldn't that mean we'd need to have the wasm toolchain available all the time?


### Delegate calls

The runtime provides special support for creating/invoking delegates. A delegate invocation can end up either in compiled code or in the interpreter. On Mono, the delegate object has JIT-specific fields and interpreter-specific fields, and the initialization is quite messy. Ideally, as a starting solution at least, a delegate would contain a function pointer, and its invocation would reuse the patterns for indirect calls via `calli` and would not require much special casing.
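A sketch of that starting shape, with an illustrative delegate layout: `Invoke` funnels into the same tag-checked indirect call path sketched for `calli` above.

```c
#include <stdint.h>

/* Trimmed, hypothetical delegate layout. */
typedef struct Delegate {
    void *target_obj;  /* bound `this`, or NULL for static/open delegates */
    void *fnptr;       /* tagged InterpMethod* or callable native pointer */
} Delegate;

extern void exec_calli(void *fnptr, void *signature, uint8_t *args);

/* Invoking the delegate is just an indirect call on the stored pointer,
 * so the interp/compiled distinction is handled entirely by the calli path. */
static void delegate_invoke(Delegate *del, void *invoke_signature, uint8_t *args)
{
    exec_calli(del->fnptr, invoke_signature, args);
}
```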
