# Interp doc for implementation of calls #2858

Open · wants to merge 3 commits into base `feature/CoreclrInterpreter`
`docs/design/interpreter/calls.md` (46 additions, 0 deletions)
### Interpreter call convention

Within the interpreter, every method will have a structure allocated that holds information about the managed method. Let's assume the name of this structure is `InterpMethod` (which is its name within the Mono runtime). This structure will contain information relevant during execution (for example, a pointer to the interpreter IR code, the size of the stack the method uses, and more).
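As a minimal sketch (in C, the language the Mono interpreter is implemented in), the descriptor could look roughly like this; every field name here is an illustrative assumption, not the actual Mono or CoreCLR definition:

```c
#include <stdint.h>

/* Hypothetical layout; all field names are assumptions for illustration. */
typedef struct InterpMethod {
    void           *method_handle;  /* runtime handle of the managed method */
    const uint16_t *ir_code;        /* interpreter IR opcode buffer */
    uint32_t        alloca_size;    /* interpreter stack space the method uses */
    uint32_t        flags;          /* e.g. "IR compiled", "is aot-compiled" */
    void           *interp_entry;   /* thunk letting compiled code enter here */
} InterpMethod;
```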

Interpreter opcodes operate on offsets from the base stack pointer of the current interpreter frame. The arguments for a method will be stored one after the other on the interpreter stack. Each argument will be stored at a stack offset aligned to 8 bytes (at least on a 64-bit arch). Every primitive type fits in an 8-byte stack slot, while valuetypes can occupy a larger stack space. With every call, the interpreter stack pointer will be bumped to a new location where the arguments are already residing. This means that the responsibility of a call instruction is to first resolve the actual method that needs to be called and then just initialize the state to prepare the execution of the new method (mainly initialize a new `InterpFrame`, set the current stack pointer to the location of the arguments on the interp stack and the `ip` to the start of the IR opcode buffer).
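A rough sketch of this convention, assuming a hypothetical `InterpFrame` with the fields below: the call copies nothing, it only repoints the callee's stack base at the arguments the caller already evaluated.

```c
#include <stdint.h>

#define INTERP_SLOT_SIZE 8  /* one 8-byte slot per primitive on a 64-bit arch */

typedef struct InterpMethod {  /* trimmed to the field used below; see sketch above */
    const uint16_t *ir_code;
} InterpMethod;

typedef struct InterpFrame {
    struct InterpFrame *parent;  /* caller's frame */
    InterpMethod *imethod;       /* descriptor of the executing method */
    uint8_t *stack;              /* base pointer of this frame's stack area */
    const uint16_t *ip;          /* current position in the IR buffer */
} InterpFrame;

/* The caller already evaluated the arguments contiguously at `args_offset`,
 * so the callee's stack base starts right on top of them: no copying. */
static void interp_setup_call(InterpFrame *caller, InterpMethod *target,
                              uint32_t args_offset, InterpFrame *callee)
{
    callee->parent  = caller;
    callee->imethod = target;
    callee->stack   = caller->stack + args_offset;  /* args become locals 0..n */
    callee->ip      = target->ir_code;              /* start of the IR buffer */
}
```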

If we need to call a method that has its code compiled, then we would instead call a thunk that does the calling convention translation, moving arguments from the interpreter stack to the native stack/regs, and then dispatches to the compiled code.

### Direct calls

A direct call is a call where the method to be called is known at method compilation time, for example static or non-virtual calls. In this situation, when the call is compiled, an `InterpMethod` is allocated for the target method and this pointer is embedded into the generated interpreter code. No additional work is needed at execution time: the `InterpMethod` is fetched from the opcode stream, on the first call the method is compiled, and call dispatch then continues as described above.
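A hedged sketch of the execution side, with `read_imethod_operand`, `is_ir_compiled` and `compile_to_ir` as hypothetical helpers and the types kept opaque:

```c
#include <stdint.h>

/* Opaque types and helpers; all names here are hypothetical. */
typedef struct InterpMethod InterpMethod;
typedef struct InterpFrame  InterpFrame;

extern InterpMethod *read_imethod_operand(const uint16_t *ip); /* embedded ptr */
extern int  is_ir_compiled(const InterpMethod *imethod);
extern void compile_to_ir(InterpMethod *imethod);
extern void interp_setup_call(InterpFrame *caller, InterpMethod *target,
                              uint32_t args_offset, InterpFrame *callee);

/* A direct call fetches the InterpMethod* embedded at compile time; no
 * method lookup is needed, only a one-time lazy compilation to IR. */
static void exec_direct_call(InterpFrame *caller, const uint16_t *ip,
                             uint32_t args_offset, InterpFrame *callee)
{
    InterpMethod *target = read_imethod_operand(ip + 1);
    if (!is_ir_compiled(target))
        compile_to_ir(target);   /* first invocation only */
    interp_setup_call(caller, target, args_offset, callee);
}
```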

In order to account for the scenario where the method to be called is aot compiled, when emitting code during compilation, we would first check if the method is present in an aot image. If that's the case, we would emit a different opcode instead and embed in the code the pointer to be called. During execution, this opcode would fetch the transition wrapper and proceed with the call.
**Member:**

It would be useful to enumerate the different types of transition wrappers that we expect to exist in the AOT image to support interaction of interpreted and AOT compiled code.

**Member Author:**

The description of transition wrappers is included in the compiled-code-interop.md document. There are mainly two types of wrappers, each handling a type of signature:

  • interp exit: for interp->aot or pinvoke calls from the interpreter. Conservatively, we should generate these in the aot image for all signatures of aot-compiled methods and for pinvoke signatures.
  • interp entry: for aot->interp or unmanaged-callers-only calls. Conservatively, we should generate these in the aot image for all indirect call signatures encountered in aot-ed code and for unmanaged-callers-only signatures. This wrapper will also need a dynamically generated thunk/fat-pointer mechanism for embedding an additional argument.

It's plausible that we would encounter scenarios where a required wrapper is missing, and I've described alternatives, but I don't think it is something that should be prioritized at this stage. Also, I imagine there could be minor differences between interoperating with aot code and interoperating with foreign code via pinvokes. I'm not convinced whether this would actually require separate wrappers.

**Member:**

CoreCLR has been moving towards a similar kind of transition wrapper to support MethodInfo.Invoke. They are dynamically generated and JITed today. It would be nice to share the transition wrappers between reflection and the interpreter.

**Member:**
Suggested change:

- In order to account for the scenario where the method to be called is aot compiled, when emitting code during compilation, we would first check if the method is present in an aot image. If that's the case, we would emit a different opcode instead and embed in the code the pointer to be called. During execution, this opcode would fetch the transition wrapper and proceed with the call.
+ In order to account for the scenario where the method to be called is AOT compiled, when emitting code during compilation, we would first check if the method is present in an AOT image. If that's the case, we would emit a different opcode instead and embed in the code the pointer to be called. During execution, this opcode would fetch the transition wrapper and proceed with the call.

**Member:**

Throughout.


### Virtual/Interface calls

When we need to do a virtual call, we would include the virtual `InterpMethod` in the call opcode, but this needs to be resolved at runtime against the object it gets called on. Calling into the runtime to resolve the target method is expected to be slow, so an alternative is required.

The Mono approach is based on the fact that each virtual method of a class has a designated slot in the vtable that is constant across all classes that implement it. This means that in the object header we can have a method table with each slot containing the target `InterpMethod` that needs to be called when calling through that slot. For virtual methods that have generic arguments, we could end up having the same slot occupied by multiple generic instantiations of the same method. Instead of having a single target method in this slot, we would have a linked list of key/value pairs, mapping the virtual method that needs to be called to the target method. When a call is made through this slot, we would iterate the list and, if we don't find the method to call, we would call into the runtime to resolve the virtual method and then add it to this collection.

For interface calls, we have a separate method table that has a fixed size. For each interface method we can compute a slot in this table that is fixed across all types implementing the interface. Calling an interface method means calling through this slot, where we can always have collisions with methods from other interfaces. We reuse the same mechanism of calling through a slot as for virtual generic methods, with each slot holding a list of interface method / target method pairs that are used to resolve the actual method that needs to be called.
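A sketch of the slot-with-collision-list scheme described above; the types and helper names are assumptions mirroring the Mono design, not existing CoreCLR structures:

```c
#include <stddef.h>

typedef struct InterpMethod InterpMethod;

/* One entry of a per-slot collision list (generic virtuals / interfaces). */
typedef struct SlotEntry {
    InterpMethod     *key;     /* the virtual/interface method being called */
    InterpMethod     *target;  /* resolved implementation for this type */
    struct SlotEntry *next;
} SlotEntry;

extern InterpMethod *runtime_resolve_virtual(void *type, InterpMethod *key);
extern void slot_list_prepend(SlotEntry **slot, InterpMethod *key,
                              InterpMethod *target);

/* Walk the slot's list; on a miss, ask the runtime once and cache the
 * result so the slow path runs at most once per (type, method) pair. */
static InterpMethod *resolve_through_slot(void *type, SlotEntry **slot,
                                          InterpMethod *key)
{
    for (SlotEntry *e = *slot; e != NULL; e = e->next)
        if (e->key == key)
            return e->target;

    InterpMethod *target = runtime_resolve_virtual(type, key);
    slot_list_prepend(slot, key, target);  /* cache for subsequent calls */
    return target;
}
```

A non-generic virtual slot degenerates to a single-entry list, so the common case stays one pointer compare.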

If we are to follow a similar approach, this would mean that in the `MethodTable` we would have at least one additional table where we would cache pairs of virtual method / target method. We could also have a one-entry cache per call site, following the idea of virtual stub dispatch. My current understanding is that if this call-site cache misses, falling back to calling into the runtime would be far too slow, so we would still need some sort of cache table in the `MethodTable` for virtual calls already resolved on the type. If this is the case, then call-site caching would provide limited benefit over `MethodTable` lookups, and can therefore be implemented later if considered useful.
**Member:**

CoreCLR uses a per-call-site cache to optimize monomorphic calls and a global singleton hashtable for polymorphic calls. The interpreter should use the same strategy as the rest of CoreCLR.

There are a number of different possible interface dispatch strategies with different tradeoffs. I do not think it makes sense to use different strategies in a single runtime. It would require paying twice for the supporting data structures, code and implementation complexity.

FWIW, .NET Framework 1.0 used yet another strategy: it was similar to the Mono strategy except that it used an extra indirection to avoid collisions.

**Member Author:**

I really dislike the idea of tweaking the MethodTable for interpreter-specific stuff and I'm all for reusing the same approach to virtual dispatch that the JIT has. I was just not sure how easy it is to efficiently reuse the same machinery, since I'm thinking it was designed for calls from compiled code, patching it, etc.


During compilation, we have no way to tell whether a virtual method will resolve to a method that was already compiled or one that needs to be interpreted. This means that once we resolve a virtual method we will have to do an additional check before actually executing the target method with the interpreter. We will have to look up the method's code in aot images (this lookup should happen only during the first invocation; a flag should be set on the `InterpMethod` once the lookup is completed). Following this check we would either continue with a normal interpreter call or dispatch via a transition wrapper. This means that we can have an `InterpMethod` structure allocated for an aot-compiled method, but a flag would be set that it is aot-ed and we would never attempt to execute it with the interpreter.
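A sketch of that one-time check, with hypothetical flag and lookup names:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flags and fields; trimmed to what is used here. */
#define IMETHOD_AOT_LOOKUP_DONE 0x1  /* aot image lookup already performed */
#define IMETHOD_IS_AOT_CODE     0x2  /* body lives in an aot image */

typedef struct InterpMethod {
    uint32_t flags;
    void    *aot_code;  /* compiled code address, when found */
} InterpMethod;

extern void *lookup_in_aot_images(InterpMethod *imethod);  /* hypothetical */

/* Returns compiled code if the target is aot-compiled, NULL if it must be
 * interpreted. The image lookup runs only on the first invocation. */
static void *get_aot_code(InterpMethod *imethod)
{
    if (!(imethod->flags & IMETHOD_AOT_LOOKUP_DONE)) {
        imethod->aot_code = lookup_in_aot_images(imethod);
        if (imethod->aot_code != NULL)
            imethod->flags |= IMETHOD_IS_AOT_CODE;
        imethod->flags |= IMETHOD_AOT_LOOKUP_DONE;
    }
    return (imethod->flags & IMETHOD_IS_AOT_CODE) ? imethod->aot_code : NULL;
}
```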

### Indirect calls

Indirect calls in the interpreter mean that we have no information about the method to be called at compile time; there is no `InterpMethod` descriptor embedded in the code. The method to be called would be loaded from the interpreter stack. If we knew for sure we were calling into the interpreter, the solution would be trivial: code that loads a function pointer would load an `InterpMethod`, and indirect calls would just execute it in the same way a normal call would. The problem arises from interoperating with compiled code. Compiled code would rather express a function pointer as a callable native pointer, whereas the interpreter would rather express it directly as an `InterpMethod` pointer.

Based on the assumption that a method can either be compiled (aka present in AOT images) or interpreted, we could impose the condition that a function pointer for a compiled method is an aligned native function pointer and that a function pointer for an interpreted method is a tagged `InterpMethod`. The `InterpMethod` descriptor will need to contain a field for the `interp_entry` thunk that can be invoked by compiled code in order to begin executing this method with the interpreter. Both representations would need to be handled by the JIT and interpreter compilation engines.

On the interpreter side of things, the `ldftn` and `ldvirtftn` opcodes would resolve the method to be loaded and then look it up in the AOT images. If the method is found, we would load the address of the method's code, otherwise we would load a tagged `InterpMethod` descriptor. `calli` would check if the function pointer is tagged or not. If it is tagged, it will untag it and proceed with the normal interpreter call invocation. If it is not tagged, then it will obtain the appropriate transition thunk (for the signature embedded in the `calli` instruction) and call it, passing the compiled code pointer together with the pointer to the arguments on the interpreter stack.
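A sketch of the tagging scheme and the `calli` check; the low-bit tag and all helper names are assumptions, relying only on native code pointers being at least 2-byte aligned:

```c
#include <stdint.h>

#define INTERP_FNPTR_TAG 0x1  /* low bit is free because code ptrs are aligned */

typedef struct InterpMethod InterpMethod;

static void *tag_imethod(InterpMethod *imethod) {
    return (void *)((uintptr_t)imethod | INTERP_FNPTR_TAG);
}

static int is_tagged(void *fnptr) {
    return ((uintptr_t)fnptr & INTERP_FNPTR_TAG) != 0;
}

static InterpMethod *untag_imethod(void *fnptr) {
    return (InterpMethod *)((uintptr_t)fnptr & ~(uintptr_t)INTERP_FNPTR_TAG);
}

/* Hypothetical helpers for the two dispatch paths. */
extern void interp_call(InterpMethod *target, uint8_t *args);
extern void *get_interp_exit_wrapper(void *signature);  /* transition thunk */
typedef void (*ExitWrapper)(void *native_code, uint8_t *args);

/* `calli`: tagged pointers stay in the interpreter, untagged ones go
 * through a signature-specific transition wrapper to compiled code. */
static void exec_calli(void *fnptr, void *signature, uint8_t *args)
{
    if (is_tagged(fnptr))
        interp_call(untag_imethod(fnptr), args);
    else
        ((ExitWrapper)get_interp_exit_wrapper(signature))(fnptr, args);
}
```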
**Member:**

Should the same tagged pointer approach be used to dispatch virtual methods? I.e., the virtual method gets resolved to a code pointer that may be a tagged pointer, and then we treat it as an indirect call.

**Member Author:**

Sounds like a good idea that could help with reusing the same virtual dispatch machinery between interp/aot.


In a similar fashion, on the JIT side of things, `ldftn` and `ldvirtftn` will either produce a callable function pointer or a tagged `InterpMethod` descriptor. `calli` would check whether the function pointer is tagged: if it is not, it does a normal native call; if it is, it instead does a normal native call through `InterpMethod->interp_entry`. This field will have to be initialized to an interp entry as part of `ldftn`, but it could also be lazily initialized at the cost of another check.

### PInvoke calls

PInvoke calls can be made either by a normal `call` to a pinvoke method or by doing a `calli` with an unmanaged signature. In both cases, the target function pointer available during code execution is a callable native function pointer. We just have to obtain the transition wrapper and call it, passing the native ftnptr.
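In sketch form (helper names hypothetical), this is essentially the untagged branch of the `calli` example above, with the pinvoke target resolved up front:

```c
#include <stdint.h>

extern void *resolve_pinvoke_target(void *method);      /* dlopen/dlsym etc. */
extern void *get_interp_exit_wrapper(void *signature);  /* transition thunk */
typedef void (*ExitWrapper)(void *native_code, uint8_t *args);

/* A pinvoke always yields a callable native pointer, so no tag check is
 * needed: just pair it with the wrapper for its unmanaged signature. */
static void exec_pinvoke(void *method, void *signature, uint8_t *args)
{
    void *native = resolve_pinvoke_target(method);
    ((ExitWrapper)get_interp_exit_wrapper(signature))(native, args);
}
```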

### Calls from compiled code

Whenever compiled code needs to call a method, unless the call is direct, it will at some point query the runtime for the address of the method's code. If at any point the runtime fails to resolve the method's code, it should fall back to generating/obtaining a transition wrapper for the method in question. The caller will have no need to know whether it is calling compiled code or calling into the interpreter; the calling convention will be identical.
**Member:**

Since this is the interpreter mode, I think I'd like to have a requirement imposed here and say the thunk must be precompiled. I don't see an obvious benefit to generating anything on the fly. I'd like to see some of these cases narrowly defined with asserts rather than functionality that we aren't testing or using regularly.

Do we have a scenario where we need an unmanaged entry point and don't have an AOT image?

**Member:**

Wasm?

**Member:**

Isn't that just a current implementation detail? Is it needed for CoreCLR's interpreter? For iOS we can utilize the remapping trick, so generating an actual callable native function pointer is relatively simple. For WASM, can we impose a requirement that there be some AOT step for reverse P/Invoke thunks, or we generate a finite set of entries and, when that is exhausted, fail?

It is possible I am missing something here, but I'd really like to avoid building out scenarios purely for completeness' sake. Inevitably we create solutions that bit rot. This is top of mind for me because we have so many features that aren't used and yet dictate how we currently design and build new features. If we need them and they have consistent testing, let's do it; otherwise we shouldn't be creating features that will inevitably atrophy because they aren't needed.

**Member Author:**

I've touched on this subject in the compiled-code-interop document. So ideally we would have precompiled wrappers for all necessary signatures. Let's say aot code does an indirect call. During compilation we can conservatively assume that this call might end up in the interpreter, so we would generate an interp_entry wrapper for this signature. However, I think there can be cases where we might not know exactly all the required signatures. Let's say this call has some generic arguments that maybe aren't easy to determine during app compilation (?? I don't fully understand the generic sharing story when valuetypes are included). Or rather we just observe that generating wrappers for every signature is incredibly wasteful in app size, and we try to generate fewer of them at the cost of sometimes not having an immediately available wrapper. I described a potential fallback in the document I mentioned, where we would emit some low-level IR to be able to handle any kind of transition. I loosely referred to this approach as "generating code", but it doesn't actually mean emitting executable code at runtime, since this is assumed impossible.

However, on a separate note, I strongly think it would be good to be able to compile these wrappers at run time with the JIT, in case it is not too difficult. If you have a bug that you are sure is interpreter-related, it might be a massive pain to have to run all sorts of build tools (assembly scanning, aot-ing all wrappers, etc.). Just writing a console sample and running the assembly with corerun would greatly speed up development on the interpreter.

**Member:**

I think it is reasonable to require some sort of AOT image for Wasm that does imply some sort of AOT pass for both build and publish.

**Member:**

@lewing wouldn't that mean we'd need to have the wasm toolchain available all the time?


### Delegate calls

The runtime provides special support for creating/invoking delegates. A delegate invocation can end up either in compiled code or in the interpreter. On Mono, the delegate object has JIT-specific fields and interpreter-specific fields, and the initialization is quite messy. Ideally, as a starting solution at least, a delegate would contain a function pointer, and its invocation would reuse the patterns for indirect calls via `calli` and would not require much special casing.
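A sketch of that starting shape, with an illustrative delegate layout: `Invoke` funnels into the same tag-checked indirect call path sketched for `calli` above.

```c
#include <stdint.h>

/* Trimmed, hypothetical delegate layout. */
typedef struct Delegate {
    void *target_obj;  /* bound `this`, or NULL for static/open delegates */
    void *fnptr;       /* tagged InterpMethod* or callable native pointer */
} Delegate;

extern void exec_calli(void *fnptr, void *signature, uint8_t *args);

/* Invoking the delegate is just an indirect call on the stored pointer,
 * so the interp/compiled distinction is handled entirely by the calli path. */
static void delegate_invoke(Delegate *del, void *invoke_signature, uint8_t *args)
{
    exec_calli(del->fnptr, invoke_signature, args);
}
```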
