From 659dab2d80e0bd541961f08c80553cbb2c86819a Mon Sep 17 00:00:00 2001 From: Vlad Brezae Date: Thu, 5 Dec 2024 15:40:15 +0200 Subject: [PATCH 1/3] Interp doc for implementation of calls Included also initial document about runtime APIs used by the mono interpreter. --- docs/design/interpreter/calls.md | 46 +++ .../interpreter/mono-runtime-dependencies.md | 263 ++++++++++++++++++ 2 files changed, 309 insertions(+) create mode 100644 docs/design/interpreter/calls.md create mode 100644 docs/design/interpreter/mono-runtime-dependencies.md diff --git a/docs/design/interpreter/calls.md b/docs/design/interpreter/calls.md new file mode 100644 index 00000000000..e66fc09952f --- /dev/null +++ b/docs/design/interpreter/calls.md @@ -0,0 +1,46 @@ +### Interpreter call convention + +Within the interpreter, every single method will have a structure allocated that holds information about the managed method. Let's assume the name of this structure is `InterpMethod` (which is its name within the Mono runtime). This structure will contain important information that is relevant during execution (for example, a pointer to the interpreter IR code, the size of the stack that it uses, and much more). + +Interpreter opcodes operate on offsets from the base stack pointer of the current interpreter frame. The arguments for a method will be stored one after the other on the interpreter stack. Each argument will be stored at an offset on the stack aligned to 8 bytes (at least on a 64-bit arch). Every primitive type would fit in an 8-byte stack slot, while valuetypes could occupy a larger stack space. With every call, the interpreter stack pointer will be bumped to a new location where the arguments are already residing. 
This means that the responsibility of a call instruction is to first resolve the actual method that needs to be called and then just initialize the state to prepare the execution of the new method (mainly initialize a new `InterpFrame`, set the current stack pointer to the location of the arguments on the interp stack and the `ip` to the start of the IR opcode buffer). + +If we need to call a method that has its code compiled, then we would instead call some thunk that does the call convention translation, moving arguments from the interpreter stack to the native stack/regs, and which later dispatches to the compiled code. + +### Direct calls + +A direct call is a call where the method to be called is known at method compilation time, for example for static or non-virtual calls. In this situation, when the call is compiled, an `InterpMethod` is allocated for the target method that needs to be called and this pointer is embedded into the generated interpreter code. No additional work is needed at execution: the `InterpMethod` is fetched from the opcode stream, for the first call the method will have to be compiled and then call dispatch continues as described above. + +In order to account for the scenario where the method to be called is aot compiled, when emitting code during compilation, we would first check if the method is present in an aot image. If that's the case, we would emit a different opcode instead and embed in the code the pointer to be called. During execution, this opcode would fetch the transition wrapper and proceed with the call. + +### Virtual/Interface calls + +When we need to do a virtual call, we would include the virtual `InterpMethod` in the call opcode but this needs to be resolved at runtime on the object that it gets called on. Calling into the runtime to resolve the target method is expected to be a slow process, so an alternative is required. 
+ +The Mono approach is based on the fact that each virtual method of a class has a designated slot in the vtable, that is constant on all classes that implement it. This means that the in the object header we can have a method table with each slot containing the target `InterpMethod` that needs to be called when calling through that slot. For virtual methods that have generic arguments, we could end up having the same slot occupied by multiple generic instantiations of the same method. Instead of having a single target method in this slot, we would have a linked list of key/value pairs, mapping the virtual method that needs to be called to the target method. When a call is made through this slot, we would iterate the list and if we don't find the method to call, we would call into the runtime to resolve the virtual method and then add it to this collection. For interface calls, we have a separate method table that has a fixed size. For each interface method we can compute a slot in this table, that is fixed across all types implementing this interface. Calling an interface method means calling through this slot where we can always have collisions with methods from other interfaces. We reuse the same mechanism of calling through a slot for virtual generic methods, with each slot holding a list of interface method / target method pairs, that are used to resolve the actual method that needs to be called. + +If we are to follow a similar approach, this would mean that in the `MethodTable` we would have at least an additional table where we would cache pairs of virtual method / target method. We could also have a one entry cache per call site, following on the idea of virtual stub dispatch. My current understanding is that if this call site cache fails, then falling back to calling into the runtime would be way too slow, so we would still need to have some sort of cache table in the `MethodTable` for the already resolved virtual calls on the type. 
If this is the case, then call site caching would be a feature that would provide limited benefit over `MethodTable` lookups, and can therefore be implemented later on if considered useful. 
 +
+During compilation, we have no way to tell whether a virtual method will resolve to a method that was already compiled or that needs to be interpreted. This means that once we resolve a virtual method we will have to do an additional check before actually executing the target method with the interpreter. We will have to look up the method code in aot images (this lookup should happen only during the first invocation; a flag should be set on the `InterpMethod` once the lookup is completed). Following this check we would either continue with a normal interpreter call or dispatch via a transition wrapper. This would mean that we can have an `InterpMethod` structure allocated for an aot compiled method, but a flag would be set that this is aot-ed and we never attempt to execute it with the interpreter. + +### Indirect calls + +Indirect calls in the interpreter mean that we have no information about the method to be called at compile time; there is no `InterpMethod` descriptor embedded in the code. The method to be called would be loaded from the interpreter stack. If we knew for sure we were calling into the interpreter, then the solution would be trivial. Code that loads a function pointer would load an `InterpMethod` and indirect calls would just execute it in the same way as a normal call would. The problem arises from interoperating with compiled code. Compiled code would rather express a function pointer as a callable native pointer whereas the interpreter would rather express it directly as an `InterpMethod` pointer. 
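One way to reconcile the two representations — the route developed in the paragraphs that follow — is to tag the low bit of `InterpMethod` pointers. Below is a minimal sketch; the tag constant and helper names are hypothetical, not an existing runtime API:

```c
#include <stdint.h>

/* Sketch only: `InterpMethod` stands in for the interpreter's per-method
 * descriptor. Descriptors are naturally aligned, so the low bit of a
 * genuine pointer is always 0 and is free to act as a tag. */
typedef struct InterpMethod InterpMethod;

#define INTERP_FTNPTR_TAG ((uintptr_t)0x1)

/* Wrap an InterpMethod* so it can travel through ftnptr-typed slots. */
static inline void *
interp_ftnptr_tag (InterpMethod *imethod)
{
	return (void *)((uintptr_t)imethod | INTERP_FTNPTR_TAG);
}

/* A function pointer refers to interpreted code iff its low bit is set. */
static inline int
interp_ftnptr_is_tagged (void *ftnptr)
{
	return ((uintptr_t)ftnptr & INTERP_FTNPTR_TAG) != 0;
}

/* Recover the descriptor from a tagged function pointer. */
static inline InterpMethod *
interp_ftnptr_untag (void *ftnptr)
{
	return (InterpMethod *)((uintptr_t)ftnptr & ~INTERP_FTNPTR_TAG);
}
```

Both code generators then only need a single-bit test at every indirect call site to decide between a native call and an interpreter transition.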
+ +Based on the assumption that a method can either be compiled (aka present in AOT images) or interpreted, we could impose the condition that a function pointer for a compiled method is an aligned native function pointer and that a function pointer for an interpreted method is a tagged `InterpMethod`. The `InterpMethod` descriptor will need to contain a field for the `interp_entry` thunk that can be invoked by compiled code in order to begin executing this method with the interpreter. Both representations would need to be handled by the JIT and interpreter compilation engines. + +On the interpreter side of things, the `ldftn` and `ldvirtftn` opcodes would resolve the method to be loaded and then look it up in the AOT images. If the method is found, then we would load the address of the method's code, otherwise we would load a tagged `InterpMethod` descriptor. `calli` would check if the function pointer is tagged or not. If it is tagged, it will untag it and proceed with the normal interpreter call invocation. If it is not tagged, then it will obtain the appropriate transition thunk (based on the signature embedded in the `calli` instruction) and call it, passing the compiled code pointer together with the pointer to the arguments on the interpreter stack. + +In a similar fashion, on the JIT side of things, `ldftn` and `ldvirtftn` will either produce a callable function pointer or a tagged `InterpMethod` descriptor. `calli` would check if the function pointer is not tagged, in which case it would do a normal native call. If the function pointer is tagged, then it will instead do a normal native call through `InterpMethod->interp_entry`. This field will have to be initialized to an interp entry as part of the `ldftn`, but it could also be lazily initialized at the cost of another check. + +### PInvoke calls + +PInvoke calls can be made either by a normal `call` to a pinvoke method or by doing a `calli` with an unmanaged signature. 
In both cases the target function pointer that is available during code execution is a callable native function pointer. We just obtain the transition wrapper and call it, passing the native ftnptr. + +### Calls from compiled code + +Whenever compiled code needs to call a method, unless the call is direct, it will at some point query the runtime for the address of the method code. If at any point the runtime fails to resolve the method code, it should fall back to generating/obtaining a transition wrapper for the method in question. The caller will have no need to know whether it is calling compiled code or whether it is calling into the interpreter; the calling convention will be identical. + +### Delegate calls + +The runtime provides special support for creating/invoking delegates. A delegate invocation can end up either in compiled code or in the interpreter. On Mono, the delegate object has JIT-specific fields and interpreter-specific fields and the initialization is quite messy. Ideally, as a starting solution at least, a delegate would contain a function pointer and its invocation would reuse the patterns for indirect calls via `calli` and would not require much special casing. + diff --git a/docs/design/interpreter/mono-runtime-dependencies.md b/docs/design/interpreter/mono-runtime-dependencies.md new file mode 100644 index 00000000000..3edb40d6c1e --- /dev/null +++ b/docs/design/interpreter/mono-runtime-dependencies.md @@ -0,0 +1,263 @@ +API that is used during execution is underlined. It seems that quite a significant proportion can end up being used there. + +### Metadata structures with widespread use. Complete access to information contained by these, either directly or via accessors. + +1. **MonoMethod** + * MonoClass *klass + * MonoMethodSignature *signature + * MonoGenericContext *context + * flags + * etc + +1. 
**MonoClass** + * MonoClass *parent + * is_enum, is_vt etc + * rank, MonoClass *element_class + * instance_size, value_size, native_size, align + * MonoGenericContext *context + * MonoClassField *fields + * MonoType _byval_arg + * MonoImage *image + * MonoMethod **vtable + * has_references + * other flags + * etc + +1. **MonoMethodSignature** + * hasthis + * param_count + * MonoType *params + * MonoType *ret + * flags + * MonoClassField + * MonoType *type + * name + * offset + * flags + +1. **MonoType** + * type + * flags + * metadata + * obtainable size/alignment (mono_type_size) + +1. **MonoMethodHeader** + * code, code_size + * MonoExceptionClause *clause, num_clauses + * MonoType *locals, num_locals + * flags + +1. **MonoExceptionClause** + * type + * try_off, try_len + * handler_off, handler_len + * filter_offset + +1. **MonoTypedRef** + * type + * value + * klass + +### General functionality around metadata structures + +1. **Class initialization** + + | Method | Notes | + | -- | -- | + | mono_class_init_internal (MonoClass*) | basic class initialization | + | mono_runtime_class_init_full(MonoVTable*) | calls cctor for a class | + | mono_class_setup_fields (MonoClass*) | initializes mainly field offsets | + | mono_class_vtable_checked (MonoClass*) | get vtable for class | + | mono_class_setup_vtable (MonoClass*) | populates the class vtable | + +1. 
**Generics** + + | Method | Notes | + | -- | -- | + | mono_class_get_generic_class (MonoClass*) | for a generic instantiation, returns the generic class | + | mono_method_get_context (MonoMethod*) | get generic context if the method is inflated generic method| + | mono_method_get_generic_container (MonoMethod*) | get generic container for a generic method (not inflated) | + | mono_class_inflate_generic_method_checked (MonoMethod *method, MonoGenericContext*) | inflates a generic method | + | mono_inflate_generic_signature (MonoMethodSignature*, MonoGenericContext*) | inflates a generic signature from the context | + | mono_class_inflate_generic_type_checked (MonoType*, MonoGenericContext*) | inflates a generic MonoType | + +1. **Token Loading** + + | Method | Notes | + | -- | -- | + | mono_class_get_and_inflate_typespec_checked (MonoImage*, token, MonoGenericContext*) | get class from token | + | mono_method_get_signature_checked (MonoMethod, MonoImage, token, MonoGenericContext*) | seems to be only used for varag ? | + | mono_metadata_parse_signature_checked (MonoImage *, token) | decodes a MonoMethodSignature* from image | + | mini_get_class (MonoMethod*, token, MonoGenericContext*) | obtain MonoClass* from token used inside the method | + | mono_ldstr_checked (MonoImage*, token) | obtain MonoString from token | + | mono_ldtoken_checked (MonoImage*, token, MonoGenericContext*) | load token | + | mono_field_from_token_checked (MonoImage*, token, MonoClass*, MonoGenericContext*) | get field from token | + +1. 
**Additional class information** + + | Method | Notes | + | -- | -- | + | mono_class_is_subclass_of_internal (MonoClass *, MonoClass*) |  | + | mono_class_get_fields_internal (MonoClass*, gpointer iterator) | iterating over all fields of class | + | mono_class_get_cctor (MonoClass*) | gets cctor for class | + | mono_class_has_finalizer (MonoClass*) | whether class has finalizer, slow path allocation | + | mono_class_get_method_from_name_checked (MonoClass*, char *name, int num_params) | obtain a certain method from class | + +1. **Additional method information** + + | Method | Notes | + | -- | -- | + | mono_method_get_header_internal (MonoMethod*) | get MonoMethodHeader for a method | + | mono_custom_attrs_from_method_checked (MonoMethod*) | access attributes for method | + | mono_method_get_wrapper_data (MonoMethod *method, token) | wrapper methods hold some additional metadata | + | mono_method_get_imt_slot (MonoMethod *method) | get fixed slot for interface method | + | mono_method_get_vtable_slot (MonoMethod *method) | get the vtable slot for the method in its class | + | mono_get_method_constrained_with_method (MonoImage*, MonoMethod*, MonoClass* constrained, MonoGenericContext*) | get target method when constraining it to be called on a certain type | + | m_method_get_mem_manager | allocator specific for a method, with memory released if method is collected | + +1. **Additional field information** + + | Method | Notes | + | -- | -- | + | mono_method_can_access_field (MonoMethod*, MonoClassField*) | | + | mono_class_field_is_special_static (MonoClassField*) | if field is thread static, previously used to detect context static fields with remoting | + | mono_special_static_field_get_offset (MonoClassField*) | combining this offset with the tls data we obtain the actual address of the field | + | mono_static_field_get_addr (MonoVTable*, MonoClassField*) | get the address of normal static field (which is referenced from the vtable) | + +1. 
**Misc** + + | Method | Notes | + | -- | -- | + | mono_class_create_array (MonoClass*, rank) | create a MonoClass* for an array of certain element type | + | mono_thread_internal_current () | get some runtime tls data (tls fields are for example obtained from a pointer in this structure) | + | get_default_jit_mm | general allocator | + | mono_defaults.corlib | MonoImage* corresponding to corlib | + | mono_defaults.string_class mono_defaults.*_class | MonoClass* pointers corresponding to various primitive/special types | + +### Information about IL opcodes + + | Method | Notes | + | -- | -- | + | mono_opcode_value (ip) | decodes IL opcode and returns an index for it | + | mono_opcode [index] | obtain information about each opcode (nr pushes, nr pops, type of opcode metadata) | + +### Basic object type format understanding + +1. **MonoObject** + * vtable + * sync + +1. **MonoArray** + * MonoObject + * length + * bounds + * vector + +1. **MonoString** + * MonoObject + * length + * chars + +1. **MonoDelegate** + * MonoObject + * MonoObject *target + * MonoMethod *target_method + * InterpMethod *interp_invoke (method interpreted when invoking delegate) + * Note that handling delegates / function pointers between interpreted and compiled code is a topic by itself and this area might suffer significant modifications + +### Object allocation + + | Method | Notes | + | -- | -- | + | mono_object_new (MonoClass *klass) | allocates new object | + | mono_array_new (MonoClass *klass, int len, ...) | allocated new array with provided element class | + | mono_string_new (char *cstr) | | + | mono_gc_alloc_obj (vtable, size) | lower level GC api for object allocation | + | mono_get_exception_.. 
| allocates certain exception objects to be thrown during execution | + | mono_nullable_box (gpointer vbuf, MonoClass *klass) | boxes a nullable valuetype | + | mono_type_get_object_checked (MonoType*) | get System.RuntimeType object for a MonoType | + +### GC + + | Method | Notes | + | -- | -- | + | mono_gchandle_new (MonoObject*) |  | + | mono_gchandle_free_internal (gchandle) |  | + | mono_gc_wbarrier_generic_store_internal (gpointer** ptr, MonoObject* val) | store the val reference into ptr, doing any necessary gc informing about the store | + | mono_value_copy_internal (gpointer dest, gpointer src, MonoClass* klass) | GC aware value copy | + | mono_threads_safepoint () | GC poll, suspension point | + | MONO_ENTER_GC_SAFE / MONO_EXIT_GC_SAFE | gc state transitions for coop suspend | + | mono_threads_attach_coop | attaches to the suspend machinery a thread that starts executing managed code | + | mono_threads_detach_coop |  | + +### EH + + | Method | Notes | + | -- | -- | + | mono_push_lmf / mono_pop_lmf () | push/pop thread local information that the runtime can use to unwind into the interpreter | + | mono_handle_exception (MonoContext *ctx, MonoObject *exc) | calls into the runtime to throw ex, with native register context serving as starting point of unwinding | + +### Wrappers + + | Method | Notes | + | -- | -- | + | mono_marshal_get_synchronized_wrapper | obtain MonoMethod for a synchronized wrapper calling the target method | + | mono_marshal_get_native_wrapper | obtain MonoMethod for a wrapper calling a pinvoke method | + | mono_marshal_get_native_func_wrapper | wrapper similar to the pinvoke one for calling native function pointers | + | mono_marshal_get_managed_wrapper | wrapper for entering runtime from native code (UnmanagedCallersOnly) | + | mono_marshal_get_delegate_invoke | obtain wrapper for invoking multicast delegate | + | mono_marshal_get_delegate_begin_invoke, mono_marshal_get_delegate_end_invoke |  | + | mono_marshal_get_icall_wrapper | wrapper for 
icall | + +### Casting/Inheritance + + | Method | Notes | + | -- | -- | + | mono_object_isinst (MonoObject*, MonoClass*) | check whether object is instance of class | + | mono_class_is_assignable_from_internal (MonoClass*, MonoClass*) |  | + | mono_class_has_parent_fast (MonoClass, MonoClass*) | simple version of the above, not handling interfaces etc | + | MONO_VTABLE_IMPLEMENTS_INTERFACE (MonoVTable*, interface_id) | check whether an object with a certain vtable implements an interface | + +### Calls + + | Method | Notes | + | -- | -- | + | mono_class_interface_offset (MonoClass* k, MonoClass* iface) | returns a slot for interface in a class that implements the interface | + | mono_method_get_vtable_slot (MonoMethod*) | returns the vtable slot for a method | + | m_class_get_vtable (MonoClass*) | returns the vtable of a class | + | vtable->interp_vtable | returns the separate vtable where interpreter holds its method pointers | + | vtable->"interface_table" | another separate table for interpreter method pointers, in mono this is stored at negative offset from the vtable structure | + +### Native Interop + +Some assembly code will have to be present. It should be emitted during AOT compilation and loaded at runtime. 
+ + | Method | Notes | + | -- | -- | + | mono_arch_get_interp_native_call_info (MonoMethodSignature*) | low level information about each arg location according to cconv | + | mono_create_ftnptr_arg_trampoline (interp_entry, InterpMethod*) | callable native thunk that embedds data so it can enter the interpreter when called | + | mini_get_gsharedvt_out_sig_wrapper (MonoMethodSignature*) | wrapper for transition to jit | + | mini_get_interp_in_wrapper (MonoMethodSignature*) | wrapper for entering interpreter from jit code | + | mono_jit_compile_method_jit_only (MonoMethod *method) | for interop with jit, can also look into aot | + | mono_aot_get_method (MonoMethod *method) | search for method in aot images | + | mono_find_jit_icall_info (token) | returns information for internal call token. This information will also contain the native pointer to be called | + +### Debugger + +It is unclear how relevant this is since the debugging story will probably be a separate topic by itself. + +| Method | Notes | +| -- | -- | +| mono_component_debugger ()->user_break | called by `System.Diagnostics.Debugger:Break` | +| mini_get_breakpoint_trampoline () | thunk for calling into debugger for breakpoint | +| mini_get_single_step_trampoline () | thunk for calling into debugger for single step | +| mono_debug_add_method (MonoMethod*, debug_info) | add some debug information associated with the compiled code for method, that the runtime needs | +| mono_debug_lookup_method (MonoMethod*) | get some debug information for method | + +### Profiler + +There are several places calling mono_profiler_raise_* methods in order to raise different profiler events. + +### Metadata Update + +There are various mono_metadata_update_* methods for supporting this feature. Doesn't seem relevant for the prototype at this stage. 
From ef035e7e91ffb339a9bba32287646fdfaf0440da Mon Sep 17 00:00:00 2001 From: Vlad Brezae Date: Fri, 13 Dec 2024 11:46:01 +0200 Subject: [PATCH 2/3] update to virtual dispatch implementation --- docs/design/interpreter/calls.md | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/docs/design/interpreter/calls.md b/docs/design/interpreter/calls.md index e66fc09952f..8d37fc34469 100644 --- a/docs/design/interpreter/calls.md +++ b/docs/design/interpreter/calls.md @@ -10,17 +10,15 @@ If we need to call a method that has its code compiled, then we would instead ca A direct call is a call where the method to be called is known at method compilation time, for example for static or non-virtual calls. In this situation, when the call is compiled, an `InterpMethod` is allocated for the target method that needs to be called and this pointer is embedded into the generated interpreter code. No additional work is needed at execution: the `InterpMethod` is fetched from the opcode stream, for the first call the method will have to be compiled and then call dispatch continues as described above. -In order to account for the scenario where the method to be called is aot compiled, when emitting code during compilation, we would first check if the method is present in an aot image. If that's the case, we would emit a different opcode instead and embed in the code the pointer to be called. During execution, this opcode would fetch the transition wrapper and proceed with the call. +In order to account for the scenario where the method to be called is AOT compiled, when emitting code during compilation, we would first check if the method is present in an AOT image. If that's the case, we would emit a different opcode instead and embed in the code the pointer to be called. During execution, this opcode would fetch the transition wrapper and proceed with the call. 
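The compile-time choice described in the paragraph above can be sketched roughly as follows; all names here (`OP_CALL_AOT`, `aot_lookup`, `interp_method_get`, and the demo stubs) are hypothetical stand-ins, not actual runtime entry points:

```c
#include <stddef.h>

/* Hypothetical sketch of choosing the call opcode when a direct call is
 * compiled: check the AOT images first, and emit a different opcode for
 * AOT targets so execution dispatches through a transition wrapper
 * instead of the interpreter. */
typedef enum {
	OP_CALL,	/* target is an InterpMethod*, normal interp-to-interp call */
	OP_CALL_AOT	/* target is compiled code, called via transition wrapper */
} CallOpcode;

typedef struct {
	CallOpcode opcode;
	void *target;	/* pointer embedded in the generated interpreter code */
} EmittedCall;

static EmittedCall
emit_direct_call (void *method,
		  void *(*aot_lookup) (void *method),
		  void *(*interp_method_get) (void *method))
{
	EmittedCall call;
	void *aot_code = aot_lookup (method);
	if (aot_code != NULL) {
		/* Method lives in an AOT image: embed its code pointer. */
		call.opcode = OP_CALL_AOT;
		call.target = aot_code;
	} else {
		/* Method will be interpreted: embed its descriptor. */
		call.opcode = OP_CALL;
		call.target = interp_method_get (method);
	}
	return call;
}

/* Demo stubs so the sketch is self-contained: one method pretends to be
 * AOT compiled, the other interpreted. */
static int demo_method_aot, demo_method_interp;
static char demo_aot_code, demo_interp_method;

static void *
demo_aot_lookup (void *method)
{
	return method == (void *)&demo_method_aot ? (void *)&demo_aot_code : NULL;
}

static void *
demo_interp_method_get (void *method)
{
	(void)method;
	return &demo_interp_method;
}
```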
### Virtual/Interface calls When we need to do a virtual call, we would include the virtual `InterpMethod` in the call opcode but this needs to be resolved at runtime on the object that it gets called on. Calling into the runtime to resolve the target method is expected to be a slow process so an alternative is required. -The Mono approach is based on the fact that each virtual method of a class has a designated slot in the vtable, that is constant on all classes that implement it. This means that the in the object header we can have a method table with each slot containing the target `InterpMethod` that needs to be called when calling through that slot. For virtual methods that have generic arguments, we could end up having the same slot occupied by multiple generic instantiations of the same method. Instead of having a single target method in this slot, we would have a linked list of key/value pairs, mapping the virtual method that needs to be called to the target method. When a call is made through this slot, we would iterate the list and if we don't find the method to call, we would call into the runtime to resolve the virtual method and then add it to this collection. For interface calls, we have a separate method table that has a fixed size. For each interface method we can compute a slot in this table, that is fixed across all types implementing this interface. Calling an interface method means calling through this slot where we can always have collisions with methods from other interfaces. We reuse the same mechanism of calling through a slot for virtual generic methods, with each slot holding a list of interface method / target method pairs, that are used to resolve the actual method that needs to be called. +The Mono approach is based on the fact that each virtual method of a class has a designated slot in the vtable, that is constant on all classes that implement it. 
This means that in the object header we can have a method table with each slot containing the target `InterpMethod` that needs to be called when calling through that slot. For virtual methods that have generic arguments, we could end up having the same slot occupied by multiple generic instantiations of the same method. Instead of having a single target method in this slot, we would have a linked list of key/value pairs, mapping the virtual method that needs to be called to the target method. When a call is made through this slot, we would iterate the list and if we don't find the method to call, we would call into the runtime to resolve the virtual method and then add it to this collection. For interface calls, we have a separate method table that has a fixed size. For each interface method we can compute a slot in this table, that is fixed across all types implementing this interface. Calling an interface method means calling through this slot where we can always have collisions with methods from other interfaces. We reuse the same mechanism of calling through a slot for virtual generic methods, with each slot holding a list of interface method / target method pairs, that are used to resolve the actual method that needs to be called. -If we are to follow a similar approach, this would mean that in the `MethodTable` we would have at least an additional table where we would cache pairs of virtual method / target method. We could also have a one entry cache per call site, following on the idea of virtual stub dispatch. My current understanding is that if this call site cache fails, then falling back to calling into the runtime would be way too slow, so we would still need to have some sort of cache table in the `MethodTable` for the already resolved virtual calls on the type. If this is the case, then call site caching would be a feature that would provide limited benefit over `MethodTable` lookups, and can therefore be implemented later on if considered useful. 
- -During compilation, we have no way to tell whether a virtual method will resolve to a method that was already compiled or that needs to be interpreted. This means that once we resolve a virtual method we will have to do an additional check before actually executing the target method with the interpreter. We will have to look up for the method code in aot images (this lookup should happen only during the first invocation, a flag should be set on the `InterpMethod` once the lookup is completed). Following this check we would either continue with a normal intepreter call or dispatch via a transition wrapper. This would mean that we can have an `InterpMethod` structure allocated for an aot compiled method, but a flag would be set that this is aot-ed and we never attempt to execute it with the interpreter. +The Mono approach is quite different from the current CoreCLR approach that is based on the idea of virtual stub dispatch, which is also very efficient with common monomorphic calls. Ideally, for virtual calls we would reuse the same implementation that the JIT uses, where some small pieces of code resolve the method to be called based on various caches. In order to reuse the implementation, these lookups will have to produce instead either a callable function pointer or a fat pointer. Virtual calls from both compiled code as well as from the interpreter will have to check whether this pointer is tagged or not and correctly dispatch to the resolved method. This pointer will follow the same rules as for the ldftn/calli scenario that is described below. 
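Whatever cache machinery produces the resolved pointer, both engines end up performing the same check on the result. A minimal sketch, with hypothetical names and the same low-bit tag convention as in the indirect-call discussion:

```c
#include <stdint.h>

/* Hypothetical sketch: after the virtual-dispatch lookup, the caller gets
 * back either a callable native pointer or a tagged InterpMethod*, and
 * must branch on the tag before calling. */
typedef struct InterpMethod {
	void *interp_entry;	/* thunk compiled code can call to enter the interp */
} InterpMethod;

#define INTERP_FTNPTR_TAG ((uintptr_t)0x1)

/* From compiled code: turn the lookup result into a directly callable
 * native pointer, going through interp_entry for interpreted targets. */
static void *
resolve_callable (void *lookup_result)
{
	if ((uintptr_t)lookup_result & INTERP_FTNPTR_TAG) {
		InterpMethod *imethod =
			(InterpMethod *)((uintptr_t)lookup_result & ~INTERP_FTNPTR_TAG);
		return imethod->interp_entry;	/* transition into the interpreter */
	}
	return lookup_result;	/* already compiled code, call directly */
}
```

The interpreter-side check is symmetric: a tagged result is untagged and dispatched as a normal interpreter call, while an untagged result goes through a transition wrapper.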
### Indirect calls From 0b35624967af0424d049863db6fd54ca55669c7a Mon Sep 17 00:00:00 2001 From: Vlad Brezae Date: Fri, 13 Dec 2024 18:07:11 +0200 Subject: [PATCH 3/3] Specify needed wrappers for interpreter entry --- docs/design/interpreter/compiled-code-interop.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/design/interpreter/compiled-code-interop.md b/docs/design/interpreter/compiled-code-interop.md index 3a2c5ceced8..0ce53cfee2f 100644 --- a/docs/design/interpreter/compiled-code-interop.md +++ b/docs/design/interpreter/compiled-code-interop.md @@ -1,6 +1,6 @@ ### Compiled code interop -The interpreter is expected to run in combination with aot compiled code so we will need an efficient mechanism for entering/exiting interpreter. Even in an interp only mode, we would still need to do these transitions for pinvokes and reverse pinvokes. These transitions also pose a few challenges since on iOS we won't be able to dynamically generate thunks, while on WASM the call signature must be embedded into the emitted Wasm code, meaning we can't reuse a generic thunk for calling methods with different signatures. +The interpreter is expected to run in combination with AOT compiled code so we will need an efficient mechanism for entering/exiting interpreter. Even in an interp only mode, we would still need to do these transitions for pinvokes and reverse pinvokes. These transitions also pose a few challenges since on iOS we won't be able to dynamically generate thunks, while on WASM the call signature must be embedded into the emitted Wasm code, meaning we can't reuse a generic thunk for calling methods with different signatures. The interpreter operates on a separate stack space that it maintains. Every local variable, including argument registers reside on this space. When an interpreter method starts executing, it expects the arguments to be present one after the other at the location pointed to by the stack pointer. 
This means that for a call exiting the interpreter, with a certain signature, we need to call a thunk that receives at least the location of the parameters and the target address, and that moves each argument from the interpreter stack to the corresponding reg/native stack location, followed by a native call to the target address. Once the call returns, it should move the return value from the regs/stack to the interpreter stack. If we need to pass to native/compiled code a code pointer that can be used to enter the interpreter, we need to create a thunk that moves all arguments from the native regs/stack to a separate memory location. This thunk should also have embedded a pointer that identifies the interpreter method that we need to execute. The thunk should then pass the memory location where the arguments have been copied, together with the interpreter method to execute, to the interpreter entry code, so it can set up a new interpreter frame and begin executing the method code. @@ -12,7 +12,7 @@ Let's assume the interpreter needs to either do a pinvoke call or a call to a me 1. Specialized path - A specialized path means that we will have a thunk specialized per signature that can be used to call any method with that signature. The basic set of signatures that are needed by an application are easy to compute. We will require a separate thunk for every pinvoke signature as well as for every signature of a method that is aot-compiled, because each aot-compiled method can end up being called from the interpreter. While these thunks could be emitted directly in assembly code, it makes more sense to compile them as IL wrappers and emit them together with the rest of the managed code when aot compiling an assembly. Each one of these wrappers will receive as arguments the native pointer of the method to call and the address on the interpreter stack where the arguments reside.
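To make the exit-thunk mechanics concrete, here is a hedged C sketch of a specialized thunk for a hypothetical signature `int f(int32_t, double)`; the names and exact slot layout are illustrative assumptions, and the native call convention is left to the C compiler, as the text suggests:

```c
#include <stdint.h>

/* Illustrative per-signature exit thunk: load each argument from its
 * 8-byte interpreter stack slot, perform the native call, and write the
 * return value back over the first slot (in place of the arguments). */
typedef int (*target_int_double_fn)(int32_t, double);

static void exit_thunk_int_double(void *interp_sp, void *target) {
    uint8_t *sp = (uint8_t *)interp_sp;
    int32_t arg0 = *(int32_t *)(sp + 0);  /* slot 0 */
    double  arg1 = *(double *)(sp + 8);   /* slot 1 */
    int32_t ret = ((target_int_double_fn)target)(arg0, arg1);
    *(int32_t *)(sp + 0) = ret;           /* return value replaces args */
}
```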
The wrapper will be able to compute the base address of every single argument and load the argument value. It will then proceed with executing the native call, followed by writing back the result of the call to the interpreter stack in place of the arguments. + A specialized path means that we will have a thunk specialized per signature that can be used to call any method with that signature. The basic set of signatures that are needed by an application is easy to compute. We will require a separate thunk for every pinvoke signature as well as for every signature of a method that is AOT-compiled, because each AOT-compiled method can end up being called from the interpreter. While these thunks could be emitted directly in assembly code, it makes more sense to compile them as IL wrappers and emit them together with the rest of the managed code when AOT compiling an assembly. Each one of these wrappers will receive as arguments the native pointer of the method to call and the address on the interpreter stack where the arguments reside. The wrapper will be able to compute the base address of every single argument and load the argument value. It will then proceed with executing the native call, followed by writing back the result of the call to the interpreter stack in place of the arguments. 1. Generic Path @@ -43,11 +43,11 @@ In scenarios where we need to pass a function pointer to a pinvoke from the inte 1. Specialized path - We will generate an IL wrapper for every signature that needs to be handled. The wrapper will receive the arguments of the call in the native registers/stack according to the native call convention plus the interpreter method pointer in a special register that the wrapper would need to be able to access via special IL. The wrapper should then be able to obtain the current interpreter stack pointer of the current thread from a TLS variable and then proceed to write every single argument to this location.
Since it will be aot compiled, the call convention internals are easily handled by design, leaving the call convention details for the jit compiler to handle. The wrapper will then dispatch to an interpreter entry method, written in C++, that needs just to set up a new interpreter frame and then begin execution. As the method finishes execution, the return value will be at the top of the interpreter stack. The compiled wrapper will then load this value from the interpreter stack and return it normally. + We will generate an IL wrapper for every signature that needs to be handled. Given that the wrapper handles the transition from compiled code to interpreter code, it is needed for every call signature that can enter the interpreter. These signatures are for `UnmanagedCallersOnly` methods that are not AOT-compiled and for every indirect call signature present in the AOT image, since any such call site could enter the interpreter. The wrapper will receive the arguments of the call in the native registers/stack according to the native call convention, plus the interpreter method pointer in a special register that the wrapper would need to be able to access via special IL. The wrapper should then be able to obtain the current thread's interpreter stack pointer from a TLS variable and then proceed to write every single argument to this location. Since it will be AOT compiled, the call convention internals are easily handled by design, leaving the call convention details for the JIT compiler to handle. The wrapper will then dispatch to an interpreter entry method, written in C++, that just needs to set up a new interpreter frame and then begin execution. As the method finishes execution, the return value will be at the top of the interpreter stack. The compiled wrapper will then load this value from the interpreter stack and return it normally. 1.
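A hedged C sketch of such a specialized entry wrapper, for a hypothetical `int f(int32_t, int32_t)` signature; the TLS variable, `InterpMethod` and `interp_entry` names stand in for runtime internals, and the interpreter method pointer is modeled as an ordinary first parameter rather than a special register:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct InterpMethod InterpMethod;

/* Assumed TLS slot holding the current thread's interpreter stack pointer. */
static _Thread_local uint8_t *interp_stack_pointer;

/* C++-side entry: sets up a frame and interprets `imethod` with the
 * arguments already written at `sp`. Declared extern; defined elsewhere. */
extern void interp_entry(InterpMethod *imethod, uint8_t *sp);

static int entry_wrapper_int_int(InterpMethod *imethod, int32_t a, int32_t b) {
    uint8_t *sp = interp_stack_pointer;
    *(int32_t *)(sp + 0) = a;   /* each argument goes in its own 8-byte slot */
    *(int32_t *)(sp + 8) = b;
    interp_entry(imethod, sp);
    /* on return, the result sits at the top of the interpreter stack */
    return *(int32_t *)(sp + 0);
}
```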
Generic path - The generic path is very important for the interpreter entry scenario because this entry transition is required for methods that are interpreted, which by definition are methods that are not really known/handled at AOT compile time. This means that it would be common for us to not know the signature for the method entry, meaning we can't aot compile the necessary wrapper in advance. We could follow a similar approach to the interpreter exit generic path. + Given the common scenario of trying to AOT everything and having the interpreter as a fallback for very few methods, it would seem wasteful to generate a wrapper for every call signature from the AOT code. A generic path would be essential in order to reduce the size of compiled code images. If it turns out we can't statically detect some required signatures, then we will actually be forced to implement an alternative to the specialized wrappers. We could follow a similar approach to the interpreter exit generic path. The thunk embedding the interpreter method pointer would instead call into the transition interpreter thunks, passing, as before, the interpreter method pointer in a special register. The starting thunk will first store the pointer to the transition interpreter opcodes into a scratch register (obtained from the interpreter method data), then it will obtain the interpreter stack pointer and start executing each instruction, moving values from the native regs/stack to the interpreter stack, according to the generated opcodes for the signature of the method. Once the arguments are moved, it will call into C++ where actual method execution can start with the values on the interpreter stack. @@ -58,12 +58,12 @@ In addition to other Wasm limitations, the design might as well assume the impos ##### Interpreter exit -In order to support the pinvoke or compiled code call paths, we could use the same approach as with the native architecture.
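The transition-opcode idea can be sketched as follows; the opcode names are invented for illustration, and the native registers/stack are modeled as a plain array of raw 64-bit values rather than real machine state:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative per-signature "transition opcodes": each one moves a
 * value of a given size from the native side into an interpreter slot. */
enum { TR_MOVE4, TR_MOVE8, TR_DONE };

static void transition_to_interp(const uint8_t *opcodes,
                                 const uint64_t *native_args,
                                 uint8_t *interp_sp) {
    size_t slot = 0;
    for (int i = 0; ; i++) {
        switch (opcodes[i]) {
        case TR_MOVE4:  /* move a 32-bit value into the next 8-byte slot */
            *(uint32_t *)(interp_sp + slot * 8) = (uint32_t)native_args[slot];
            slot++;
            break;
        case TR_MOVE8:  /* move a 64-bit value into the next 8-byte slot */
            *(uint64_t *)(interp_sp + slot * 8) = native_args[slot];
            slot++;
            break;
        case TR_DONE:   /* all arguments moved; hand off to C++ execution */
            return;
        }
    }
}
```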
The only difference is that we won't be able to have a generic path, but for this transition it should rarely be problematic since we have a clear picture of the code that we would need to invoke into and signatures are typically reused. When the application is aot compiled, we will include a compiled wrapper for every signature of a compiled method as well as for every pinvoke signature. The wrapper will receive the target pointer to call and the address of the interpreter stack where the arguments are present. On mono these wrappers are written in C with dynamically generated code during app compilation time. I think it makes more sense to include them as compiled IL wrappers which should allow for code reuse with the native architecture approach and also for the invocation path to be as fast as possible. +In order to support the pinvoke or compiled code call paths, we could use the same approach as with the native architecture. The only difference is that we won't be able to have a generic path, but for this transition that should rarely be a problem since we have a clear picture of the code that we would need to invoke and signatures are typically reused. When the application is AOT compiled, we will include a compiled wrapper for every signature of a compiled method as well as for every pinvoke signature. The wrapper will receive the target pointer to call and the address of the interpreter stack where the arguments are present. On Mono, these wrappers are written in C, with the code generated dynamically at app compilation time. I think it makes more sense to include them as compiled IL wrappers, which should allow for code reuse with the native architecture approach and also for the invocation path to be as fast as possible.
Fat pointers can point to additional data, if a bit is set, rather than being actual function pointers. Instead of calling the pointer directly, the calling code will check for the most significant bit. If it is set, it will instead dereference the pointer and obtain the real function pointer together with the additional argument that is passed to the call. For the purpose of entering the interpreter, we would generate a fat pointer that has the target destination as a compiled IL wrapper for the signature in question together with the interpreter method pointer that is passed. The wrapper will obtain the pointer to the interpreter stack, will move all arguments there and call into the C++ interpreter path, passing the method and the address on the stack where the arguments have been written. When aot compiling an assembly, we would need to consider every single call as a potential entry to the interpreter and, if we deem it possible, we would generate an interpreter entry wrapper for the call signature in question. +Given that we can't dynamically generate thunks that can be invoked, in order to generate an interpreter entry point we could reuse the functionality of fat pointers that is already used with native AOT. Fat pointers can point to additional data, if a bit is set, rather than being actual function pointers. Instead of calling the pointer directly, the calling code will check for the most significant bit. If it is set, it will instead dereference the pointer and obtain the real function pointer together with the additional argument that is passed to the call. For the purpose of entering the interpreter, we would generate a fat pointer whose target destination is a compiled IL wrapper for the signature in question, together with the interpreter method pointer that is passed.
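A minimal sketch of the fat-pointer check at an indirect call site, assuming the descriptor layout and names shown here (the actual native AOT representation may differ):

```c
#include <stdint.h>

/* If the most significant bit is set, the value is not code but a
 * pointer to a descriptor holding the real target plus the extra
 * argument (here, the interpreter method for the entry wrapper). */
#define FAT_BIT ((uintptr_t)1 << (sizeof(uintptr_t) * 8 - 1))

typedef struct {
    void *target;     /* entry wrapper compiled for this signature */
    void *extra_arg;  /* e.g. the InterpMethod to execute */
} FatPointerDesc;

typedef int (*fn_int_int)(int, int);
typedef int (*fn_fat_int_int)(void *extra, int, int);

/* Illustrative calli for an `int f(int, int)` signature. */
static int calli_int_int(uintptr_t fnptr, int a, int b) {
    if (fnptr & FAT_BIT) {
        FatPointerDesc *desc = (FatPointerDesc *)(fnptr & ~FAT_BIT);
        return ((fn_fat_int_int)desc->target)(desc->extra_arg, a, b);
    }
    return ((fn_int_int)fnptr)(a, b);  /* plain function pointer */
}
```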
The wrapper will obtain the pointer to the interpreter stack, move all arguments there and call into the C++ interpreter path, passing the method and the address on the stack where the arguments have been written. When AOT compiling an assembly, we would need to consider every single call as a potential entry to the interpreter and, if we deem it possible, we would generate an interpreter entry wrapper for the call signature in question. -Native code has no knowledge of fat pointers, so we will need to explicitly generate small thunks for every single `UnmanagedCallersOnly` method. On mono, during app compilation, a build task scans the assembly for all `UnmanagedCallersOnly` methods and it dynamically generates a separate C method for each one of them. This method will have its own data that is later initialized with the function pointer for the cconv translation wrapper together with the interpreter method pointer argument. +Native code has no knowledge of fat pointers, so we will need to explicitly generate small thunks for every single `UnmanagedCallersOnly` method. On Mono, during app compilation, a build task scans the assembly for all `UnmanagedCallersOnly` methods and dynamically generates a separate C method for each one of them. This method will have its own data that is later initialized with the function pointer for the cconv translation wrapper, together with the interpreter method pointer argument.
Rather than have this logic in special build tasks that dynamically generate C code, it might make more sense to simply generate a special direct call wrapper for `UnmanagedCallersOnly` methods that are not AOT compiled to the Wasm image (this might represent a scenario that is just good to have, for interp-only, but is not really mandatory since we can choose to always AOT compile these methods). Given that on Wasm we might encounter situations where we don't have a necessary wrapper for a certain signature, we would need a fallback approach. While in the browser we could rely on dynamic generation of Wasm code, on WASI we have no alternative. For the rare cases where users would run into such scenarios, I think a simple approach would be for the runtime to report the missing signature when crashing and instruct the user to specify these signatures in a separate file, which can then be consumed by the Wasm application build so the additional wrappers are compiled.
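The per-method C thunks for `UnmanagedCallersOnly` methods described above might look roughly like this sketch of the generated output; `Managed_Add`, the data struct and the wrapper signature are hypothetical examples of what the build task emits, not the actual generated code:

```c
/* Illustrative thunk generated for a hypothetical
 * `UnmanagedCallersOnly` method `int Managed_Add(int, int)`. */
typedef int (*entry_wrapper_int_int_fn)(void *interp_method, int a, int b);

typedef struct {
    entry_wrapper_int_int_fn wrapper;  /* cconv translation wrapper */
    void *interp_method;               /* InterpMethod to execute */
} UcoThunkData;

/* Per-method data, initialized later by the runtime at startup. */
static UcoThunkData Managed_Add_data;

/* The address of this function is what native code receives. */
static int Managed_Add_thunk(int a, int b) {
    return Managed_Add_data.wrapper(Managed_Add_data.interp_method, a, b);
}
```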