Description
Blocked by: #1433
Currently Wasmi uses wasmi_ir
's Instruction
enum for its simple instruction dispatch:
wasmi/crates/wasmi/src/engine/executor/instrs.rs
Lines 134 to 135 in 1a31914
This is among the simplest solution for an interpreter execution implementation. IIRC this is also called "indirect token threading" since under the hood Wasmi dispatches on the enum
discriminant of the Instruction
enum which acts as the "token".
However, with an increasing number of instructions this indirect token threading dispatch reaches its limits concerning performance. One explanation for this is that having to maintain the table mapping from token -> instruction-handler
does no longer neatly fit in the CPU's L1 cache when there are many hundreds of instructions. Maybe this is also the reason why LLVM does not generate such a branch table and instead uses codegen for Wasmi's instruction dispatch that I genuinely have never succeeded to fully understand.
Wasmi currently has 550 (!simd
) and 800 (simd
) instructions and according to the theory described above it would benefits extremely from implementing a direct threading instruction dispatch. For this, instead of dispatching on the Instructions
's enum discriminant, the Wasmi translator instead translates Instruction
s without their discriminant and with a function pointer to their instruction handler. This eliminates having to dispatch upon the discriminant during instruction execution.
The most efficient Wasm interpreters such as makepad-stitch
and Wasm3 employ this dispatch technique to great success.
Another advantage of using direct threading dispatch in Wasmi is that it would allow to use 8-bytes of instruction parameters per instruction instead of just 6 bytes. This would enable to specialize some f32
and f64
instructions with immediate variants similar to their i32
and i64
counterparts. Similar improvements to instruction encoding could be made across the board. The major obvious downside is that Wasmi's IR (or bytecode) would use significantly (~1.5-2x) more memory since the baseline for instruction sizes is 16 bytes instead of just 8 bytes due to having an up to 64-bit function pointer instead of a 2-byte enum discriminant followed by a minimum 8 bytes of parameters instead of just 6 bytes.