Extended multiply horizontal add instruction #382
Conversation
The SSSE3 lowering doesn't match the others.
You're totally right. Nice catch.
@Maratyszcza I think that fixes it... While testing, I was stunned at how challenging pmaddubsw is to work with. It treats its two operands differently (one as unsigned bytes, the other as signed bytes), so unless both operands are guaranteed to be 127 or less, the operand order matters and the results will differ.
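To make the operand asymmetry concrete, here is a minimal scalar sketch in C of one 16-bit output lane of pmaddubsw, written from the documented behavior of the instruction (first operand read as unsigned bytes, second as signed bytes, pairwise sum saturated to a signed 16-bit value). The names `as_signed`, `maddubs_lane`, and `self_check` are made up for illustration and are not part of this PR.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a raw byte as a signed 8-bit value (two's complement). */
static int32_t as_signed(uint8_t x) {
    return x < 128 ? (int32_t)x : (int32_t)x - 256;
}

/* Scalar model of one 16-bit output lane of pmaddubsw.
   a0, a1: bytes from the first operand, read as unsigned.
   b0, b1: bytes from the second operand, read as signed.
   The two products are added and saturated to the int16_t range. */
static int16_t maddubs_lane(uint8_t a0, uint8_t a1, uint8_t b0, uint8_t b1) {
    int32_t sum = (int32_t)a0 * as_signed(b0) + (int32_t)a1 * as_signed(b1);
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* Illustrative checks: swapping the operands only changes the result
   when a byte is 128 or greater. */
static void self_check(void) {
    /* All bytes <= 127: operand order does not matter. */
    assert(maddubs_lane(100, 100, 3, 3) == 600);
    assert(maddubs_lane(3, 3, 100, 100) == 600);

    /* 200 is read as 200 in the first operand but as -56 in the second. */
    assert(maddubs_lane(200, 200, 3, 3) == 1200);
    assert(maddubs_lane(3, 3, 200, 200) == -336);
}
```

Calling `self_check()` from a test driver exercises both cases. With bytes of 200 and 3, the same pair of inputs gives 1200 in one order and -336 in the other.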
By analogy with #127, this instruction should be named |
Haven't forgotten about this. Will take care of it today.
This proposal is efficient on ARM64, but isn't efficient on x64. The original objective was to see if |
Introduction
This proposal introduces an extended horizontal multiply-and-add instruction, an operation used extensively in colorspace conversion and in the implementation of video encoders and decoders. It mirrors the proposal @Maratyszcza put forth in #127 by adding an instruction for u8 -> i16 conversion. It maps to 3 instructions on ARM64 and 4 on ARMv7-A with NEON. It is very similar to pmaddubsw on x86 processors with SSSE3, except that pmaddubsw multiplies unsigned bytes by signed bytes, whereas this instruction multiplies unsigned by unsigned.
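As a rough illustration of the operation described above, the following C sketch models a pairwise unsigned-by-unsigned multiply and add over 16 input bytes. It rests on several assumptions that this description does not spell out: the function name `pairwise_mul_add_u8` is hypothetical, the pairing of adjacent bytes is inferred from the analogy with #127, and the sums are kept in 32-bit values because the sketch does not assume how the real instruction narrows a result to 16 bits (wrapping or saturating).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, not the normative semantics of the
   proposed instruction. For each adjacent pair of bytes, both inputs
   are read as unsigned, multiplied, and the two products are added.
   The exact sum is returned in 32 bits; narrowing to a 16-bit lane is
   deliberately left out because the source does not state the policy. */
static void pairwise_mul_add_u8(const uint8_t a[16], const uint8_t b[16],
                                uint32_t out[8]) {
    for (size_t i = 0; i < 8; ++i) {
        uint32_t lo = (uint32_t)a[2 * i] * (uint32_t)b[2 * i];
        uint32_t hi = (uint32_t)a[2 * i + 1] * (uint32_t)b[2 * i + 1];
        out[i] = lo + hi;
    }
}
```

Because both inputs are treated as unsigned, swapping `a` and `b` gives the same result in this model. The largest possible exact sum is 2 * 255 * 255 = 130050, which does not fit in 16 bits, so any real lowering has to pick a narrowing behavior.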
Applications
Mapping to Common Instruction Sets
This section illustrates how the new WebAssembly instructions can be lowered on common instruction sets. These patterns are provided only for convenience; compliant WebAssembly implementations do not have to follow the same code generation patterns.
x86/x86-64 processors with AVX instruction set
x86/x86-64 processors with SSE2 instruction set
ARM64 processors
ARMv7 processors with NEON instruction set