`x` denotes a 64-byte span from the X register pool, accessed as a vector of lanes. The lanes are indexed by `i`.

`y` denotes a 64-byte span from the Y register pool, accessed as a vector of lanes. The lanes are indexed by `j` (or by `i` for vector operations).

`z` denotes the entire set of sixty-four 64-byte Z registers, with 2D indexing. When only one index variable is used, `[_]` denotes that the other index comes from the instruction operand (typically a bitfield called "Z row" or "Z column").

`f` denotes some function. `f(x, y) = x * y` is usually one option for binary functions. `f_s(z) = z >> s` is usually one option for unary functions.

Some instructions can operate in multiple distinct modes. In these cases, the instruction name is followed by the relevant mode bits. When the mode field is a single bit #N, this is denoted as "(N=0)" or "(N=1)". When the mode field is multiple bits starting at bit #N, this is denoted as "(N=M)" or "(N≠M)" or "(N≤M)" or "(N≥M)".
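
As a minimal sketch of how this mode notation can be read, the helper below extracts a field from an operand and compares it against a value. The name `mode_field`, the treatment of the operand as a 64-bit integer, and the `width` parameter are assumptions made for illustration; the tables give only the starting bit of each field, not its width.

```python
def mode_field(operand: int, start_bit: int, width: int = 1) -> int:
    """Return the `width`-bit field of `operand` that starts at bit `start_bit`.

    Hypothetical helper: the operand is assumed to be a 64-bit integer, and
    the caller must supply the field width (it is not given in the tables).
    """
    if not 0 <= operand < (1 << 64):
        raise ValueError("operand must fit in 64 bits")
    return (operand >> start_bit) & ((1 << width) - 1)


# "(63=1)" means the single-bit field at bit 63 holds 1.
operand = 1 << 63
is_63_set = mode_field(operand, 63) == 1

# "(47=4)" means the multi-bit field starting at bit 47 holds 4.
# The width of 6 used here is an assumption for the example only.
is_47_eq_4 = mode_field(4 << 47, 47, width=6) == 4
```
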
Instruction | General theme | Notes |
---|---|---|
`set` | Set up AMX state | Raises invalid instruction exception if already set up. All registers set to zero. |
`clr` | Clear AMX state | All registers set to uninitialised, no longer need saving/restoring on context switch. |

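
The lifecycle described by `set` and `clr` can be pictured with a toy software model. This is only a sketch: the class `AmxStateModel` is hypothetical, it models just the Z registers, it uses `None` to stand in for "uninitialised", and it makes no claim about what the hardware does when `clr` is issued without a prior `set`.

```python
class AmxStateModel:
    """Toy model of the set / clr lifecycle (no real hardware access)."""

    def __init__(self):
        # None stands in for "uninitialised" state in this model.
        self.z = None

    def set(self):
        # Per the table: setting up twice raises an invalid instruction exception.
        if self.z is not None:
            raise RuntimeError("invalid instruction: AMX state already set up")
        # Per the table: all registers start at zero (64 rows of 64 bytes).
        self.z = [bytearray(64) for _ in range(64)]

    def clr(self):
        # Registers become uninitialised again.
        self.z = None
```
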
Instruction | General theme | Optional special features |
---|---|---|
`ldx` | `x[i] = memory[i]` | Load pair |
`ldy` | `y[i] = memory[i]` | Load pair |
`ldz`, `ldzi` | `z[_][i] = memory[i]` | Load pair, interleaved Z |
`stx` | `memory[i] = x[i]` | Store pair |
`sty` | `memory[i] = y[i]` | Store pair |
`stz`, `stzi` | `memory[i] = z[_][i]` | Store pair, interleaved Z |

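
To illustrate the general theme of the load and store rows, here is a small software model that copies one 64-byte span between a flat byte buffer and a register span. It is a sketch under assumptions: the function signatures and the flat `memory` buffer are hypothetical, and the optional features (load/store pair, interleaved Z) are not modelled.

```python
SPAN_BYTES = 64  # size of one X or Y span, per the notation above


def ldx(x: bytearray, memory: bytes, address: int) -> None:
    """Model of x[i] = memory[i]: copy 64 bytes from memory into x."""
    chunk = memory[address:address + SPAN_BYTES]
    if address < 0 or len(chunk) != SPAN_BYTES:
        raise ValueError("load would run outside the memory buffer")
    x[:] = chunk


def stx(memory: bytearray, address: int, x: bytes) -> None:
    """Model of memory[i] = x[i]: copy 64 bytes from x into memory."""
    if address < 0 or address + SPAN_BYTES > len(memory):
        raise ValueError("store would run outside the memory buffer")
    memory[address:address + SPAN_BYTES] = x[:SPAN_BYTES]
```
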
Instruction | General theme | Writemask | Optional special features |
---|---|---|---|
`fma64` (63=0), `fma32` (63=0), `fma16` (63=0) | `z[j][i] += x[i] * y[j]` | 7 bit X, 7 bit Y | X/Y/Z input disable |
`fms64` (63=0), `fms32` (63=0), `fms16` (63=0) | `z[j][i] -= x[i] * y[j]` | 7 bit X, 7 bit Y | X/Y/Z input disable |
`matfp` | `z[j][i] ±= f(x[i], y[j])` | 9 bit X, 9 bit Y | Indexed X or Y, shuffle X, shuffle Y, positive selection |

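
The matrix-mode theme `z[j][i] += x[i] * y[j]` is an outer-product accumulate. The sketch below shows it on plain Python lists; `fma_matrix` is a hypothetical name, `z` is treated as a logical 2D grid of lanes, and writemasks, input disables, and the mapping of lanes onto physical Z rows are deliberately left out.

```python
def fma_matrix(z, x, y):
    """Accumulate the outer product of x and y into z, in place.

    z is a list of rows (indexed by j), each a list of lanes (indexed by i).
    """
    for j, yj in enumerate(y):
        for i, xi in enumerate(x):
            z[j][i] += xi * yj


# Two-lane example: every (i, j) pair contributes one product.
z = [[0.0, 0.0], [0.0, 0.0]]
fma_matrix(z, x=[1.0, 2.0], y=[3.0, 4.0])
# z is now [[3.0, 6.0], [4.0, 8.0]]
```
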
Instruction | General theme | Writemask | Optional special features |
---|---|---|---|
`mac16` (63=0) | `z[j][i] += x[i] * y[j]` | 7 bit X, 7 bit Y | X/Y/Z input disable, right shift |
`matint` (47≠4) | `z[j][i] ±= f(x[i], y[j])` | 9 bit X or Y | Indexed X or Y, shuffle X, shuffle Y, right shift, `sqrdmlah`, `popcnt` |
`matint` (47=4) | `z[j][i] = f(z[j][i])` | 9 bit X or Y | Right shift, saturation |

Instruction | General theme | Writemask | Optional special features |
---|---|---|---|
`fma64` (63=1), `fma32` (63=1), `fma16` (63=1) | `z[_][i] += x[i] * y[i]` | 7 bit | X/Y/Z input disable |
`fms64` (63=1), `fms32` (63=1), `fms16` (63=1) | `z[_][i] -= x[i] * y[i]` | 7 bit | X/Y/Z input disable |
`vecfp` | `z[_][i] ±= f(x[i], y[i])` | 9 bit | Indexed X or Y, shuffle X, shuffle Y, broadcast Y element, positive selection, `min`, `max` |

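
In vector mode the theme becomes `z[_][i] += x[i] * y[i]`: both inputs are indexed by `i`, and only the single Z row named by the operand is updated. A minimal sketch, assuming a hypothetical `fma_vector` function whose `z_row` argument plays the role of the "Z row" bitfield, and ignoring writemasks and the other optional features:

```python
def fma_vector(z, z_row, x, y):
    """Element-wise multiply-accumulate of x and y into one row of z, in place."""
    row = z[z_row]
    for i in range(len(row)):
        row[i] += x[i] * y[i]


# Only row 1 is touched; row 0 keeps its contents.
z = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]
fma_vector(z, z_row=1, x=[1.0, 2.0, 3.0], y=[4.0, 5.0, 6.0])
# z is now [[0.0, 0.0, 0.0], [14.0, 20.0, 28.0]]
```
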
Instruction | General theme | Writemask | Optional special features |
---|---|---|---|
`mac16` (63=1) | `z[_][i] += x[i] * y[i]` | 7 bit | X/Y/Z input disable, right shift |
`vecint` (47≠4) | `z[_][i] ±= f(x[i], y[i])` | 9 bit | Indexed X or Y, shuffle X, shuffle Y, broadcast Y element, right shift, `sqrdmlah` |
`vecint` (47=4) | `z[_][i] = f(z[_][i])` | 9 bit | Right shift, saturation |

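
The unary form `z[_][i] = f(z[_][i])` with right shift and saturation can be sketched as follows. Everything beyond the shift itself is an assumption for illustration: the names `shift_and_saturate` and `vecint_unary` are hypothetical, saturation is modelled as clamping to a signed range of `out_bits` bits, and the shift is an arithmetic (sign-preserving) one.

```python
def shift_and_saturate(value: int, shift: int, out_bits: int) -> int:
    """Apply f_s(z) = z >> s, then clamp to a signed out_bits-wide range."""
    shifted = value >> shift  # Python's >> on ints is an arithmetic shift
    lo = -(1 << (out_bits - 1))
    hi = (1 << (out_bits - 1)) - 1
    return max(lo, min(hi, shifted))


def vecint_unary(z, z_row, shift, out_bits):
    """Rewrite one row of z in place with the shifted, saturated lanes."""
    z[z_row] = [shift_and_saturate(v, shift, out_bits) for v in z[z_row]]


z = [[100, 1000, -1000]]
vecint_unary(z, z_row=0, shift=2, out_bits=8)
# z is now [[25, 127, -128]]
```
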
Instruction | General theme | Writemask | Optional special features |
---|---|---|---|
`extrx` | `x[i] = y[i]` | None | |
`extry` | `y[i] = x[i]` | None | |
`extrh` (26=0) | `x[i] = z[_][i]` | 7 bit | |
`extrh` (26=1,10=0) | `x[i] = f(z[_][i])` | 9 bit | Integer right shift, integer saturation |
`extrv` (26=1,10=0) | `x[j] = f(z[j][_])` | 9 bit | Integer right shift, integer saturation |
`extrv` (26=0) | `y[j] = z[j][_]` | 7 bit | |
`extrv` (26=1,10=1) | `y[j] = f(z[j][_])` | 9 bit | Integer right shift, integer saturation |
`extrh` (26=1,10=1) | `y[i] = f(z[_][i])` | 9 bit | Integer right shift, integer saturation |

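
The difference between `extrh` and `extrv` is which Z index is fixed by the operand: `extrh` reads along one row (`z[_][i]`), while `extrv` reads down one column (`z[j][_]`). A minimal sketch on a logical 2D grid, where the function signatures are hypothetical and the writemask and the optional `f` (shift, saturation) are omitted:

```python
def extrh(z, z_row):
    """Horizontal extract: copy the lanes of one row, out[i] = z[z_row][i]."""
    return list(z[z_row])


def extrv(z, z_col):
    """Vertical extract: copy one lane from every row, out[j] = z[j][z_col]."""
    return [row[z_col] for row in z]


# Grid where z[j][i] = 10 * j + i, so rows and columns are easy to tell apart.
z = [[10 * j + i for i in range(3)] for j in range(3)]
row_1 = extrh(z, 1)  # [10, 11, 12]
col_1 = extrv(z, 1)  # [1, 11, 21]
```
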
Instruction | General theme | Notes |
---|---|---|
`genlut` (53≤6) | Generate indices for indexed load | For use by `matfp` / `matint` / `vecfp` / `vecint` / `genlut` (53≥7) |
`genlut` (53≥7) | Perform indexed load | Can write to any of `x` or `y` or `z` |