Description:
Intel® Advanced Matrix Extensions (Intel® AMX) is a new 64-bit programming
paradigm consisting of two components: a set of 2-dimensional registers
(tiles) representing sub-arrays from a larger 2-dimensional memory image,
and an accelerator able to operate on tiles. The first implementation of
the accelerator is called TMUL (tile matrix multiply unit).
This test verifies the XTILECFG and XTILEDATA state components by using TMUL
instructions. The test can launch multiple tasks that execute AMX/TMUL
calculations. Each calculation may be interrupted several times for different
reasons, such as yield, sleep, trap, signal, and futex.
When the task resumes, the calculation continues to completion, and the
test checks whether the result is correct.
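
As a rough illustration of what checking the result can look like, the
sketch below models the TDPBSSD dot product in plain C and compares two
accumulator buffers. It is not taken from the test source: the helper names
ref_tdpbssd and tiles_match, and the flat row-major buffer layout, are
assumptions made for this example only.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the tmul test): software model of
 * TDPBSSD. "a" holds m_rows rows of 4 * k_dwords signed bytes, "b" holds
 * k_dwords rows of 4 * n_cols signed bytes, and "c" holds m_rows * n_cols
 * 32-bit accumulators. Each accumulator gains the dot product of four
 * signed byte pairs for every k.
 */
static void ref_tdpbssd(int32_t *c, const int8_t *a, const int8_t *b,
                        size_t m_rows, size_t k_dwords, size_t n_cols)
{
    for (size_t m = 0; m < m_rows; m++) {
        for (size_t k = 0; k < k_dwords; k++) {
            for (size_t n = 0; n < n_cols; n++) {
                int32_t acc = 0;

                for (size_t i = 0; i < 4; i++)
                    acc += (int32_t)a[m * 4 * k_dwords + 4 * k + i] *
                           (int32_t)b[k * 4 * n_cols + 4 * n + i];
                c[m * n_cols + n] += acc;
            }
        }
    }
}

/* Hypothetical helper: returns 1 if both buffers hold identical values. */
static int tiles_match(const int32_t *x, const int32_t *y, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}
```

A checker along these lines would compute the expected accumulators in
software and compare them with the values stored from the tile register
after the interrupted task resumes. A mismatch would suggest that the tile
state was not preserved across the interruption.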
How to build:
gcc 11.1 or above is required for the amx_bf16 and amx_int8 tests.
gcc 13.3 or above is required for the extra amx_fp16 test.
Note: some OS distros ship a gcc 12.3 that supports the extra amx_fp16 test.
You can check whether your gcc supports amx_fp16 with this command:
$ echo 'int main() { asm volatile("tdpfp16ps %tmm2, %tmm1, %tmm0"); return 0; }' | gcc -x c -o /dev/null -
To compile, for the amx_bf16 and amx_int8 tests:
$ make
For the extra amx_fp16 test:
$ make fp16
To clean:
$ make clean
How to run:
a. Show command usage
$ ./tmul --help
b. Break sub-thread which is doing TMUL TDPBF16PS calculation by yield
$ ./tmul -b 1 -t 10 -c 20 -i 0
c. Break sub-thread which is doing TMUL TDPBSSD calculation by sleep
$ ./tmul -b 2 -t 10 -c 20 -i 1
d. Break sub-thread which is doing TMUL TDPBSUD calculation by trap
$ ./tmul -b 3 -t 10 -c 20 -i 2
e. Break sub-thread which is doing TMUL TDPBUSD calculation by signal
$ ./tmul -b 4 -t 1000 -c 1000 -i 3
Note: The signal is generated by the main thread and handled by the
sub-thread, so the test needs to run many cycles to ensure that the
sub-thread is interrupted while the TMUL calculation is in progress.
f. Break sub-thread which is doing TMUL TDPBUUD calculation by futex
$ ./tmul -b 5 -t 10 -c 20 -i 4
g. Break sub-thread which is doing TMUL TDPFP16PS calculation by yield
$ ./tmul -b 1 -t 10 -c 20 -i 5