This repo is a plugin backend for Triton, analogous to the AMD/NVIDIA backends,
that supports compiling and executing Triton kernels on VeriSilicon devices.
- Before installing the plugin, install the LLVM tools first. LLVM 17 is recommended.
- Triton uses torch tensors as input, and this backend accepts "vsi" tensors only, so make sure the PyTorch vpex plugin is installed.
- Install ZenCompiler as the Triton IR compiler.
- The plugin uses cnpy as the NumPy file reader/writer in its C/C++ code; if you need to debug the dumped standalone code, install cnpy first.
Assuming you have installed the Python triton package in your environment, you can install the plugin by creating a soft link:
cd <your_env/lib/python3.x/site-packages/triton/backends>
ln -s <this_repo_root/backend> vsi
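The soft-link step above can also be scripted. The sketch below uses temporary placeholder directories (hypothetical; substitute your real site-packages `triton/backends` directory and this repo's root):

```python
import os
import tempfile

# Placeholder paths for illustration; replace with your real
# <your_env>/lib/python3.x/site-packages/triton/backends directory
# and <this_repo_root>/backend.
backends_dir = tempfile.mkdtemp()   # stands in for .../triton/backends
repo_backend = tempfile.mkdtemp()   # stands in for <this_repo_root>/backend

# Equivalent of: ln -s <this_repo_root/backend> vsi
link_path = os.path.join(backends_dir, "vsi")
os.symlink(repo_backend, link_path)

print(os.path.islink(link_path))  # True if the soft link was created
```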
- VSI_DRIVER_PATH: the path to the VeriSilicon SDK drivers
- TC_TOOLKITS_PATH: the path to the VeriSilicon TensorCore toolkits
- ZEN_COMPILER_PATH: the path to the zen_compiler executable
- CC: the C/C++ compiler used by the backend; must be clang++ for the vsi backend
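For example, these variables can be exported in your shell before running Triton (the paths below are placeholders, not real install locations):

```shell
# Placeholder paths; point these at your actual installs.
export VSI_DRIVER_PATH=/opt/verisilicon/drivers
export TC_TOOLKITS_PATH=/opt/verisilicon/tc_toolkits
export ZEN_COMPILER_PATH=/opt/verisilicon/bin/zen_compiler
export CC=clang++   # the vsi backend requires clang++
```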
import torch
import triton
import vpex
from triton.backends.vsi.driver import VSIDriver
triton.runtime.driver.set_active(VSIDriver())
# ... your triton kernel and other code
For a single Triton kernel, you can dump the triton_launcher C/C++ code as a compilable standalone source file.
Set the meta parameter dump_standalone=True to dump the source code,
and do not specialize kernel arguments whose value equals one, like this:
@triton.jit(do_not_specialize=["arg0", "arg1", ...])
# ... your kernel
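Putting the two steps together, a launch might look like the sketch below. The kernel body and argument names are illustrative, not part of this backend; dump_standalone is passed as a meta parameter at launch time:

```python
import triton
import triton.language as tl

# Do not specialize scalar arguments that may equal one,
# so the dumped standalone code stays generic.
@triton.jit(do_not_specialize=["n"])
def copy_kernel(src_ptr, dst_ptr, n, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    tl.store(dst_ptr + offs, tl.load(src_ptr + offs, mask=mask), mask=mask)

# At launch, pass dump_standalone=True alongside the other meta
# parameters (requires a "vsi" device; shown here for illustration):
# copy_kernel[(grid,)](src, dst, n, BLOCK=1024, dump_standalone=True)
```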
Compile the standalone source like this:
export LD_LIBRARY_PATH=<verisilicon_drivers_path>
clang++ -g <source_path> -O0 -fPIC -o <out_name> -lOpenCL -lcnpy -I <verisilicon_headers_path>