HiHiSpMV is a Sparse Matrix with dense Vector multiplication (SpMV) accelerator for HBM equipped FPGAs. Its details could be found here.
- VITIS 22.2/XRT 2.14 and AMD/Xilinx xilinx_u280_gen3x16_xdma_1_202211_1 shell for a AMD/Xilinx U280 FPGA card.
- GCC & C++17 standard
- GNU Make
- Bash (for downloading and processing test matrices)
- Python3 (for downloading and processing test matrices)
ml reset
ml fpga
ml xilinx/xrt/2.14
For the followings, run the commands in the main HiHiSpMV directory.
make build_xilinx_spmv_host TARGET=<hw/hw_emu/sw_emu> O=<gcc-Olevel>
NOTE: Keep the
TARGET
matching between the host and the executed xclbin. If needed, rebuild the host with the appropriate matchingTARGET
before running it.
make build_xilinx_spmv_xclbin TARGET=<hw_emu/sw_emu> ID=<output-dir-Id>(default=1) CFGID=<link-config-file-Id>(default=1)
The ID
defines the suffix for intermediate and final outputs in the HiHiSpmv/bin/
directory, and the CFGID
defines the identifier of the linker configuration file (in HiHiSpMV/src/kernels/
directory) used in Vitis.
NOTE: Various paramters could be adjusted in the Kernel-Config (
HiHiSpMV/src/kernels/csr_spmv_repl.cfg
) Link-Config (HiHiSpMV/src/kernels/single.1.cfg
), XRT.ini (HiHiSpMV/xrt.ini
) and Definitions (HiHiSpmv/src/kernels/xlx_definitions.hpp
) files. Their adjustable settings are listed below.
NOTE: For avoiding synthesizing the Bitsteam/Hardware, already synthesized Bitstream can be downloaded from the Zenedo upload.
make build_xilinx_spmv_xclbin TARGET=hw ID=<output-dir-Id>(default=1) CFGID=<link-config-file-Id>(default=1)
The ID
and CFGID
are as defined above.
NOTE: Various paramters could be adjusted in the Kernel-Config (
HiHiSpMV/src/kernels/csr_spmv_repl.cfg
) Link-Config (HiHiSpMV/src/kernels/single.1.cfg
), XRT.ini (HiHiSpMV/xrt.ini
) and Definitions (HiHiSpmv/src/kernels/xlx_definitions.hpp
) files. Their adjustable settings are listed below.
scripts/collect_data.sh
NOTE: Matrices could be added in MartixMarket format.
make test_xilinx_spmv TARGET=<hw_emu/sw_emu> ID=<output-dir-Id>(default=1)
NOTE: Make sure to match the
TARGET
between the host and xclbin. Further, various execution paramters could be adjusted in the main Makefile (HiHiSpmv/Makefile
).
make test_xilinx_spmv TARGET=hw ID=<output-dir-Id>(default=1)
NOTE: Make sure to match the
TARGET
between the host and xclbin and on the first time configuring the FPGA, a few seconds of delay is expected. Further, various execution paramters could be adjusted in the main Makefile (HiHiSpmv/Makefile
).
NOTE: The Zenedo upload could be downloaded and the Bitstream from it could be used in order to avoid synthesizing it by the following:
wget https://zenodo.org/records/12700052/files/HiHiSpMV-v0.0.1.zip
unzip HiHiSpMV-v0.0.1.zip
cp -r HiHiSpMV-v0.0.1/bin/ bin/
In the following the adjustable paramters in the main Makefile (HiHiSpmv/Makefile
), Kernel-Config (HiHiSpMV/src/kernels/csr_spmv_repl.cfg
) Link-Config (HiHiSpMV/src/kernels/single.1.cfg
), XRT.ini (HiHiSpMV/xrt.ini
) and Definitions (HiHiSpmv/src/kernels/xlx_definitions.hpp
) files, are listed.
NOTE: All of the following Makefile parameters apply to executing the host only.
HW_SIZE
: The maximum side-length of the square tile. Should be matching with theVECTOR_SIZE
in the Definitions file.XLX_DEVICE_ID
: The device Id. one which the Bitstream will be loaded onto.XLX_ITERS
: The number of iterations per launch of the CUs.XLX_RUNS
: The number of times the CUs are launched.
VECTOR_SIZE
: Defines the maximum side-length of the square tile. Could be adjusted according to the available BRAM blocks.DEBUG <0-3>
: Applicable insw_emu
only to log the operations inside each CU.- Trip-count constants: Used for latency reports generatione e.g
*_min
and*_max
.
3. Link-Config
- HBM assignments: Assigns each CU's HBM bank connections e.g.
sp=csr_spmv_repl_1_1.indices:HBM[0]
. Should be matching with the number of CUs to be used. - Stream connections: Defines intra-CU AXI-stream connections e.g.
sc=csr_spmv_repl_1_4.out_indices:csr_spmv_repl_2_4.in_indices:64
. Should be matching with the number of CUs to be used. - SLR assignment: Assigns each CU to a specific SLR e.g.
slr=csr_spmv_repl_1_1:SLR2
. Should be matching with the number of CUs to be used. - Number of kernels: Defines number of CUs for each kernel e.g.
nk=csr_spmv_repl_1:16
.
4. XRT.ini
Should use legacy scheduler by specifying the following:
[Runtime]
ert=false
kds=flase
In order to run all 16 CUs, i.e. more than 32 kernels.
@inproceedings{ta-me-24a,
Author = {Abdul Rehman Tareen and Marius Meyer and Christian Plessl and Tobias Kenter},
Title = {{HiHiSpMV}: Sparse Matrix Vector Multiplication with Hierarchical Row Reductions on {FPGAs} with High Bandwidth Memory},
Booktitle = {Proc. IEEE Symp. on Field-Programmable Custom Computing Machines (FCCM)},
Year = {2024},
Note = {To appear.}}