FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores
FlashSparse has been accepted at PPoPP 2025. See the arXiv preprint of the paper. FlashSparse significantly reduces computation redundancy for unstructured sparsity (in SpMM and SDDMM) on Tensor Cores through a Swap-and-Transpose mapping strategy.
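For readers new to the two kernels named above, the following is a minimal pure-Python sketch of what SpMM and SDDMM compute. It only illustrates the reference semantics on a CSR matrix; it is not the FlashSparse implementation and says nothing about how work is mapped onto Tensor Cores. The function names `spmm_csr` and `sddmm_csr` are hypothetical, and this SDDMM variant only samples the dense product at the nonzero positions (some definitions additionally scale by the sparse values).

```python
def spmm_csr(indptr, indices, data, dense):
    """Return C = A @ B, where A is sparse (CSR) and B is a dense list of rows."""
    n_cols = len(dense[0])
    out = []
    for row in range(len(indptr) - 1):
        acc = [0.0] * n_cols
        # Only the stored nonzeros of this row contribute to the output row.
        for k in range(indptr[row], indptr[row + 1]):
            b_row = dense[indices[k]]
            for j in range(n_cols):
                acc[j] += data[k] * b_row[j]
        out.append(acc)
    return out


def sddmm_csr(indptr, indices, x, y):
    """Return the entries of X @ Y^T sampled at the nonzero positions of a CSR pattern."""
    values = []
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            col = indices[k]
            # Dot product of row `row` of X with row `col` of Y.
            values.append(sum(a * b for a, b in zip(x[row], y[col])))
    return values


# The 2x3 sparse matrix [[1, 0, 2], [0, 3, 0]] in CSR form.
indptr, indices, data = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
dense = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [[1.0, 2.0], [3.0, 4.0]]
y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

print(spmm_csr(indptr, indices, data, dense))  # [[11.0, 14.0], [9.0, 12.0]]
print(sddmm_csr(indptr, indices, x, y))        # [1.0, 3.0, 4.0]
```

A reference like this can be handy for spot-checking kernel output on a small matrix before running the full experiments.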
git clone --recursive https://github.com/ParCIS/FlashSparse.git
- Requirements:
  - Ubuntu 16.04+
  - cmake >= 3.29
  - CUDA >= 11.8
- One NVIDIA RTX4090 GPU and one H100 PCIe GPU.
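The version minimums listed above can be checked with a small helper before building. This is a sketch rather than part of the repository: the names `parse_version`, `meets_minimum`, and `check` are hypothetical, and it assumes purely numeric dotted versions (a suffix such as `-rc1` would need extra handling). How you obtain the installed versions (for example from `cmake --version` or `nvcc --version`) is left to you.

```python
def parse_version(text):
    """Turn a dotted version such as '3.29.1' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(found, minimum):
    """Return True when `found` is at least `minimum`; missing components count as 0."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Minimums taken from the requirements list in this README.
MINIMUMS = {"cmake": "3.29", "CUDA": "11.8"}


def check(found_versions):
    """Map each required tool to True/False given a dict of installed versions."""
    return {
        name: meets_minimum(found_versions[name], minimum)
        for name, minimum in MINIMUMS.items()
    }


print(check({"cmake": "3.30.2", "CUDA": "11.8"}))  # {'cmake': True, 'CUDA': True}
```

Comparing integer tuples instead of strings avoids the common mistake of treating cmake 3.9 as newer than 3.29.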
Conda environments need to be set up on both the H100 PCIe machine and the RTX4090 machine by following the steps below.
- 2.1.1 Install `conda` on the system (tutorial).
- 2.1.2 Create a `conda` environment:
conda create -n env_name python=3.9
- 2.1.3 Install `PyTorch` (tutorial):
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
cd FlashSparse/
bash comple.sh
Get the preprocessed datasets (515 sparse matrices in total).
cd dataset/
python prepare.py
cd Baseline/
bash comple.sh
6.1 Install Deep Graph Library (DGL) (tutorial):
pip install dgl -f https://data.dgl.ai/wheels/torch-2.1/cu118/repo.html
6.2 Install PyTorch Geometric (PyG) (tutorial):
pip install torch_geometric
- Go to the project's `eva/kernel/spmm/` directory.
- Run `bash ./test_spmm_shell.sh` to run all SpMM experiments (about 200 minutes).
- Check the results in `result/FlashSparse/spmm/*.csv`.
- Go to the project's `eva/kernel/sddmm/` directory.
- Run `bash ./test_sddmm_shell.sh` to run all SDDMM experiments (about 100 minutes).
- Check the results in `result/FlashSparse/sddmm/*.csv`.
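After the SpMM and SDDMM runs finish, a quick sanity check is to confirm that the result CSV files exist and are not empty. The sketch below does only that. The helper name `count_result_rows` is hypothetical, and it assumes the first row of each CSV is a header, which this README does not state. Run it from the repository root so the relative result paths resolve.

```python
import csv
import glob
import os


def count_result_rows(pattern):
    """Map each CSV file matching `pattern` to its number of data rows.

    Assumption: the first row of every file is a header and is not counted.
    """
    counts = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as handle:
            rows = list(csv.reader(handle))
        counts[os.path.basename(path)] = max(len(rows) - 1, 0)
    return counts


if __name__ == "__main__":
    # Result locations named in this README, relative to the repository root.
    for pattern in ("result/FlashSparse/spmm/*.csv", "result/FlashSparse/sddmm/*.csv"):
        print(pattern, count_result_rows(pattern))
```

An empty dictionary for a pattern means no result files were found, which usually indicates the corresponding script did not run to completion or was started from a different directory.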
- Go to the project's `eva/end2end/gcn/` directory.
- Run `python eva_gcn_fs.py` to run the FlashSparse GCN experiments (about 5 minutes), then check the results in `result/FlashSparse/gcn/fs_gcn_128.csv`.
- Run `python eva_gcn_baseline.py` to run the baseline GCN experiments (about 15 minutes), then check the results in `result/Baseline/agnn/baseline_gcn_128.csv`.
- Go to the project's `eva/end2end/agnn/` directory.
- Run `python eva_agnn_fs.py` to run the FlashSparse AGNN experiments (about 5 minutes), then check the results in `result/FlashSparse/agnn/fs_agnn_32.csv`.
- Run `python eva_agnn_baseline.py` to run the baseline AGNN experiments (about 15 minutes), then check the results in `result/Baseline/agnn/baseline_agnn_32.csv`.
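To compare the FlashSparse and baseline end-to-end results, one common summary is the geometric-mean speedup over the datasets that appear in both files. The sketch below is an illustration only: the column layout of the result CSVs is not documented in this README, so the column names are passed in explicitly rather than guessed, and the helpers `load_times` and `geomean_speedup` are hypothetical names, not scripts shipped with the repository.

```python
import csv
import math


def load_times(path, key_column, time_column):
    """Read a CSV with a header row into {key: time}; the column names are caller-supplied."""
    with open(path, newline="") as handle:
        return {row[key_column]: float(row[time_column]) for row in csv.DictReader(handle)}


def geomean_speedup(baseline, candidate):
    """Geometric mean of baseline_time / candidate_time over the keys present in both."""
    shared = sorted(set(baseline) & set(candidate))
    if not shared:
        raise ValueError("no datasets in common")
    log_sum = sum(math.log(baseline[name] / candidate[name]) for name in shared)
    return math.exp(log_sum / len(shared))


# Toy numbers for illustration only; these are not measured results.
baseline = {"graph_a": 4.0, "graph_b": 9.0}
candidate = {"graph_a": 1.0, "graph_b": 1.0}
print(round(geomean_speedup(baseline, candidate), 6))  # 6.0
```

The geometric mean is used here because it treats a 2x speedup and a 2x slowdown symmetrically, which an arithmetic mean of ratios does not.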
- Go to the project's `Baseline/RoDe/script/` directory.
- (Optional) Run `bash download.sh` to download the same 515 matrices in the specific format that RoDe requires.
- Run `bash test_spmm_shell.sh` to run all SpMM experiments (about 300 minutes).
- Run `bash test_sddmm_shell.sh` to run all SDDMM experiments (about 300 minutes).
- Check the results in `result/Baseline/spmm/rode*.csv` and `result/Baseline/sddmm/rode*.csv`.
- Go to the project's `Baseline/DTC-SpMM/` directory.
- Run `bash test_spmm_shell.sh` to run all SpMM experiments (about 20 minutes).
- Check the results in `result/Baseline/spmm/dtc*.csv`.
- Go to the project's `eva/kernel/spmm/` directory.
- Run `bash test_spmm_shell_base.sh` to run all SpMM experiments (about 100 minutes).
- Check the results in `result/Baseline/spmm/base*.csv`.
- Go to the project's `eva/kernel/sddmm/` directory.
- Run `bash test_sddmm_shell_base.sh` to run all SDDMM experiments (about 20 minutes).
- Check the results in `result/Baseline/sddmm/base*.csv`.
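The kernel experiments above are long-running, so it can help to see the whole plan and its estimated duration before starting. The sketch below collects the directories, commands, and time estimates stated in this README into one list. It is not a script from the repository: `PLAN`, `total_minutes`, and `run_plan` are hypothetical names, the directories are assumed to be relative to the repository root, and `dry_run=True` (the default) only returns the planned steps without executing anything.

```python
import os
import subprocess

# (directory, command, estimated minutes), as listed in this README.
PLAN = [
    ("eva/kernel/spmm", ["bash", "./test_spmm_shell.sh"], 200),
    ("eva/kernel/sddmm", ["bash", "./test_sddmm_shell.sh"], 100),
    ("Baseline/RoDe/script", ["bash", "test_spmm_shell.sh"], 300),
    ("Baseline/RoDe/script", ["bash", "test_sddmm_shell.sh"], 300),
    ("Baseline/DTC-SpMM", ["bash", "test_spmm_shell.sh"], 20),
    ("eva/kernel/spmm", ["bash", "test_spmm_shell_base.sh"], 100),
    ("eva/kernel/sddmm", ["bash", "test_sddmm_shell_base.sh"], 20),
]


def total_minutes(plan):
    """Sum the estimated run time of every step in the plan."""
    return sum(minutes for _, _, minutes in plan)


def run_plan(plan, repo_root=".", dry_run=True):
    """Run each step in order from its directory; with dry_run=True only list the steps."""
    planned = []
    for directory, command, _minutes in plan:
        workdir = os.path.join(repo_root, directory)
        planned.append((workdir, " ".join(command)))
        if not dry_run:
            # check=True stops the plan at the first failing script.
            subprocess.run(command, cwd=workdir, check=True)
    return planned


print(total_minutes(PLAN))  # 1040
for workdir, command in run_plan(PLAN):
    print(workdir, "->", command)
```

At the stated estimates the kernel experiments add up to 1040 minutes, a little over 17 hours per GPU, so running them inside `tmux` or `nohup` is worth considering.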
- Go to the project's `result/Baseline/spmm/` directory.
- Run `python summarize.py` to summarize all results.
- Go to the project's `result/Baseline/sddmm/` directory.
- Run `python summarize.py` to summarize all results.
- Go to the project's `eva/plot/kernel_spmm/` directory.
- Run `python plot_figure11_ac.py` and check the figure in `figure11.png`. (On H100, the plotted `figure11.png` corresponds to Figure 11(a) in the paper; on RTX4090, it corresponds to Figure 11(c).)
- Run `python plot_figure11_bd.py` and check the figure in `figure11_sub.png`. (On H100, the plotted `figure11_sub.png` corresponds to Figure 11(b) in the paper; on RTX4090, it corresponds to Figure 11(d).)
- Run `python profile_table5.py` and check the result in `table5.txt`. (On H100, the profiled `table5.txt` corresponds to Table 5 (left) in the paper; on RTX4090, it corresponds to Table 5 (right).)
- Go to the project's `eva/plot/kernel_sddmm/` directory.
- Run `python plot_figure13_a.py` and check the figure in `figure13(a).png`.
- Run `python plot_figure13_b.py` and check the figure in `figure13(b).png`. (On H100, the plotted `figure13(a).png` and `figure13(b).png` correspond to Figure 13(a)(b) in the paper; on RTX4090, they correspond to Figure 13(c)(d).)
- Run `python profile_table6.py` and check the result in `table6.txt`. (On H100, the profiled `table6.txt` corresponds to Table 6 (left) in the paper; on RTX4090, it corresponds to Table 6 (right).)
- Go to the project's `eva/plot/ablation/memory/` directory.
- Run `python spmm.py` and check the result in `memory_spmm.csv` (about 20 minutes).
- Run `python sddmm.py` and check the result in `memory_sddmm.csv` (about 20 minutes).
- Run `python plot_spmm.py` and check the figure in `spmm_mem.png`.
- Run `python plot_sddmm.py` and check the figure in `sddmm_mem.png`.
- Go to the project's `eva/plot/ablation/throughput/` directory.
- Run `python plot_spmm.py` and check the figure in `figure14(a).png`.
- Run `python plot_sddmm.py` and check the figure in `figure14(b).png`. (On H100, the plotted `figure14(a).png` and `figure14(b).png` correspond to Figure 14(a)(b) in the paper; on RTX4090, they correspond to Figure 14(c)(d).)
- Go to the project's `eva/plot/ablation/access/` directory.
- Run `python plot.py` and check the figure in `figure15.png`. (On H100, the plotted `figure15.png` corresponds to Figure 15 (left) in the paper; on RTX4090, it corresponds to Figure 15 (right).)
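Because the same output file name maps to a different panel of the paper depending on the GPU it was produced on, a small lookup table can reduce mix-ups when collecting artifacts from both machines. The sketch below encodes only the correspondences stated in this README. The names `PAPER_LABELS` and `paper_label` are hypothetical, and the GPU keys `"H100"` and `"RTX4090"` are labels chosen for this example, not strings reported by any driver.

```python
# Output file -> GPU -> figure or table in the paper, as stated in this README.
PAPER_LABELS = {
    "figure11.png": {"H100": "Figure 11(a)", "RTX4090": "Figure 11(c)"},
    "figure11_sub.png": {"H100": "Figure 11(b)", "RTX4090": "Figure 11(d)"},
    "table5.txt": {"H100": "Table 5 (left)", "RTX4090": "Table 5 (right)"},
    "figure13(a).png": {"H100": "Figure 13(a)", "RTX4090": "Figure 13(c)"},
    "figure13(b).png": {"H100": "Figure 13(b)", "RTX4090": "Figure 13(d)"},
    "table6.txt": {"H100": "Table 6 (left)", "RTX4090": "Table 6 (right)"},
    "figure14(a).png": {"H100": "Figure 14(a)", "RTX4090": "Figure 14(c)"},
    "figure14(b).png": {"H100": "Figure 14(b)", "RTX4090": "Figure 14(d)"},
    "figure15.png": {"H100": "Figure 15 (left)", "RTX4090": "Figure 15 (right)"},
}


def paper_label(output_file, gpu):
    """Return the paper figure/table that `output_file` reproduces on `gpu`."""
    try:
        return PAPER_LABELS[output_file][gpu]
    except KeyError:
        raise KeyError(f"no mapping for {output_file!r} on {gpu!r}") from None


print(paper_label("figure11.png", "H100"))        # Figure 11(a)
print(paper_label("figure13(b).png", "RTX4090"))  # Figure 13(d)
```

One way to use this is to rename each generated file with its GPU label before copying it off the machine, so the H100 and RTX4090 outputs do not overwrite each other.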
- Go to the project's `eva/plot/ablation/format/` directory.
- Run `python format.py` and check the result in `result.csv` (about 25 minutes).
- Run `python profile.py` and check the output.
- Go to the project's `eva/plot/gcn/` directory.
- Run `python plot.py` and check the figure in `figure16_gcn.png`.
- Go to the project's `eva/plot/agnn/` directory.
- Run `python plot.py` and check the figure in `figure16_agnn.png`.
- Go to the project's `eva/accuracy/gcn/` directory.
- Run `python eva_gcn.py` (about 1 minute).
- Check the result in `result/Baseline/gcn/accuracy.csv`.