Benchmark Suite for SpeedCode (Software Performance Engineering Education via Coding Of Didactic Exercises). This repository collects representative benchmarks from a wide range of application domains that can benefit from high-performance parallel computing. The benchmarks are implemented using OpenMP and OpenCilk for shared-memory multicore CPUs, and CUDA for GPUs. The suite is useful for comparing how each parallel programming model approaches and facilitates parallel programming, including ease of use, compile time, performance and efficiency, fine-grained control, and proneness to bugs such as deadlocks and race conditions. It is also useful for evaluating a newly designed or implemented CPU, GPU, accelerator, compiler, or operating system.
Edit env.sh so that the library paths point to the right locations on your system, and then:
$ source env.sh
Then run make in the root directory:
$ make
Or go to a single sub-directory, e.g. src/tc, and run make there:
$ cd src/tc; make
Binaries will be in the bin directory.
For example, tc_omp_base is the OpenMP version of triangle counting on CPU, and tc_gpu_base is the GPU version.
To run, go to each sub-directory, and then:
$ ./run-test.sh
To see the command-line format, run the executable without arguments:
$ cd ../../bin
$ ./tc_omp_base
Run triangle counting with an undirected toy graph on CPU:
$ ./tc_omp_base ../inputs/citeseer/graph
More graph datasets are available here. You can find the expected outputs in the README of each benchmark (see here for triangle counting).
To control the number of threads, set the following environment variable:
$ export OMP_NUM_THREADS=[ number of cores in system ]
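As a minimal sketch of what this variable controls (not code from this repository; the function names `available_threads` and `parallel_sum_to` are hypothetical), the following C++ snippet reports the default thread count that OpenMP would use and runs a small parallel reduction. It assumes a compiler with OpenMP support, e.g. building with `g++ -fopenmp`; without OpenMP the pragma is ignored and the code runs serially.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns the default number of threads for a parallel region.
// With OpenMP enabled, this reflects OMP_NUM_THREADS when it is set.
// Falls back to 1 when compiled without OpenMP support.
int available_threads() {
#ifdef _OPENMP
  return omp_get_max_threads();
#else
  return 1;
#endif
}

// Sums the integers 1..n; the loop iterations are divided among threads
// and the per-thread partial sums are combined by the reduction clause.
std::int64_t parallel_sum_to(std::int64_t n) {
  assert(n >= 0);
  std::int64_t sum = 0;
  #pragma omp parallel for reduction(+ : sum)
  for (std::int64_t i = 1; i <= n; ++i) {
    sum += i;
  }
  return sum;
}
```

The result of `parallel_sum_to` does not depend on the thread count; only the running time does, which is why the variable is convenient for scaling experiments.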
- Single-Precision A X Plus Y (SAXPY)
- Single Precision General Matrix Multiplication (SGEMM)
- Sparse Matrix-Vector Multiplication (SpMV)
- Sparse Matrix Dense Matrix Multiplication (SpMDM)
- Stencil
- Convolution (CONV)
- Prefix Sum a.k.a. Scan
- Histogram
- Reduction (Sum / Maximum Finding / MinMax)
- Merge Sort
- Radix Sort
- Jacobi method for solving linear systems $Ax=b$
- Breadth-First Search (BFS) / Graph Traversal
- Fast Fourier transform (FFT) for digital signal processing
Serial | OpenMP | Cilk | CUDA | |
---|---|---|---|---|
SAXPY | ✔️ | ✔️ | ✔️ | ✔️ |
SGEMM | ✔️ | ✔️ | ✔️ | ✔️ |
SpMV | ✔️ | ✔️ | ✔️ | ✔️ |
SpMDM | ✔️ | ✔️ | ✔️ | ✔️ |
Stencil | ✔️ | ✔️ | ✔️ | ✔️ |
CONV | ✔️ | ✔️ | ✔️ | ✔️ |
Scan | ✔️ | ✔️ | ✔️ | ✔️ |
Histo | ✔️ | ✔️ | ✔️ | ✔️ |
Reduce | ✔️ | ✔️ | ✔️ | ✔️ |
Merge | ✔️ | ✔️ | ✔️ | ✔️ |
Radix | ✔️ | ✔️ | ✔️ | ✔️ |
Jacobi | ✔️ | ✔️ | ✔️ | ✔️ |
BFS | ✔️ | ✔️ | ✔️ | ✔️ |
FFT | ✔️ | ✔️ |
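To illustrate how the Serial and OpenMP columns differ for one of the simplest kernels, here is a hedged sketch of SAXPY in C++. It is not the implementation shipped in this repository; the function names `saxpy_serial` and `saxpy_omp` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial baseline: y[i] = a * x[i] + y[i] for every element.
void saxpy_serial(float a, const std::vector<float>& x, std::vector<float>& y) {
  assert(x.size() == y.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    y[i] = a * x[i] + y[i];
  }
}

// OpenMP variant: iterations are independent, so the loop can be split
// across threads without synchronization. Compiled without OpenMP,
// the pragma is ignored and this behaves like the serial version.
void saxpy_omp(float a, const std::vector<float>& x, std::vector<float>& y) {
  assert(x.size() == y.size());
  const long n = static_cast<long>(x.size());
  #pragma omp parallel for
  for (long i = 0; i < n; ++i) {
    y[i] = a * x[i] + y[i];
  }
}
```

Because each output element depends only on its own inputs, both versions produce identical results, which makes SAXPY a convenient first check that a parallel toolchain is set up correctly.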
- Advanced Encryption Standard (AES), a specification for the encryption of electronic data
- B+ tree (B+T) used in file systems and database systems
- Barnes-Hut (BH) for N-Body simulation
- Black-Scholes (BS), a differential equation to price options contracts in finance
- CRC64 checksum (CRC), an error-detecting code used in digital networks and storage devices
- Jaccard index (JI), a statistic used for gauging the similarity and diversity of sample sets
- Haversine Distance (HD) for Geospatial Data Analysis
- Symmetric Gauss-Seidel Smoother (SymGS) for numerical linear algebra
- Inversek2j (IK2J), inverse kinematics for a 2-joint arm, used in robotics
- Lattice Boltzmann methods (LBM) for Computational Fluid Dynamics (CFD)
- Collaborative Filtering (CF), a Stochastic Gradient Descent (SGD) algorithm for recommender systems
- K-means clustering (K-MEANS), a method of vector quantization for signal processing
- Ray Tracing (RT) for 3D computer graphics
- k-nearest neighbor (k-NN), a supervised learning method for classification and regression
- Locality sensitive hashing (LSH) for finding approximate nearest neighbors
- StreamCluster (SC) for online clustering of an input stream
- DAPHNE points2image (P2I) for automotive applications
- PageRank (PR) for ranking web pages in a search engine
- Triangle Counting (TC) for social network analysis
- Betweenness Centrality (BC), a measure of centrality in a graph (from graph theory)
- Connected Components (CC) (from graph theory)
- Single-Source Shortest Paths (SSSP), finding the shortest paths from a single source vertex (from graph theory)
- Minimum Spanning Tree (MST) (from graph theory)
- Vertex Coloring (VC) (from graph theory)
| | Serial | OpenMP | Cilk | CUDA |
|---|---|---|---|---|
| BH | ✔️ | ✔️ | ✔️ | |
| BS | ✔️ | ✔️ | ✔️ | |
| LBM | ✔️ | ✔️ | ✔️ | |
| CF | ✔️ | ✔️ | ✔️ | ✔️ |
| PR | ✔️ | ✔️ | ✔️ | ✔️ |
| TC | ✔️ | ✔️ | ✔️ | ✔️ |
| CC | ✔️ | ✔️ | ✔️ | ✔️ |
| BC | ✔️ | ✔️ | ✔️ | |
| SSSP | ✔️ | ✔️ | ✔️ | |
| MST | ✔️ | ✔️ | ✔️ | |
| VC | ✔️ | ✔️ | ✔️ | |
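As an example of what one of these applications computes, the sketch below counts triangles in an undirected graph using sorted-list intersection with OpenMP. It is an illustration under stated assumptions, not the code in src/tc: the `CsrGraph` struct and `count_triangles` function are hypothetical, and the graph is assumed to be stored symmetrically in CSR form with each neighbor list sorted in ascending order and free of self-loops and duplicate edges.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical minimal CSR graph (not the repository's graph class).
struct CsrGraph {
  std::vector<std::int64_t> row_ptr;  // size = number of vertices + 1
  std::vector<std::int32_t> col_idx;  // concatenated sorted neighbor lists
};

// Counts each triangle {u, v, w} exactly once by enforcing u > v > w.
std::int64_t count_triangles(const CsrGraph& g) {
  assert(!g.row_ptr.empty());
  const std::int32_t n = static_cast<std::int32_t>(g.row_ptr.size()) - 1;
  std::int64_t total = 0;
  // Dynamic scheduling helps when vertex degrees are skewed.
  #pragma omp parallel for reduction(+ : total) schedule(dynamic, 64)
  for (std::int32_t u = 0; u < n; ++u) {
    for (std::int64_t e = g.row_ptr[u]; e < g.row_ptr[u + 1]; ++e) {
      const std::int32_t v = g.col_idx[e];
      if (v >= u) break;  // lists are sorted, so later neighbors are larger
      // Merge-style intersection of N(u) and N(v), restricted to w < v.
      std::int64_t i = g.row_ptr[u];
      std::int64_t j = g.row_ptr[v];
      const std::int64_t i_end = g.row_ptr[u + 1];
      const std::int64_t j_end = g.row_ptr[v + 1];
      while (i < i_end && j < j_end) {
        const std::int32_t a = g.col_idx[i];
        const std::int32_t b = g.col_idx[j];
        if (a >= v || b >= v) break;
        if (a == b) {
          ++total;
          ++i;
          ++j;
        } else if (a < b) {
          ++i;
        } else {
          ++j;
        }
      }
    }
  }
  return total;
}
```

The ordering constraint avoids counting the same triangle six times, and the reduction clause keeps the shared counter free of race conditions.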
- Parboil https://github.com/abduld/Parboil
- Rodinia https://github.com/yuhc/gpu-rodinia
- PARSEC https://github.com/bamos/parsec-benchmark
- SPLASH-2 https://github.com/staceyson/splash2
- GAPBS https://github.com/sbeamer/gapbs
- Seven Dwarfs https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf
- PBBS https://cmuparlay.github.io/pbbsbench/benchmarks/index.html
- GBBS https://paralg.github.io/gbbs/docs/introduction
- HeCBench https://github.com/zjin-lcf/HeCBench
- GARDENIA https://github.com/chenxuhao/gardenia
- GraphAIBench https://github.com/chenxuhao/GraphAIBench
- AxBench http://axbench.org/
- DAPHNE https://github.com/esa-tu-darmstadt/daphne-benchmark.git
- Lonestar https://iss.oden.utexas.edu/?p=projects/galois/lonestar
- Dense Linear Algebra, Matrix multiply (e.g., SGEMM)
- Sparse Linear Algebra (e.g., SpMV / SpMM)
- Spectral Methods (e.g., FFT)
- N-Body Methods (e.g., Barnes-Hut)
- Structured Grids (e.g., LBM)
- Unstructured Grids
- Monte Carlo
- Combinational Logic (e.g., encryption)
- Graph traversal (e.g., Quicksort)
- Dynamic Programming
- Backtrack and Branch+Bound
- Construct Graphical Models
- Finite State Machine
- Bioinformatics (e.g., all-pairs-distance)
- Computer vision and image processing (e.g., Stencil, Convolution, aobench, sad, sobel, mriQ)
- Cryptography (e.g., AES)
- Data compression and reduction (e.g., Scan, bitpacking, histogram)
- Data encoding, decoding, or verification (e.g., md5hash, crc64)
- Finance (e.g., black-scholes)
- Geographic information system (e.g., haversine)
- Graph and Tree (e.g., BC, CC, TC, VC, MST, MIS, SSSP)
- Language and kernel features (e.g., wordcount, saxpy)
- Machine learning (e.g., CF, backprop, attention, kmeans, knn, page-rank, streamcluster, word2vec)
- Math (e.g., sgemm, spmv, symgs, jaccard, jacobi, leukocyte, lud, tridiagonal solver, AMG, matrix-rotate)
- Random number generation (e.g., rng-wallace, sobol)
- Search (e.g., binary search, b+tree, BFS)
- Signal processing (e.g., FFT)
- Simulation (e.g., nbody, LBM, CFD, Delaunay Mesh Refinement, hotspot3D, heartwall, laplace3d, lavaMD, particlefilter, pathfinder, pns, tpacf, bspline-vgh, burger, minisweep, miniWeather, sph, testSNAP)
- Sorting (e.g., quicksort, radixsort, mergesort, bitonic-sort)
- Robotics (e.g., inversek2j)
- Automotive (e.g., daphne)