Measure the performance of individual cores with OpenMP.
You can use this program to measure the performance of individual CPU cores of a system. It works by running a pleasingly parallel workload on each core and measuring the time it takes to complete. The workload is a simple loop that finds the sum of all the elements of an array (1 billion 64-bit floating-point numbers by default). The program is written in C++ and uses OpenMP to parallelize the workload. To prevent the operating system from moving threads between cores, the program uses the OpenMP affinity API to pin each thread to a specific core.
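The per-core measurement described above can be sketched roughly as follows. This is a hypothetical illustration, not the code in main.cxx: it assumes every thread sums the same shared, read-only array, uses the Linux-specific sched_getcpu() to record which core a thread ran on, and leaves pinning to the standard OpenMP environment variables (for example OMP_PROC_BIND=close and OMP_PLACES=cores). The names CoreResult, sumArray and measureCores are made up for this sketch; compile with OpenMP enabled (for GCC, -fopenmp).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>
#include <sched.h>  // sched_getcpu(): Linux/glibc-specific
#ifdef _OPENMP
#include <omp.h>
#endif

// One measurement for one OpenMP thread (hypothetical record type).
struct CoreResult {
  int thread;     // OpenMP thread number
  int coreStart;  // CPU the thread was on before the sum
  int coreEnd;    // CPU the thread was on after the sum (differs if it migrated)
  double timeMs;  // wall-clock time of the sum, in milliseconds
  double sum;     // result, kept so the loop is not optimised away
};

// Serial sum of all elements: the per-core workload.
double sumArray(const std::vector<double>& x) {
  double a = 0.0;
  for (double v : x) a += v;
  return a;
}

// Every thread in the team sums the same array independently and times itself.
std::vector<CoreResult> measureCores(const std::vector<double>& x) {
  assert(!x.empty());
  std::vector<CoreResult> results;
  #pragma omp parallel
  {
    int t = 0, T = 1;  // serial fallback when compiled without OpenMP
#ifdef _OPENMP
    t = omp_get_thread_num();
    T = omp_get_num_threads();
#endif
    // One thread sizes the vector; the implicit barrier at the end of
    // `single` ensures no thread writes before that, and all threads
    // leave the barrier at about the same moment.
    #pragma omp single
    results.resize(static_cast<std::size_t>(T));
    int c0 = sched_getcpu();
    auto start = std::chrono::steady_clock::now();
    double s = sumArray(x);
    auto stop = std::chrono::steady_clock::now();
    int c1 = sched_getcpu();
    double ms = std::chrono::duration<double, std::milli>(stop - start).count();
    results[t] = CoreResult{t, c0, c1, ms, s};  // each thread writes only its own slot
  }
  return results;
}
```

Comparing coreStart with coreEnd is a cheap check that pinning took effect: with thread binding enabled, the two should match for every thread.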
Note: You can just copy main.sh to your system and run it. For the code, refer to main.cxx.
```bash
$ ./main.sh
# OMP_NUM_THREADS=64
# {run=000, thread=000, node=0, core=000, time=2407.0969ms, flops=4.1544e+08}
# {run=000, thread=001, node=0, core=032, time=2407.0779ms, flops=4.1544e+08}
# {run=000, thread=002, node=1, core=001, time=2407.0649ms, flops=4.1544e+08}
# ...
```
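The flops values in the sample output are consistent with counting one addition per element: 1e9 elements / 2.4070969 s ≈ 4.1544e8. A minimal sketch, assuming that interpretation, of how such a line could be computed and printed; flopsOf and formatRecord are hypothetical names, and the NUMA node number is passed in rather than looked up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Additions per second for summing n elements in timeMs milliseconds
// (assumes one floating-point addition per element).
double flopsOf(std::size_t n, double timeMs) {
  assert(timeMs > 0.0);
  return static_cast<double>(n) / (timeMs / 1000.0);
}

// Format one measurement in the same style as the sample output.
std::string formatRecord(int run, int thread, int node, int core,
                         double timeMs, double flops) {
  char buf[128];
  std::snprintf(buf, sizeof(buf),
                "{run=%03d, thread=%03d, node=%d, core=%03d, time=%.4fms, flops=%.4e}",
                run, thread, node, core, timeMs, flops);
  return std::string(buf);
}
```

With the first sample time, formatRecord(0, 0, 0, 0, 2407.0969, flopsOf(1000000000, 2407.0969)) reproduces the first output line exactly.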
I ran this program on nodes 1, 2, and 4 of our cluster; no core-specific faults were found.
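One way to turn per-core timings into a fault check is to flag cores that are noticeably slower than the rest. The sketch below is an assumption, not the method used for the result above: it compares each core's time with the median across cores and reports cores that exceed it by more than a chosen tolerance. The names median and slowCores, and the 5% tolerance used in the example, are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of v (taken by value because nth_element reorders it).
double median(std::vector<double> v) {
  assert(!v.empty());
  std::size_t mid = v.size() / 2;
  std::nth_element(v.begin(), v.begin() + mid, v.end());
  double hi = v[mid];
  if (v.size() % 2 == 1) return hi;
  // Even count: the lower middle value is the largest element before mid.
  double lo = *std::max_element(v.begin(), v.begin() + mid);
  return (lo + hi) / 2.0;
}

// Indices of cores whose time exceeds the median by more than `tolerance`
// (a fraction, e.g. 0.05 for 5%).
std::vector<int> slowCores(const std::vector<double>& timeMs, double tolerance) {
  double m = median(timeMs);
  std::vector<int> slow;
  for (std::size_t i = 0; i < timeMs.size(); ++i)
    if (timeMs[i] > m * (1.0 + tolerance)) slow.push_back(static_cast<int>(i));
  return slow;
}
```

For the three sample times shown above (about 2407.1 ms each), slowCores returns an empty list at a 5% tolerance. The median is used rather than the mean so that one faulty core does not shift the baseline it is compared against.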
Below is also a runtime stabilisation plot (I performed 100 runs of summing a billion elements on each core).
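The repeated-run measurement behind such a plot might look like the following sketch, which is assumed rather than taken from main.cxx: the calling thread times a given number of runs of the sum (100 in the experiment above), and the cumulative mean of the run times shows how quickly the average runtime settles. The names sumArray, timeRuns and cumulativeMean are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Serial sum of x: the per-core workload being timed.
double sumArray(const std::vector<double>& x) {
  double a = 0.0;
  for (double v : x) a += v;
  return a;
}

// Time `runs` repetitions of the sum on the calling thread, in milliseconds.
// `sink` accumulates the sums so the compiler cannot discard the loop.
std::vector<double> timeRuns(const std::vector<double>& x, int runs, double& sink) {
  assert(runs > 0);
  std::vector<double> times;
  times.reserve(static_cast<std::size_t>(runs));
  for (int r = 0; r < runs; ++r) {
    auto start = std::chrono::steady_clock::now();
    sink += sumArray(x);
    auto stop = std::chrono::steady_clock::now();
    times.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
  }
  return times;
}

// Cumulative mean: entry k is the mean of runs 0..k.
std::vector<double> cumulativeMean(const std::vector<double>& t) {
  std::vector<double> m(t.size());
  double total = 0.0;
  for (std::size_t k = 0; k < t.size(); ++k) {
    total += t[k];
    m[k] = total / static_cast<double>(k + 1);
  }
  return m;
}
```

Calling timeRuns from inside each pinned thread gives one time series per core; plotting cumulativeMean of each series against the run index is one way to see when the per-core runtime stops changing.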