Measure the performance of individual cores with OpenMP.

You can use this program to measure the performance of the individual CPU cores of a system. It works by running a pleasingly parallel workload on each core and measuring the time it takes to complete. The workload is a simple loop that finds the sum of all the elements of an array (1 billion 64-bit floating-point numbers by default). The program is written in C++ and uses OpenMP to parallelize the workload. To prevent the operating system from moving threads between cores, the program uses the OpenMP affinity API to pin each thread to a specific core.
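The idea can be sketched as follows. This is a minimal illustration and not the code in main.cxx: the names `CoreTiming`, `sumValues`, and `measureCores` are hypothetical, and pinning here relies on the standard `proc_bind` clause together with the `OMP_PLACES` environment variable, which may differ from what the program actually does. Every thread sums the whole array on its own, so each timing reflects a single core.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical record: one entry per OpenMP thread.
struct CoreTiming {
  int thread;     // OpenMP thread number (-1 if the slot was never filled)
  int place;      // OpenMP place number (-1 if the thread is not bound to a place)
  double millis;  // wall-clock time of this thread's private summation
  double sum;     // result, kept so the compiler cannot discard the loop
};

// The per-core workload: a plain serial sum over the array.
inline double sumValues(const std::vector<double>& x) {
  double a = 0.0;
  for (double v : x) a += v;
  return a;
}

// Each thread sums the entire array by itself (no work sharing),
// and records how long that took along with where it ran.
std::vector<CoreTiming> measureCores(const std::vector<double>& x) {
  int T = 1;
#ifdef _OPENMP
  T = omp_get_max_threads();
#endif
  std::vector<CoreTiming> out(T, CoreTiming{-1, -1, 0.0, 0.0});
  // proc_bind(spread) asks the runtime to spread threads over the places
  // defined by OMP_PLACES (assumed to be set to "cores" by the caller).
  #pragma omp parallel proc_bind(spread)
  {
    int t = 0, p = -1;
#ifdef _OPENMP
    t = omp_get_thread_num();
    p = omp_get_place_num();
#endif
    auto t0 = std::chrono::steady_clock::now();
    double s = sumValues(x);
    auto t1 = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    if (t < T) out[t] = CoreTiming{t, p, ms, s};
  }
  return out;
}
```

With GCC this could be built using `g++ -O3 -fopenmp` and run with `OMP_PLACES=cores` so that each place corresponds to one core. Without `-fopenmp` the pragma is ignored and the sketch falls back to a single serial measurement.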

Note

You can just copy main.sh to your system and run it.
For the code, refer to main.cxx.

$ ./main.sh
# OMP_NUM_THREADS=64
# {run=000, thread=000, node=0, core=000, time=2407.0969ms, flops=4.1544e+08}
# {run=000, thread=001, node=0, core=032, time=2407.0779ms, flops=4.1544e+08}
# {run=000, thread=002, node=1, core=001, time=2407.0649ms, flops=4.1544e+08}
# ...
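
The reported `flops` values appear consistent with counting one floating-point addition per array element, assuming the default array size of $N = 10^9$ elements. Checking this against the first line of the output above:

$$
\text{flops} = \frac{N}{t} = \frac{10^{9}}{2.4070969\ \text{s}} \approx 4.1544 \times 10^{8}
$$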

I ran this program on nodes 1, 2, and 4 of our cluster and found no core-specific faults.

Below is the runtime stabilisation plot (100 runs of summing a billion elements on each core).

References

OpenMP topic: Affinity - Parallel Programming for Science and Engineering: The Art of HPC, volume 2 by Victor Eijkhout
c - Openmp. How to retrieve the core id in which a thread is running - Stack Overflow
c++ - OpenMP and CPU affinity - Stack Overflow
c++ - Set CPU affinity when create a thread - Stack Overflow
Controlling OpenMP Thread Affinity
Thread Affinity Interface
OpenMP Application Program Interface Version 3.1
Embarrassingly parallel - Wikipedia
