-
Notifications
You must be signed in to change notification settings - Fork 38
Fix printing of max threads used by OpenMP #293
Conversation
- Replace omp_get_num_threads() with opm_get_max_threads() to get the max number of threads that can OpenMP use
@@ -125,7 +125,7 @@ void nrnmpi_init(int nrnmpi_under_nrncontrol, int* pargc, char*** pargv) { | |||
if (nrnmpi_myid == 0) { | |||
#if defined(_OPENMP) | |||
printf(" num_mpi=%d\n num_omp_thread=%d\n\n", nrnmpi_numprocs_world, | |||
omp_get_num_threads()); | |||
omp_get_max_threads()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iomaganaris : see here.
We do want omp_get_num_threads
but it's needs to be called from parallel region and that's the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually also always using omp_get_num_threads
. But looking around, I don't see why @iomaganaris change is not actually a good one. This runtime call gives you the maximum available threads to OpenMP in the application, which is what you want to print here.
https://www.openmp.org/spec-html/5.0/openmpsu112.html
omp_get_num_threads
would tell you how many threads there are actually in a particular parallel region. This would return the wrong value in nested parallelism and depends also on omp_set_num_threads
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried running the following test:
#include <iostream>
#include <omp.h>
#include <unistd.h>
#define THREAD_NUM 4
int main()
{
if (!getenv("OMP_NUM_THREADS")) {
omp_set_num_threads(THREAD_NUM); // set number of threads in "parallel" blocks
}
std::cout << "Number of max available threads: " << omp_get_max_threads() << std::endl;
#pragma omp parallel //num_threads(2)
{
usleep(5000 * omp_get_thread_num()); // do this to avoid race condition while printing
std::cout << "Number of available threads: " << omp_get_num_threads() << std::endl;
// each thread can also get its own number
std::cout << "Current thread number: " << omp_get_thread_num() << std::endl;
std::cout << "Hello, World!" << std::endl;
#pragma omp parallel //num_threads(2)
{
usleep(1000 * omp_get_thread_num()); // do this to avoid race condition while printing
std::cout << "Number of available nested threads: " << omp_get_num_threads() << std::endl;
std::cout << "Nested thread number: " << omp_get_thread_num() << std::endl;
}
}
return 0;
}
And I got the following output:
-bash-4.2$ export OMP_NUM_THREADS=3
-bash-4.2$ ./openmp
Number of max available threads: 3
Number of available threads: 3
Current thread number: 0
Hello, World!
Number of available nested threads: 1
Nested thread number: 0
Number of available threads: 3
Current thread number: 1
Hello, World!
Number of available nested threads: 1
Nested thread number: 0
Number of available threads: 3
Current thread number: 2
Hello, World!
Number of available nested threads: 1
Nested thread number: 0
If I uncomment the num_threads(2)
in the omp directives
and set OMP_NESTED=TRUE
:
-bash-4.2$ ./openmp
Number of max available threads: 3
Number of available threads: 2
Current thread number: 0
Hello, World!
Number of available nested threads: 2
Nested thread number: 0
Number of available nested threads: 2
Nested thread number: 1
Number of available threads: 2
Current thread number: 1
Hello, World!
Number of available nested threads: 2
Nested thread number: 0
Number of available nested threads: 2
Nested thread number: 1
I think that omp_get_max_threads()
is okay in the context of the printing message based on this if we don't have any nested blocks, use omp_dynamic
or set explicitly the num_threads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice! I think that should settle it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember using omp_get_num_threads
instead of omp_get_max_threads
specifically for hybrid MPI + OpenMP codes. Something like : if I have two tasks on a node, max_threads
will give all core count for each MPI ranks and omp_get_thread_num
under parallel region would behave differently.
BUT by doing quick tests on BB5, I think I have false memory. I see the behaviour is intended with the omp_get_max_threads
as well.
So considering the simple use case we have (neuron/coreneuron will set omp threads by call omp_set_num_threads
), I would go ahead with what Ioannis did. (Will come back here if I remember why I thought otherwise).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for future reference, I added MPI
in my code
#include <iostream>
#include <omp.h>
#include <mpi.h>
#include <unistd.h>
#define THREAD_NUM 4
int main()
{
// Initialize the MPI environment
MPI_Init(NULL, NULL);
// Get the number of processes
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
// Get the rank of the process
int world_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
// Get the name of the processor
char processor_name[MPI_MAX_PROCESSOR_NAME];
int name_len;
MPI_Get_processor_name(processor_name, &name_len);
// Print off a hello world message
printf("Hello world from processor %s, rank %d out of %d processors\n",
processor_name, world_rank, world_size);
if (!getenv("OMP_NUM_THREADS")) {
omp_set_num_threads(THREAD_NUM); // set number of threads in "parallel" blocks
}
std::cout << "Number of max available threads: " << omp_get_max_threads() << std::endl;
#pragma omp parallel //num_threads(2)
{
usleep(5000 * omp_get_thread_num()); // do this to avoid race condition while printing
std::cout << "Number of available threads: " << omp_get_num_threads() << std::endl;
// each thread can also get its own number
std::cout << "Current thread number: " << omp_get_thread_num() << std::endl;
std::cout << "Hello, World!" << std::endl;
#pragma omp parallel //num_threads(2)
{
usleep(1000 * omp_get_thread_num()); // do this to avoid race condition while printing
std::cout << "Number of available nested threads: " << omp_get_num_threads() << std::endl;
std::cout << "Nested thread number: " << omp_get_thread_num() << std::endl;
}
}
// Finalize the MPI environment.
MPI_Finalize();
return 0;
}
and got the following:
-bash-4.2$ export OMP_NUM_THREADS=3
-bash-4.2$ mpicxx openmp.cpp -qopenmp -o openmp
-bash-4.2$ srun -N 1 --ntasks-per-node=36 -p interactive --account=proj16 --constraint="cpu|nvme" -t 00:02:00 -n 2 openmp
srun: job 745676 queued and waiting for resources
srun: job 745676 has been allocated resources
Hello world from processor r1i5n35, rank 0 out of 2 processors
Hello world from processor r1i5n35, rank 1 out of 2 processors
Number of max available threads: 3
Number of available threads: 3
Current thread number: 0
Hello, World!
Number of max available threads: 3
Number of available threads: 3
Current thread number: 0
Hello, World!
Number of available nested threads: 1
Nested thread number: 0
Number of available nested threads: 1
Nested thread number: 0
Number of available threads: 3
Current thread number: 1
Hello, World!
Number of available threads: 3
Current thread number: 1
Hello, World!
Number of available nested threads: 1
Nested thread number: 0
Number of available nested threads: 1
Nested thread number: 0
Number of available threads: 3
Current thread number: 2
Hello, World!
Number of available threads: 3
Current thread number: 2
Hello, World!
Number of available nested threads: 1
Nested thread number: 0
Number of available nested threads: 1
Nested thread number: 0
- Replace omp_get_num_threads() with opm_get_max_threads() to get the max number of threads that can OpenMP use CoreNEURON Repo SHA: BlueBrain/CoreNeuron@9b1d96b
omp_get_num_threads()
withomp_get_max_threads()
to get themax number of threads that OpenMP can use
omp_get_num_threads()
run in a "serial" region and always printed1