Skip to content
This repository has been archived by the owner on Mar 20, 2023. It is now read-only.

Fix printing of max threads used by OpenMP #293

Merged
merged 1 commit into from
Apr 22, 2020
Merged

Conversation

iomaganaris
Copy link
Contributor

@iomaganaris iomaganaris commented Apr 21, 2020

- Replace omp_get_num_threads() with opm_get_max_threads() to get the
  max number of threads that can OpenMP use
@@ -125,7 +125,7 @@ void nrnmpi_init(int nrnmpi_under_nrncontrol, int* pargc, char*** pargv) {
if (nrnmpi_myid == 0) {
#if defined(_OPENMP)
printf(" num_mpi=%d\n num_omp_thread=%d\n\n", nrnmpi_numprocs_world,
omp_get_num_threads());
omp_get_max_threads());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iomaganaris : see here.
We do want omp_get_num_threads but it's needs to be called from parallel region and that's the issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually also always using omp_get_num_threads. But looking around, I don't see why @iomaganaris change is not actually a good one. This runtime call gives you the maximum available threads to OpenMP in the application, which is what you want to print here.
https://www.openmp.org/spec-html/5.0/openmpsu112.html
omp_get_num_threads would tell you how many threads there are actually in a particular parallel region. This would return the wrong value in nested parallelism and depends also on omp_set_num_threads.

Copy link
Contributor Author

@iomaganaris iomaganaris Apr 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried running the following test:

#include <iostream>
#include <omp.h>
#include <unistd.h>
#define THREAD_NUM 4
int main()
{
    if (!getenv("OMP_NUM_THREADS")) {
        omp_set_num_threads(THREAD_NUM); // set number of threads in "parallel" blocks
    }
    std::cout << "Number of max available threads: " << omp_get_max_threads() << std::endl;
    #pragma omp parallel //num_threads(2)
    {
        usleep(5000 * omp_get_thread_num()); // do this to avoid race condition while printing
        std::cout << "Number of available threads: " << omp_get_num_threads() << std::endl;
        // each thread can also get its own number
        std::cout << "Current thread number: " << omp_get_thread_num() << std::endl;
        std::cout << "Hello, World!" << std::endl;
        #pragma omp parallel //num_threads(2)
        {
            usleep(1000 * omp_get_thread_num()); // do this to avoid race condition while printing
            std::cout << "Number of available nested threads: " << omp_get_num_threads() << std::endl;
            std::cout << "Nested thread number: " << omp_get_thread_num() << std::endl;
        }
    }
    return 0;
}

And I got the following output:

-bash-4.2$ export OMP_NUM_THREADS=3
-bash-4.2$ ./openmp
Number of max available threads: 3
Number of available threads: 3
Current thread number: 0
Hello, World!
Number of available nested threads: 1
Nested thread number: 0
Number of available threads: 3
Current thread number: 1
Hello, World!
Number of available nested threads: 1
Nested thread number: 0
Number of available threads: 3
Current thread number: 2
Hello, World!
Number of available nested threads: 1
Nested thread number: 0

If I uncomment the num_threads(2) in the omp directives and set OMP_NESTED=TRUE:

-bash-4.2$ ./openmp
Number of max available threads: 3
Number of available threads: 2
Current thread number: 0
Hello, World!
Number of available nested threads: 2
Nested thread number: 0
Number of available nested threads: 2
Nested thread number: 1
Number of available threads: 2
Current thread number: 1
Hello, World!
Number of available nested threads: 2
Nested thread number: 0
Number of available nested threads: 2
Nested thread number: 1

I think that omp_get_max_threads() is okay in the context of the printing message based on this if we don't have any nested blocks, use omp_dynamic or set explicitly the num_threads

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice! I think that should settle it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember using omp_get_num_threads instead of omp_get_max_threads specifically for hybrid MPI + OpenMP codes. Something like : if I have two tasks on a node, max_threads will give all core count for each MPI ranks and omp_get_thread_num under parallel region would behave differently.

BUT by doing quick tests on BB5, I think I have false memory. I see the behaviour is intended with the omp_get_max_threads as well.

So considering the simple use case we have (neuron/coreneuron will set omp threads by call omp_set_num_threads), I would go ahead with what Ioannis did. (Will come back here if I remember why I thought otherwise).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for future reference, I added MPI in my code

#include <iostream>
#include <omp.h>
#include <mpi.h>
#include <unistd.h>
#define THREAD_NUM 4
int main()
{
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    if (!getenv("OMP_NUM_THREADS")) {
        omp_set_num_threads(THREAD_NUM); // set number of threads in "parallel" blocks
    }

    std::cout << "Number of max available threads: " << omp_get_max_threads() << std::endl;
    #pragma omp parallel //num_threads(2)
    {
        usleep(5000 * omp_get_thread_num()); // do this to avoid race condition while printing
        std::cout << "Number of available threads: " << omp_get_num_threads() << std::endl;
        // each thread can also get its own number
        std::cout << "Current thread number: " << omp_get_thread_num() << std::endl;
        std::cout << "Hello, World!" << std::endl;
        #pragma omp parallel //num_threads(2)
        {
            usleep(1000 * omp_get_thread_num()); // do this to avoid race condition while printing
            std::cout << "Number of available nested threads: " << omp_get_num_threads() << std::endl;
            std::cout << "Nested thread number: " << omp_get_thread_num() << std::endl;
        }
    }
    // Finalize the MPI environment.
    MPI_Finalize();
    return 0;
}

and got the following:

-bash-4.2$ export OMP_NUM_THREADS=3
-bash-4.2$ mpicxx openmp.cpp -qopenmp -o openmp
-bash-4.2$ srun -N 1 --ntasks-per-node=36 -p interactive --account=proj16 --constraint="cpu|nvme" -t 00:02:00 -n 2 openmp
srun: job 745676 queued and waiting for resources
srun: job 745676 has been allocated resources
Hello world from processor r1i5n35, rank 0 out of 2 processors
Hello world from processor r1i5n35, rank 1 out of 2 processors
Number of max available threads: 3
Number of available threads: 3
Current thread number: 0
Hello, World!
Number of max available threads: 3
Number of available threads: 3
Current thread number: 0
Hello, World!
Number of available nested threads: 1
Nested thread number: 0
Number of available nested threads: 1
Nested thread number: 0
Number of available threads: 3
Current thread number: 1
Hello, World!
Number of available threads: 3
Current thread number: 1
Hello, World!
Number of available nested threads: 1
Nested thread number: 0
Number of available nested threads: 1
Nested thread number: 0
Number of available threads: 3
Current thread number: 2
Hello, World!
Number of available threads: 3
Current thread number: 2
Hello, World!
Number of available nested threads: 1
Nested thread number: 0
Number of available nested threads: 1
Nested thread number: 0

@pramodk pramodk merged commit 9b1d96b into master Apr 22, 2020
@pramodk pramodk deleted the fix/omp_threads_print branch April 22, 2020 15:00
pramodk pushed a commit to neuronsimulator/nrn that referenced this pull request Nov 2, 2022
- Replace omp_get_num_threads() with opm_get_max_threads() to get the
  max number of threads that can OpenMP use

CoreNEURON Repo SHA: BlueBrain/CoreNeuron@9b1d96b
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants