SWE performance exercise
Start by cloning a clean version of the SWE.git repository master branch
git clone --recursive https://github.com/fomics/SWE.git
... and apply a patch to the SConstruct file:
cd SWE
git cherry-pick 6624e61b9b82a8e6098bcf556d187a7fb9d7f492
module swap PrgEnv-cray PrgEnv-gnu
module load scons
module load python/2.7.2
Edit the SConstruct file to add the -fopenmp option, which the GNU compiler (loaded via PrgEnv-gnu) needs for OpenMP compilation.
Before:
# OpenMP parallelism?
if env['compiler'] == 'intel' and env['openmp']:
  env.Append(CCFLAGS=['-openmp'])
  env.Append(LINKFLAGS=['-openmp'])
After:
# OpenMP parallelism?
if env['compiler'] == 'intel' and env['openmp']:
  env.Append(CCFLAGS=['-openmp'])
  env.Append(LINKFLAGS=['-openmp'])
if env['compiler'] == 'cray' and env['openmp']:
  env.Append(CCFLAGS=['-fopenmp'])
  env.Append(LINKFLAGS=['-fopenmp'])
Now compile the MPI/OpenMP hybrid version with:
scons copyenv=true compiler=cray parallelization=mpi solver=fwavevec openmp=yes
- Open the file src/blocks/SWE_WavePropagation.cpp
- Add the line
#define LOOP_OPENMP
before the existing block
#ifdef LOOP_OPENMP
#include <omp.h>
#endif
- Comment out the line that starts with solver::Hybrid<float>, which is only needed for the hybrid Riemann solver (a sketch of the resulting edits follows this list).
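After these edits, the top of SWE_WavePropagation.cpp should look roughly like the sketch below; the surrounding includes and the exact name of the Hybrid solver member are not reproduced here and are only placeholders:
#define LOOP_OPENMP        // added: enables the OpenMP code paths in this file

#ifdef LOOP_OPENMP
#include <omp.h>           // already present in the file, now actually compiled in
#endif

// ... further down, disable the hybrid solver declaration, which is only
// needed for the hybrid Riemann solver:
// solver::Hybrid<float> ...;   // comment out the whole line that starts like this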
cd src/tools
Edit the file help.hh. Comment out lines 99 and 100, the zero-initialization in the Float2D constructor, so that the arrays are no longer touched serially before the parallel initialization added below. In other words, the resulting code should look like this:
Float2D(int _cols, int _rows) : rows(_rows), cols(_cols)
{
  elem = new float[rows*cols];
  // for (int i = 0; i < rows*cols; i++)
  //   elem[i] = 0;
}
Next, parallelize the initialization with OpenMP.
cd src/blocks
Edit the file SWE_Block.cpp. Parallelize the for loops at lines 97 and 107, namely the scenario initialization of the water heights and the bathymetry. The resulting code should look like this:
#pragma omp parallel for
// initialize water height and discharge
for(int i=1; i<=nx; i++)
  for(int j=1; j<=ny; j++) {
    float x = offsetX + (i-0.5f)*dx;
    float y = offsetY + (j-0.5f)*dy;
    h[i][j]  = i_scenario.getWaterHeight(x,y);
    hu[i][j] = i_scenario.getVeloc_u(x,y) * h[i][j];
    hv[i][j] = i_scenario.getVeloc_v(x,y) * h[i][j];
  };

// initialize bathymetry
#pragma omp parallel for
for(int i=0; i<=nx+1; i++) {
  for(int j=0; j<=ny+1; j++) {
    b[i][j] = i_scenario.getBathymetry( offsetX + (i-0.5f)*dx,
                                        offsetY + (j-0.5f)*dy );
  }
}
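If you want to convince yourself that this pattern scales before rebuilding all of SWE, the following minimal, self-contained sketch exercises the same parallel-initialization idiom; all names in it (NX, NY, the x+y stand-in for the scenario calls) are illustrative and not part of SWE. Compile it with CC -fopenmp (or g++ -fopenmp) and vary OMP_NUM_THREADS:
#include <cstdio>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

int main() {
  const int NX = 400, NY = 400;                // grid size, matching the -x/-y flags below
  std::vector<float> h((NX + 2) * (NY + 2));   // flat storage including ghost layers

#ifdef _OPENMP
  const double t0 = omp_get_wtime();
#endif

  // Same structure as the SWE_Block loop: the outer i-loop is distributed
  // across threads; x and y are private because they are declared inside.
  #pragma omp parallel for
  for (int i = 1; i <= NX; i++)
    for (int j = 1; j <= NY; j++) {
      float x = (i - 0.5f) * 1.0f;
      float y = (j - 0.5f) * 1.0f;
      h[i * (NY + 2) + j] = x + y;             // stand-in for i_scenario.getWaterHeight(x, y)
    }

#ifdef _OPENMP
  std::printf("initialization took %.4f s on %d threads\n",
              omp_get_wtime() - t0, omp_get_max_threads());
#else
  std::printf("compiled without OpenMP\n");
#endif
  return 0;
}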
This is essentially the result of Michael Bader's component on SWE (without the vectorization aspects, which were specific to the Intel compiler). Before anything else, try this with various numbers of MPI processes and OpenMP threads on one node to make sure that you see scalability. E.g., from the SWE directory:
salloc -N 1 # get an allocation of one node
OMP_NUM_THREADS=1 aprun -n 1 -d 1 build/SWE_cray_release_mpi_fwavevec -x 400 -y 400 -c 1 -o /dev/null
OMP_NUM_THREADS=16 aprun -n 1 -d 16 build/SWE_cray_release_mpi_fwavevec -x 400 -y 400 -c 1 -o /dev/null
OMP_NUM_THREADS=4 aprun -n 1 -d 4 build/SWE_cray_release_mpi_fwavevec -x 400 -y 400 -c 1 -o /dev/null
If time permits, try more than one node on a much larger problem (say 16x larger), e.g.,
salloc -N 4 # get an allocation of four nodes
OMP_NUM_THREADS=16 aprun -N 1 -n 4 -d 16 build/SWE_cray_release_mpi_fwavevec -x 1600 -y 1600 -c 1 -o /dev/null