This software benchmarks the performance of PnetCDF when it is used to implement the I/O kernel of the S3D combustion simulation code. The evaluation method is weak scaling.
S3D is a continuum-scale, first-principles direct numerical simulation code that solves the compressible governing equations of mass continuity, momenta, energy, and mass fractions of chemical species, including chemical reactions. Readers are referred to the published paper below.
- J. Chen, A. Choudhary, B. de Supinski, M. DeVries, E. Hawkes, S. Klasky, W. Liao, K. Ma, J. Crummey, N. Podhorszki, R. Sankaran, S. Shende, and C. Yoo. Terascale Direct Numerical Simulations of Turbulent Combustion Using S3D. In Computational Science and Discovery Volume 2, January 2009.
A data checkpoint is performed at regular time intervals, and its data consist
of three- and four-dimensional array variables of type double. At each
checkpoint, four global arrays, representing mass, velocity, pressure, and
temperature, respectively, are written to a newly created file in the canonical
order. Mass and velocity are four-dimensional arrays while pressure and
temperature are three-dimensional arrays. All four arrays share the same size
of the lowest three spatial dimensions X, Y, and Z, which are partitioned among
MPI processes in a block-block-block fashion. See Figure 1 below for an
illustration when the number of MPI processes is 64. For the mass and velocity
arrays, the length of the fourth dimension is 11 and 3, respectively. The
fourth dimension, the most significant one, is not partitioned. As the number
of MPI processes increases, the aggregate I/O amount proportionally increases
as well. For a more detailed description, please refer to:
- W. Liao and A. Choudhary. Dynamically Adapting File Domain Partitioning Methods for Collective I/O Based on Underlying Parallel File System Locking Protocols. In the Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis, Austin, Texas, November 2008.
Figure 1. S3D I/O data partitioning pattern. (a) For 3D arrays, the sub-array of each process is mapped to the global array by block partitioning along all of the X, Y, and Z dimensions. (b) For 4D arrays, the lowest X-Y-Z dimensions are partitioned the same way as the 3D arrays, while the fourth dimension is not partitioned. This example uses 64 processes and highlights the mapping of process P41's sub-array to the global array.
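The block-block-block mapping described above can be sketched with a little shell arithmetic. This is an illustration only: the function name `subarray`, the zero-based offsets, and in particular the rank ordering (X varying fastest, then Y, then Z) are assumptions rather than something taken from the S3D source, and the 80 x 80 x 80 grid is hypothetical. The sketch also assumes each global dimension divides evenly by its process count.

```shell
#!/bin/sh
# Sketch: zero-based start offsets and lengths of one process's X-Y-Z block.
# Assumption: ranks are numbered with X varying fastest, then Y, then Z.
# Assumption: each global dimension is divisible by its process count.
subarray() {
    rank=$1; nx_g=$2; ny_g=$3; nz_g=$4; npx=$5; npy=$6; npz=$7
    # Reject ranks outside the npx x npy x npz process grid
    if [ "$rank" -ge $(( npx * npy * npz )) ]; then
        echo "rank $rank is outside the process grid" >&2
        return 1
    fi
    # Position of the process in the process grid
    px=$(( rank % npx ))
    py=$(( (rank / npx) % npy ))
    pz=$(( rank / (npx * npy) ))
    # Local block lengths
    nx=$(( nx_g / npx )); ny=$(( ny_g / npy )); nz=$(( nz_g / npz ))
    echo "start=($(( px * nx )),$(( py * ny )),$(( pz * nz ))) count=($nx,$ny,$nz)"
}

# Process 41 of 64 (4 x 4 x 4) on a hypothetical 80 x 80 x 80 global grid
subarray 41 80 80 80 4 4 4
```

For the 4D arrays, the same X-Y-Z block applies and the whole fourth dimension (11 or 3) is included, since that dimension is not partitioned.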
Edit file Makefile and adjust the following variables:
MPIF90 - MPI Fortran 90 compiler
FCFLAGS - compile flags
PNETCDF_DIR - the path of PnetCDF library (1.4.0 or higher is required)
For example:
MPIF90 = /usr/bin/mpif90
FCFLAGS = -O2
PNETCDF_DIR = ${HOME}/PnetCDF
The command-line arguments are shown below; they can also be obtained by running the command ./s3d_io.x -h.
Usage: s3d_io.x nx_g ny_g nz_g npx npy npz method restart dir_path
There are 9 command-line arguments:
nx_g - GLOBAL grid size along X dimension
ny_g - GLOBAL grid size along Y dimension
nz_g - GLOBAL grid size along Z dimension
npx - number of MPI processes along X dimension
npy - number of MPI processes along Y dimension
npz - number of MPI processes along Z dimension
method - 0: using PnetCDF blocking APIs, 1: nonblocking APIs
restart - restart from reading a previously written file (True/False)
dir_path - the directory name to store the output files
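As a convenience, the nine arguments can be turned into a launch command with a small helper. This is a minimal sketch: the function name `launch_cmd` is hypothetical, and it assumes the job should be started with exactly npx x npy x npz MPI processes, which is what the example commands in this README do.

```shell
#!/bin/sh
# Sketch: build the mpiexec command line from the nine arguments.
# Assumption: the job is launched with exactly npx*npy*npz MPI processes,
# as in the example commands in this README.
launch_cmd() {
    nx_g=$1; ny_g=$2; nz_g=$3; npx=$4; npy=$5; npz=$6
    method=$7; restart=$8; dir_path=$9
    nprocs=$(( npx * npy * npz ))
    echo "mpiexec -n $nprocs ./s3d_io.x $nx_g $ny_g $nz_g $npx $npy $npz $method $restart $dir_path"
}

# Prints the command; pipe it to sh or copy it into a job script to run it
launch_cmd 10 10 10 2 2 1 1 F .
```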
To change the number of checkpoint dumps (the default is 5), edit file param_m.f90 and set a different value for i_time_end:
i_time_end = 5 ! number of checkpoints (also number of output files)
When the I/O method is set to PnetCDF blocking APIs, each checkpoint makes 4
collective PnetCDF write calls, one for each variable, for a total of
4 x i_time_end write calls over the run. If restart is set to True, the number
of collective read calls is 4, one per variable, regardless of the number of
checkpoints, because the read is performed only once, at the beginning of the
run. When the I/O method is set to PnetCDF nonblocking APIs, each checkpoint
makes 4 nonblocking PnetCDF write calls, one per variable, followed by a call
to nfmpi_waitall to flush the write requests. Thus, the total number of calls
to nfmpi_waitall for writes is equal to i_time_end. For reads, there are a
total of 4 nonblocking PnetCDF read calls and a single call to nfmpi_waitall
for the read requests.
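The call counts described above can be tallied with a short script. This is a sketch of the arithmetic only, not code from the benchmark; the function name `call_counts` and its argument order are hypothetical, and it assumes restart is given as a value beginning with T or F.

```shell
#!/bin/sh
# Sketch: per-process call counts implied by the description above.
# Arguments: method (0 blocking, 1 nonblocking), i_time_end, restart (T or F)
call_counts() {
    method=$1; i_time_end=$2; restart=$3
    nvars=4                               # mass, velocity, pressure, temperature
    writes=$(( nvars * i_time_end ))      # one write per variable per checkpoint
    reads=0
    waits=0
    case "$restart" in
        T*|t*) reads=$nvars ;;            # the restart file is read once
    esac
    if [ "$method" -eq 1 ]; then
        waits=$i_time_end                 # one nfmpi_waitall per checkpoint
        if [ "$reads" -gt 0 ]; then
            waits=$(( waits + 1 ))        # plus one for the read requests
        fi
    fi
    echo "writes=$writes reads=$reads waitall=$waits"
}

# Nonblocking APIs, 5 checkpoints, no restart
call_counts 1 5 F
```

With the default i_time_end of 5, this gives 20 write calls per process.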
The contents of all variables written to netCDF files are randomly generated
numbers. Random generation can be disabled, which can reduce the benchmark
execution time, by commenting out the line below in file solve_driver.f90:
call random_set
For a test run with small data size and a short return time, here is an example command for running on 4 MPI processes.
mpiexec -n 4 ./s3d_io.x 10 10 10 2 2 1 1 F .
The example command below runs a job on 4096 MPI processes with a global
array of size 800 x 800 x 800 and local arrays of size 50 x 50 x 50, using
nonblocking APIs, without restart, and writing to the output directory
/scratch1/scratchdirs/wkliao/FS_1M_96.
mpiexec -l -n 4096 ./s3d_io.x 800 800 800 16 16 16 1 F /scratch1/scratchdirs/wkliao/FS_1M_96
++++ I/O is done through PnetCDF ++++
I/O method : nonblocking APIs
Run with restart : False
No. MPI processes : 4096
Global array size : 800 x 800 x 800
output file path : /scratch1/scratchdirs/wkliao/FS_1M_96
file striping count : 96
file striping size : 1048576 bytes
-----------------------------------------------
Time for open : 0.11 sec
Time for read : 0.00 sec
Time for write : 18.04 sec
Time for close : 0.02 sec
no. read calls : 0 per process
no. write calls : 20 per process
total read amount : 0.00 GiB
total write amount : 305.18 GiB
read bandwidth : 0.00 MiB/s
write bandwidth : 17318.78 MiB/s
-----------------------------------------------
total I/O amount : 305.18 GiB
total I/O time : 18.17 sec
I/O bandwidth : 17201.53 MiB/s
% ncdump -h pressure_wave_test.0.000E+00.field.nc
netcdf pressure_wave_test.0.000E+00.field {
dimensions:
x = 800 ;
y = 800 ;
z = 800 ;
nsc = 11 ;
three = 3 ;
variables:
double yspecies(nsc, z, y, x) ;
double u(three, z, y, x) ;
double pressure(z, y, x) ;
double temp(z, y, x) ;
// global attributes:
:time = 0. ;
:tstep = 0. ;
:time_save = 100000. ;
}
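The reported write amount can be cross-checked against the variable shapes in the header above. The sketch below is a back-of-the-envelope calculation, not part of the benchmark; the function name `write_gib` is hypothetical, and it counts variable data only, ignoring the netCDF file header.

```shell
#!/bin/sh
# Sketch: expected write volume in GiB for a given grid and checkpoint count.
# Per grid point: 11 (yspecies) + 3 (u) + 1 (pressure) + 1 (temp) doubles.
write_gib() {
    # Arguments: nx_g ny_g nz_g number_of_checkpoints
    awk -v nx="$1" -v ny="$2" -v nz="$3" -v nt="$4" 'BEGIN {
        doubles_per_point = 11 + 3 + 1 + 1
        bytes = nx * ny * nz * doubles_per_point * 8 * nt
        printf "%.2f\n", bytes / (1024 * 1024 * 1024)
    }'
}

write_gib 800 800 800 5    # total over 5 checkpoints
write_gib 800 800 800 1    # variable data in a single checkpoint file
```

The first call prints 305.18, matching the total write amount in the sample output. Dividing that amount (312500 MiB) by the printed 18.04 sec write time gives roughly 17,320 MiB/s, in line with the reported 17318.78 MiB/s; the small difference comes from the printed time being rounded.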
email: [email protected]
Copyright (C) 2013, Northwestern University
See COPYRIGHT notice in top-level directory.