Skip to content

Commit

Permalink
Add load balancing control (#198)
Browse files Browse the repository at this point in the history
* Redo load balancing (only).

* Update submodules.

* Describe additional derived vars.

* Add a brief description of the code base.

* A few fixes.

* Add control doc.

* Trailing whitespaces.

* Use LMeX version of the diagnostics only if requested.

* Add a couple of details to address Bruce's comments.
  • Loading branch information
esclapez authored Apr 28, 2023
1 parent e7e1fcc commit e5864ff
Show file tree
Hide file tree
Showing 19 changed files with 1,027 additions and 50 deletions.
1 change: 1 addition & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ option(PELELMEX_USE_EB "Enable Embedded Boundary" OFF)
option(PELELMEX_USE_EFIELD "Enable Electric field module" OFF)
option(PELELMEX_USE_PARTICLES "Enable Spray module" OFF)
option(PELELMEX_USE_SOOT "Enable Soot module" OFF)
option(PELELMEX_USE_DIAG "Enable LMeX Diagn module" OFF)

#
# Misc options
Expand Down
197 changes: 197 additions & 0 deletions Docs/source/manual/Implementation.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,197 @@
.. highlight:: rst

.. _sec:code:

Source code
===========

The following provides an overview of *PeleLMeX* source code, basic information on the data structure and is
useful for any user intending to use and/or do some development in the code.

Overview of source code
-----------------------

*PeleLMeX* is based upon AMReX's `AmrCore <https://amrex-codes.github.io/amrex/docs_html/AmrCore.html>`_ from which it inherits
an AMR hierarchy data structure and basic regridding functionnalities. The code is entirely written in C++, with low level
compute-intensive kernels implemented as lambda functions to seamlessly run on CPU and various GPU backends through AMReX
high performance portatbility abstraction.

The core of the algorithm is implementation in the ``advance()`` function which acts on all the levels concurrently.
Projection operators and advection scheme functions are imported the `AMReX-Hydro library <https://amrex-codes.github.io/AMReX-Hydro>`_
while the core of the thermo-chemistry functionalities comes from `PelePhysics <https://amrex-combustion.github.io/PelePhysics/>`_ .
Users are responsible for providing initial and boundary conditions in the local subfolder implementing their case, i.e. it is
not possible to compile and run *PeleLMeX* without actually writting a few lines of codes. However, numerous example are provided
in ``Exec/RegTests`` from which new users can pull for their new case.

The source code contains a few dozen files, organized around the pieces of the algorithm and major functionalities:

* ``PeleLMEvolve``: top level time advance loop, with IO/exit controls
* ``PeleLMSetup``: setting up the simulation parameters, parsing the input file
* ``PeleLMInit``: generating the initial solution from scratch or checkpoint file, performing initial projections/iteration(s)
* ``PeleLMAdvance``: top level implementation of the time step algorithm
* ``PeleLMProjection``: implement the various flavors of the nodal projection
* ``PeleLMUmac``: implement the construction and projection of MAC-velocities
* ``PeleLMAdvection``: functions to compute the explicit advection terms
* ``PeleLMDiffusion``: functions to compute the diffusion terms, using the operators defined in ``DiffusionOp``
* ``PeleLMReaction``: function using *PelePhysics* reactors to integrate the chemistry and linearized advection/diffusion
* ``PeleLMPlot``: implementation of plotfile and checkpoint file IOs
* ``PeleLMBC``: functions filling the ghost cells (at fine/fine, coarse/fine and domain boundaries)
* ``PeleLMRegrid``: creating new AMR level or remaking modified AMR level during adaptive refinement
* ``PeleLMTagging``: mark cells for refinement
* ``PeleLM_K.H``: low-level kernel functions

Data structure and containers
-----------------------------

AMReX 101
^^^^^^^^^

The basic AMReX`s data structure is the `MultiFab <https://amrex-codes.github.io/amrex/docs_html/Basics.html#fabarray-multifab-and-imultifab>`_
(historically, multi Fortran Array Box (FAB)).
Within the block-structured AMR approach of AMReX, the domain is decomposed into non-overlapping rectangular `boxes`,
which can be assembled into a `boxArray`. Each AMR level has a `boxArray` providing the list of `boxes` of that level.
The `boxes` are distributed accross the MPI ranks, the mapping of which is described by a `DistributionMap`. Given a
`boxArray` and a `DistributionMap`, one can define an actual data container (`boxes` are only lightweight descriptor
of the geometrical rectangular object, containing bounds and centering information only), where each rank will
allocate a FAB for the boxes it owns in the `boxArray`, resulting in a collection of FABs or a MultiFab, distributed
accross the MPI ranks.

To access the data in a MultiFab, one uses a `MFIter <https://amrex-codes.github.io/amrex/docs_html/Basics.html#mfiter-and-tiling>`_
(or MultiFab iterator), which provides each MPI rank access to the FABs it owns within the MultiFab. Actual access to the data in
memory is then provided by the lightweight `Array4` structure and it is strongly advised to rely on AMReX
`ParallelFor <https://amrex-codes.github.io/amrex/docs_html/Basics.html#parallelfor>`_ function template to loop through the logical `i,j,k` indexes.
For example, to set the velocity data stored in a MultiFab called `NewState` with data from an `OldState` and an increment
from a third MultiFab `advTerm`: ::

for (MFIter mfi(State,TilingIfNotGPU()); mfi.isValid(); ++mfi) {
Box const& bx = mfi.tilebox();
auto const& velo_old = OldState.const_array(mfi,VELOCITY_INDEX);
auto const& incr = advTerm.const_array(mfi);
auto const& velo_new = NewState.array(mfi,VELOCITY_INDEX);
ParallelFor(bx,
[=] AMREX_GPU_DEVICE (int i, int j, int k) noexcept
{
velo_new(i,j,k) = velo_old(i,j,k) + incr(i,j,k);
});
}

.. note::
For the example above to function, all three MultiFabs must have the same `boxArray` and `DistributionMap`

Users are strongly encouraged to review of the content of `AMReX documentation <https://amrex-codes.github.io/amrex/docs_html/Basics.html>`_
to get more familiar with AMReX data structures and environment.

*PeleLMeX* state and advance containers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The state vector of *PeleLMeX* contains the 2 or 3 components of velocity, the mixture density, species density (rhoYs),
rhoH, temperature and the thermodynamic pressure. The state components are stored in a cell-centered MultiFab with
`NVAR` components. Additionnally, the perturbational pressure stored at the nodes is contained in a separate MultiFab.
Together with the cell-centered pressure gradient, the cell-centered divergence constraint and cell-centered
transport properties, these MultiFabs are assembled into a `LevelData` struct.

Each level in the AMR hierarchy have two versions of the `LevelData` at any point during the simulation: one
for the old state and one for the new state. The developer can get a pointer to the `LevelData` struct by
calling : ::

auto ldata_p = getLevelDataPtr(lev,AmrOldTime);

with either `AmrOldTime` or `AmrNewTime` on level `lev`. Additionnally, calling this function with
`AmrHalfTime` with return a `LevelData` struct whose `state` is a linearly interpolated between the old and new
states (but the other MultiFab in `LevelData` are empty !).
It is also often useful to have access to a vector of a state component accross the entire AMR hierarchy. To do so, *PeleLMeX*
provides a set of functions returning a vector of MultiFab `std::unique_ptr` aliased into the `LevelData`
MultiFab on each level: ::

getStateVect(time); # Return the entire state (ncomp: NVAR)
getVelocityVect(time); # Return the velocity only (ncomp: AMREX_SPACEDIM)
getDensityVect(time); # Return the mixture density (ncomp: 1)
getSpeciesVect(time); # Return the species density (ncomp: NUM_SPECIES)
getRhoHVect(time); # Return rhoH (ncomp: 1)
getTempVect(time); # Return temperature (ncomp: 1)
getDivUVect(time); # Return divergence constraint (ncomp: 1)
getDiffusivityVect(time); # Return diffusivity (ncomp: NUM_SPECIES+2)
getViscosityVect(time); # Return viscosity (ncomp: 1)

where ``time`` can either be `AmrOldTime` or `AmrNewTime`.
Also available at any point during the simulation is the `LevelDataReact` which contains the species
chemical source terms. A single version of the container is avaible on each level and can be accessed
using: ::

auto ldataR_p = getLevelDataReactPtr(lev);

Within the time-advance function, the *PeleLMeX* algorithm calls for the computation of the advection,
diffusion and reaction source terms iteratively using SDC. At each step, the results of other steps
can be used as part of the numerical scheme (e.g. the explicit advection with a Godunov scheme uses
the diffusion term). These temporary variables, only useful in the scope of the advance function, are
assembled into two structs: ``AdvanceDiffData`` and ``AdvanceAdvData``. The former contains three
MultiFabs for the separate diffusion term evaluations described in :numref:`LMeX_Algo`: :math:`D^n`,
:math:`D^{n+1,k}` and :math:`D^{n+1,k+1}`, as well as additional containers for the :math:`\overline{W}`
and Soret contributions. The later encapsulate the face-centered MAC velocities :math:`U_{ADV}`, the
advection term :math:`A_{n+1/2,(k+1)}`, the pressure correction :math:`\chi` and a forcing container
used in the RHS of advection/diffusion/reaction solves. In contrast with the `LevelData`, these two containers
are freed at the end of the advance function, and are passed around in the functions called in `advance()`.

Parallelism
-----------

*PeleLMeX* inherits the MPI+X approach from the AMReX library, where X can be any of CUDA, HIP or SYCL
(Open-MP is currently only available for non-reacting/explicit chemistry integrators) for heterogeneous architectures.
The reader is referred to `AMReX GPU documentation <https://amrex-codes.github.io/amrex/docs_html/GPU.html>`_ for more details on
the thread parallelism.

As mentioned above, the top-level spatial decomposition arises from AMReX's block-structured approach. On each level, non-overlapping
`boxes` are assembled into `boxArray` and distributed accross MPI rank with `DistributionMap` (or `DMap`).
It is in our best interest to ensure that all the MultiFab in the code use the same `boxArray` and `DMap`,
such that operation using `MFIter` can be performed and data copy accross MPI ranks is minimized.
However, it is also important to maintain a good load balancing, i.e. ensure that each MPI rank has the same amount
of work, to avoid wasting computational ressource. Reactive flow simulation are challenging, because the chemistry
integration is very spatially heterogeneous, with stiff ODE integration required within the flame front and non-stiff
integration of the linearized advection/diffusion required in the cold gases or burnt mixture. Additionnally, because
a non-subcycling approach is used in *PeleLMeX*, the chemistry doesn't have to be integrated in fine-covered region.
Two `boxArray` and associated `DMap` are thus available in *PeleLMeX*:

1. The first one is inherited from `AmrCore` and is availble as ``grid[lev]`` (`boxArray`) and ``dmap[lev]`` (`DMap`) throughout the code. Most
of *PeleLMeX* MultiFabs use these two, and the `boxes` sizes are dictated by the `amr.max_grid_size` and `amr.blocking_factor` from the input
file. These are employed for all the operations in the code except the chemistry integration. The default load balancing approach is to use
space curve filling (SCF) with each box weighted by the number of cells in each box. Advanced users can try alternate appraoch using the
keys listed in :doc:`LMeXControls`.
2. A second one is created, masking fine-covered regions and updated during regrid operations. It is used to perform the chemistry integration,
and because this is a purely local integration (in contrast with implicit diffusion solve for instance, which require communications
to solve the linear problem using GMG), a Knapsack load balancing approach is used by default, where the weight of each box is based
on the total number of chemistry RHS calls in the box. The size of the `boxes` in the chemistry `boxArray` (accessible with ``m_baChem[lev]``)
is controled by the `peleLM.max_grid_size_chem` in the input file. Once again, advanced users can try alternate approaches to load
balancing the chemistry `DMap` using the keys described in :doc:`LMeXControls`.

After each regrid operation, even if the grids did not actually change, *PeleLMeX* will try to find a better load balancing for the
`AmrCore` `DMap`. Because changing the load balancing requires copying data accross MPI ranks, we only want to change the `DMap`
only if a significantly better new `DMap` can be obtained, with the threshold for a better `DMap` defined based on the value of
`peleLM.load_balancing_efficiency_threshold`.

Debugging
---------

The first step to debug anyh addition or undefined behavior of *PeleLMeX* is to turn the ``DEBUG`` flag ``ON`` in the
GNUmakefile and activate AMReX`s floating point exception traps in the input file: ::

amrex.fpe_trap_invalid = 1
amrex.fpe_trap_zero = 1
amrex.fpe_trap_overflow = 1

This will slow down the code considerably, but will enable bound checks on all AMReX low-level data structure,
catch floating point errors (using nans, dividing by zero, ...) and any ``AMREX_ASSERT`` statement added to the
code base. It is also often useful to visualize data in order to understand the erroneous results the solver can
return. Developers can write to disk a single MultiFab using AMReX `VisMF`: ::

VisMF::Write(myMF,"VisMyMF");

and can be visualized with `Amrvis` using `amrvis -mf`. Alternatively, visualizing the entire AMR hierarchy is also
useful. *PeleLMeX* provides a simple function to write a vector of MultiFab: ::

WriteDebugPlotFile(GetVecOfConstPtrs(getTempVect()),"TempDebug");

which can be opened with `Amrvis` or any other visualization software. This last function will function providing that
the MultiFabs in the vector all have the same number of components.
Finally, another way of checking individual pieces of the algorithm is to use *PeleLMeX* evaluate mode ``peleLM.run_mode=evaluate``
and specify a list of fields with ``peleLM.evaluate_vars`` as described in :doc:`LMeXControls`. Note that not all of the
algorithm is available in this mode yet.
31 changes: 30 additions & 1 deletion Docs/source/manual/LMeXControls.rst
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,27 @@ Grid/AMR parameters
amr.blocking_factor = 16 # block factor in grid generation (min box size)
amr.max_grid_size = 64 # max box size

peleLM.max_grid_size_chem = 32 # [OPT, DEF="None"] Max box size for the Chemistry BoxArray

Load balancing
--------------

*PeleLMeX* relies on two distribution maps (see :doc:`Implementation` for more details).

::

peleLM.do_load_balancing = 1 # [OPT, DEF=0] Activate load balancing
peleLM.load_balancing_method = sfc # [OPT, DEF="sfc"] AmrCore dmap load balancing method
peleLM.load_balancing_cost_estimate = ncell # [OPT, DEF="ncell"] AmrCore dmap balancing cost
peleLM.chem_load_balancing_method = knapsack # [OPT, DEF="knapsack"] Chemistry dmap load balancing method
peleLM.chem_load_balancing_cost_estimate = chemfunctcall_sum # [OPT, DEF="chemfunctcall_sum"] Chemistry dmap balancing cost
peleLM.load_balancing_efficiency_threshold = 1.05 # What constitute a better dmap ?

The balancing method can be one of `sfc`, `roundrobin` or `knapsack`, while the cost estimate can be one of
`ncell`, `chemfunctcall_avg`, `chemfunctcall_max`, `chemfunctcall_sum`, `userdefined_avg` or `userdefined_sum`. When
using either of the last to option, the user must provide a definition for the `derUserDefined`. If multiple components
are defined in the `derUserDefined` function, the first one is used for load balancing.

Time stepping parameters
------------------------

Expand Down Expand Up @@ -125,7 +146,6 @@ overview of the available controls:
The `field_name` can be any of the state or derived variables (see below) component. Additional controls specific
to embedded boundaries are discussed below.


PeleLMeX derived variables
--------------------------

Expand Down Expand Up @@ -186,8 +206,17 @@ The following list of derived variables are available in PeleLMeX:
* - `coordinates`
- AMREX_SPACEDIM
- Cell-center coordinates
* - `DistributionMap`
- 1
- The MPI-rank of each box
* - `derUserDefined`
- ?
- A user-defined derived which number of components is provided by the user (see below).

Note that `mixture_fraction` and `progress_variable` requires additional inputs from the users as described below.
The `derUserDefined` allow the user to define its own derived variable which can comprise several components. To do
so, the user need to copy the Source/DeriveUserDefined.cpp file into his run folder and update the file. The number of
components is defined based on the size of the vector returned by pelelm_setuserderives().

PeleLMeX algorithm
------------------
Expand Down
9 changes: 6 additions & 3 deletions Docs/source/manual/Model.rst
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@ In a nutshell, `PeleLMeX` features include:
Mathematical background
-----------------------

`PeleLMeX` evolves chemically reacting low Mach number flows with block-structured adaptive mesh refinement (AMR). The code depends upon the `AMReX <https://github.com/AMReX-Codes/amrex>`_ library to provide the underlying data structures, and tools to manage and operate on them across massively parallel computing architectures. `PeleLMeX` also utilizes the source code and algorithmic infrastructure of `AMReX-Hydro <https://github.com/AMReX-Codes/AMReX-Hydro>`_. `PeleLMeX` borrows heavily from `PeleLM <https://github.com/AMReX-Combustion/PeleLM>`_. The core algorithms in `PeleLM` are described in the following papers:
`PeleLMeX` evolves chemically reacting low Mach number flows with block-structured adaptive mesh refinement (AMR). The code depends upon the `AMReX <https://github.com/AMReX-Codes/amrex>`_ library to provide the underlying data structures, and tools to manage and operate on them across massively parallel computing architectures. `PeleLMeX` also utilizes the source code and algorithmic infrastructure of `AMReX-Hydro <https://github.com/AMReX-Codes/AMReX-Hydro>`_. `PeleLMeX` borrows heavily from `PeleLM`_. The core algorithms in `PeleLM` are described in the following papers:

* *A conservative, thermodynamically consistent numerical approach for low Mach number combustion. I. Single-level integration*, A. Nonaka, J. B. Bell, and M. S. Day, *Combust. Theor. Model.*, **22** (1) 156-184 (2018)

Expand Down Expand Up @@ -172,12 +172,15 @@ and simplified velocity constraint,
PeleLMeX Algorithm
------------------

An overview of `PeleLMeX` time-advance function is provided in the figure below and details are provided in the following subsections.
An overview of `PeleLMeX` time-advance function is provided in :numref:`LMeX_Algo` and details are provided in the following subsections.

.. figure:: images/model/PeleLMeX_Algorithm.png
:name: LMeX_Algo
:align: center
:figwidth: 50%

: Flowchart of the *PeleLMeX* advance function.

The three steps of the low Mach number projection scheme described :ref:`below <ssec:projScheme>` are referenced to better
emphasize how the thermodynamic solve is closely weaved into the fractional step appraoch. Striped boxes indicate where the
:ref:`Godunov procedure <ssec:advScheme>` is employed while the four different linear solves are highlighted.
Expand Down Expand Up @@ -340,7 +343,7 @@ Note that in the presence of EB, only the `Godunov_PLM` variant is available.
AMR extension
^^^^^^^^^^^^^

In contrast with `PeleLM <https://github.com/AMReX-Combustion/PeleLM>`_, `PeleLMeX` do not rely a on subcycling appraoch to advance the AMR hierarchy.
In contrast with `PeleLM`_, `PeleLMeX` do not rely a on subcycling appraoch to advance the AMR hierarchy.
This difference is illustrated in the figure below comparing the multi-level time-stepping approach in both codes:

.. figure:: images/model/PeleLMeX_Subcycling.png
Expand Down
1 change: 1 addition & 0 deletions Docs/source/manual/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ point your web browser at the file ``${PELELMEX_HOME}/Docs/build/html/index.html
:caption: Theory:

Model.rst
Implementation.rst
Validation.rst
Performances.rst

Expand Down
4 changes: 3 additions & 1 deletion Source/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,9 @@ target_sources(pelelmex
Utils.cpp
)

add_subdirectory(Diagnostics)
if (PELELMEX_USE_DIAGS)
add_subdirectory(Diagnostics)
endif ()

if (PELELMEX_USE_EFIELD)
add_subdirectory(Efield)
Expand Down
Loading

0 comments on commit e5864ff

Please sign in to comment.