discretization.tex

% !TEX root = main.tex

This section describes Trilinos packages that provide tools for spatial and temporal discretization of integro-differential equations. Most discretization efforts in Trilinos have been devoted to implementing tools for mesh-based discretizations of partial differential equations (PDEs) with a focus on high-order finite elements. A notable exception is the research package Compadre (Section~\ref{sec:compadre}), which provides tools for meshless approximation of linear operators that can be used for the discretization of differential equations and for data transfer.
Trilinos' discretization packages have been adopted by many applications addressing a wide range of physics problems, including solid mechanics, earth system modeling, semiconductor devices modeling, and electro-magnetics. These applications have taken different approaches in adopting Trilinos mesh-based discretization tools. The less intrusive approach is the adoption of Intrepid2 tools to perform local finite element assembly. In this case, the application has to manage the global assembly, possibly using the \code{DoFManager} provided by Panzer, and FECrs matrix and vector structures provided by Tpetra.
A more intrusive approach is to additionally use the Phalanx package for managing dependencies of field evaluations in conjunction with the Thyra Model Evaluator and Sacado automatic differentiation, and possibly Tempus for time integration. This approach is particularly useful when developing solvers for complex multiphysics problems, because it allows easy re-use of computational kernels and automates the computation of Jacobians and sensitivities.
The most intrusive approach is to build the application around the Panzer package, which provides all of the above, plus the handling of linear and nonlinear solvers and integrated constrained optimization capabilities.
In the following, we neither include a description of the snapshotted packages Sierra ToolKit (STK) and Krino, which provide mesh and level-set tools, nor a description of the Shards package, which provides tools for mesh cell topology.

\subsection{Local Assembly: Intrepid2}
Intrepid2 provides interoperable tools for compatible discretizations of PDEs; it is a performance\hyp{}portable re-implementation and extension of the legacy Intrepid package \cite{bochev2012}. Intrepid2 mainly focuses on local assembly of continuous and discontinuous finite elements. It also provides limited capabilities for finite volume discretization.  Intrepid2 works on batches of elements (cells), and provides tools to efficiently compute discretized linear functionals (e.g., right-hand-side vectors) and differential operators (e.g., stiffness matrices) at the element level. Intrepid2 implements compatible finite element spaces of various polynomial orders for $H({\rm grad})$, $H({\rm curl})$, $H({\rm div})$, and $L^2$ function spaces on triangles, quadrilaterals, tetrahedrons, hexahedrons, wedges, and pyramids. It provides both Lagrangian basis functions and hierarchical basis functions \cite{fuentes2015}, and it implements performance optimizations (e.g., sum factorizations) exploiting the underlying structure of the problem (e.g., tensor-product elements or other symmetries).  The degrees of freedom of $H({\rm div})$ and $H({\rm curl})$ finite elements as well as high-order $H({\rm grad})$ finite elements depend on the global orientation of edges and faces and Intrepid2 provides orientation tools for matching the degrees of freedom on shared edges and faces. It also provides interpolation-based projection tools for projecting functions in $H({\rm grad})$, $H({\rm curl})$, $H({\rm div})$, and $L^2$ to the respective discrete spaces. Intrepid2 implements these capabilities through the following classes:
\begin{itemize}
\item \code{CellTools}: This class provides geometric operations in the reference and physical frames. This includes the computation of tangents and normals to edges and faces in the physical frame, the computation of the Jacobian of the reference-to-physical frame map, and other geometric computations.
\item \code{CubatureFactory}: This class provides quadrature rules (called \emph{cubatures} in Intrepid2) of various degrees of accuracy for approximating integrals over elements and their boundaries.
\item \code{Basis:} This is the base class that provides a common interface for functionalities related to finite element bases. Intrepid2 provides derived classes that implement this common interface for a variety of compatible finite element spaces. Each derived class implements the \code{getValues()} method that computes the values taken by the basis functions or their derivatives (e.g., the gradient for $H({\rm grad})$ functions, the curl for $H({\rm curl})$ functions) at a set of input points. The implementation of \code{getValues()} can be very different depending on the basis. Specific optimizations are available for tensor-product elements.  Additionally, there is a \code{BasisFamily} class with a convenience method, \code{getBasis()}, which constructs a basis depending on a template argument specifying the type of basis (e.g., hierarchical or nodal), the cell topology, the function space on which it is defined, and its polynomial degree.
\item \code{OrientationTools}: This class provides methods to orient the basis functions based on the global orientation of edges and faces, determined by the global numbering of the cell vertices. This is achieved by building a linear operator (a permutation for tensor-product elements) that encodes the orientation of a particular cell and applying that operator to the reference basis functions.
\item \code{ProjectionTools}: This class provides methods for interpolation-based projections of a given function into a compatible finite element space or between compatible finite element spaces~\cite{demkowicz2007}. The provided projections commute with the corresponding differential operators if the quadrature rules can exactly integrate the functions being projected. As an example, projecting an $H({\rm grad})$ function into the $H({\rm grad})$ finite element space and then taking its gradient gives the same result as taking the gradient of the function first and then projecting the gradient into the $H({\rm curl})$ finite element space.
\item \code{FunctionSpaceTools}: This class provides transformations of fields from the reference to the physical frame and back, computation of measures on edges, faces, and cells, scalar/vector/tensor multiplications and contractions for computing integrals.
\item \code{IntegrationTools}: This class provides integration methods that can take advantage of tensor product structures in basis values, providing mechanisms for performance\hyp{}portable, \emph{sum-factorized} assembly across $H({\rm grad})$, $H({\rm curl})$, $H({\rm div})$, and $L^2$ function spaces. In the future, we plan to provide similar interfaces to support matrix-free discretizations.
\end{itemize}
Intrepid2 makes use of Kokkos containers to enable memory layouts that are adapted to the computational platform. Intrepid2 also uses Kokkos for its core computational kernels, enabling threaded execution across a variety of architectures. The data types used by Intrepid2 are templated; it is therefore possible to propagate Sacado types through Intrepid2 to perform automatic differentiation. Current development of Intrepid2 focuses on providing efficient matrix-free discretizations to enhance efficiency on GPU architectures.

\subsection{Local Field Evaluation: Phalanx}
Phalanx is a local field evaluation library designed for equation assembly in PDE applications. The goal of Phalanx is to decompose a complex problem into a number of simpler problems with managed dependencies to support rapid development and extensibility of PDE codes \cite{Notz2012,pawlowski2012automating,pawlowski2012automatingpart2}. The data structures use Kokkos for performance portability. Through the use of template metaprogramming, Phalanx supports arbitrary user defined data types and evaluation types. This feature allows for simple integration of automatic differentiation tools via operator-overloaded scalar types from the Sacado package (Section~\ref{sec:sacado}). From a simple definition of equations, quantities such as Jacobians, Jacobian-vector products, Hessians, and parameter sensitivities can be evaluated to machine precision. These quantities can be used by other Trilinos packages for operations including Newton-based nonlinear solves, gradient-based optimization, constraint enforcement, and bifurcation analysis \cite{pawlowski2012automating,pawlowski2012automatingpart2}.

Phalanx uses a graph-based design to manage data dependencies. The runtime-defined directed acyclic graph (DAG) allows for rapid prototyping in a production environment where simple interfaces for analysts, flexible models/data structures, and integration of non-trivial third-party libraries are paramount. Phalanx is used in a number of large-scale parallel codes including Albany \cite{Salinger2016}, Charon \cite{CharonUsersManual2020}, Drekar \cite{Crockatt2022,Miller2019,Shadid2016mhd}, and EMPIRE \cite{BettencourtBrownEtAl2021_EmpirePic}.

In recent years, Phalanx has been extended to provide utilities for performance portability under automatic differentiation. For example, Phalanx provides tools for building and managing a Kokkos view-of-views on device without the use of unified virtual memory (UVM) and provides utilities for running virtual functions on device. In the future, these utilities may be separated into a separate package.

\subsection{Time Integration: Tempus}
Tempus (Latin, meaning time as in ``tempus fugit'' $\rightarrow$ ``time flies'') is Trilinos' time-integration package for advanced transient
analysis. It includes various time integrators and embedded
sensitivity analysis for next-generation code architectures.  Tempus
provides ``out-of-the-box'' time-integration capabilities, which
allows users to quickly and easily incorporate time-integration
capabilities in their applications and switch between various time
integrators depending on the simulation needs.  Additionally, Tempus
provides ``build-your-own'' capabilities, which allows applications
to incorporate various Tempus components to augment or replace
an application's transient capabilities. Other capabilities include
embedded error analysis, sensitivity analysis, and transient optimization
with ROL.

Tempus offers a general infrastructure for the time evolution of
solutions through a variety of general integration schemes.  Tempus
provides time integrators for explicit and implicit methods and for
first- and second-order ODEs.  It can be used from small systems of
equations (e.g., single ODEs for the time evolution of plasticity
models and multiple ODEs for coupled chemical reactions) to
large-scale transient simulations requiring exascale computing
(e.g., flow fields around reentry vehicles and magneto-hydrodynamics).

Tempus has several components that can be used in concert or
individually, depending on the needs of the application.
\begin{itemize}
  \item Integrators are the time-loop structure for time integration
  and provide several features, e.g., controlling the advancement of
  the solution, selecting the next time step size and handling the
  solution output.

  \item Time steppers are individual methods that advance the
  solution from one step to the next.  A variety of time steppers
  are available:
  \begin{itemize}
    \item \emph{Classic one-step methods} (e.g., forward Euler and trapezoidal methods)
    \item \emph{Explicit Runge-Kutta (RK) methods} (e.g., RK explicit 4 Stage)
    \item \emph{Diagonally Implicit Runge-Kutta (DIRK) methods} (e.g.,
    general tableau DIRK and many specific DIRK/SDIRK methods)
    \item \emph{Implicit-Explicit (IMEX) Runge-Kutta methods} (e.g., IMEX
    RK SSP2, IMEX RK SSP3, and general tableau IMEX RK methods)
    \item \emph{Multi-Step methods} (i.e., BDF2)
    \item \emph{Second-order ODE methods} (e.g., Leapfrog, Newmark methods,
    and HHT-Alpha)
    \item \emph{Steppers with subSteppers} (e.g., operator-split and
    subcycling methods)
  \end{itemize}

  \item Solution history is used to maintain the solution during
  time step failure, for solution restart/output, for interpolation of
  the solution between time steps, and to provide the solution for
  transient adjoint sensitivities.

  \item Time step control and strategies provide methods to select
  the time step size based on user input and/or temporal error
  control (e.g., bounding min/max time-step size, relative/absolute
  maximum error, and time step adjustments for output and checkpointing).
\end{itemize}

Additionally, Tempus has several mechanisms which allow users to
insert application-specific algorithms into Tempus components (e.g.,
through observers and creating derived classes).

\subsection{Finite Element Analysis: Panzer}
The Panzer package provides global tools for finite element analysis. The package is targeted for large-scale parallel implicit PDEs using continuous and discontinuous compatible finite elements as well as control volume finite elements. Panzer enables the solution of nonlinear problems by interfacing with several Trilinos linear and nonlinear solvers. It uses the Intrepid2 package for arbitrary-order finite element bases. It computes derivatives and sensitivities with the Sacado automatic differentiation (AD) package. Panzer achieves performance portability through the Kokkos programming model via Tpetra data structures.

Panzer is designed for multiphysics systems.
The assembly engine allows for different equation sets in each element block of the mesh and allows for mixed bases for degrees of freedom (DOF) within each element block
(e.g., mixed $H(\rm{div})$ and $H(\rm{curl})$ system for electro-magnetics).
The assembly tools can create a single fully coupled Jacobian matrix with all off-block dependencies (as a single \code{Tpetra::CrsMatrix}),
or it can create a blocked system, grouping sets of DOFs into separate explicit matrices.
This allows for physics-based block preconditioning strategies using the Teko package in Trilinos~\cite{Bonilla2023,Cyr2016a}
or MueLu's fully coupled AMG hierarchies for multiphysics systems~\cite{Ohm2022a}.
The assembly process relies on the Phalanx package for efficient assembly of the multiphysics systems.
Panzer additionally can wrap the assembly in a \code{Thyra::ModelEvaluator}, providing direct interfaces to the linear and nonlinear analysis packages in Trilinos.

Panzer can provide low level utilities for application codes to build on, or can be used as a high-level application framework. Important capabilities include the following:
\begin{itemize}
\item \code{DOFManager}: Panzer provides a stand-alone DOF manager class in the dof-mgr subpackage. Given a list of DOFs and their corresponding basis and element blocks, the DOF manager can provide the mapping from DOFs on the mesh entities to the entries in linear algebra objects such as residual vectors and Jacobian matrices. The DOF manager can return the objects required to build distributed Tpetra
%and Epetra
maps and graphs for both uniquely owned global indices and ghosted indices used during assembly.
%\todo{Remove ``Epetra''?}
It provides the local indexing used during assembly as well. The DOF manager contains a mesh abstraction called a connection manager that provides information about mesh connectivity, global numbering of mesh topological entities, and element block groups. It is designed to support any underlying mesh database, allowing applications to use the DOF manager with any finite element application.
\item \code{STK\_Interface}: Panzer contains a concrete implementation of the connection manager API for the STK mesh database package in Trilinos. The \code{STK\_Interface} object wraps an STK mesh database. It provides a simple interface for accessing global indices of the mesh and can be used to read/write associated solution data to the database. It additionally supports SEACAS for writing mesh data to disk. The \code{STK\_Interface} also includes support for periodic boundary conditions.  This capability can match topological entities on periodic parallel distributed faces. Once matched, the DOFs are unified to enforce periodicity for the DOF manager. This capability is in the adapters-stk subpackage.
\item \emph{Linear Object Factory:} Panzer provides a linear object factory and linear object container designed to support parallel distributed assembly.
%It supports both Epetra and Tpetra objects.
The returned containers hold either the linear algebra objects for a uniquely owned DOF map used for solving the linear system or a ghosted version of the linear objects that can be used for assembly. The containers support export and import operations between the unique and ghosted containers. These are used to simplify the assembly process and abstract the underlying Tpetra linear algebra objects. %(i.e., Epetra or Tpetra).
The linear object factory can also create DOF gather and scatter functors for the corresponding (and possibly blocked) matrices. The gather and scatter operations are specialized on the assembly types, including residuals, Jacobians, and tangents.
Support for Hessians will be implemented as needed.
The capability is found in the disc-fe subpackage.

\item \emph{Worksets:} The Panzer assembly process provides workset containers to hold static data that can be reused in finite element computations. For example, when using a static mesh, these containers could hold the Jacobian, Jacobian inverse, and all finite element basis values at the quadrature points. The workset concept breaks the local elements on an MPI process into smaller blocks of uniform computations that can be dispatched to a CPU or GPU. The workset helps to control total memory use for temporaries in the Phalanx DAG. This capability is part of the disc-fe subpackage.
\item \emph{Assembly Kernels:} Panzer provides a number of assembly kernels that are optimized for use with Sacado AD data types. When assembling on GPUs, the assembly kernels were found to be an order of magnitude faster if the AD evaluation was parallelized over the internal derivative dimension \cite{phipps2022automatic}. This required using kernels with a \code{Kokkos::TeamPolicy} for finite element assembly, where the lowest level of parallelism is used internally by the Sacado AD type. When building Panzer for GPUs, the \code{DFad} hierarchic parallelism flag in Sacado should be enabled for kernel performance. The kernels can be found in the disc-fe subpackage.
\item Panzer provides an example miniapp for implicit electro-magnetics that is used for benchmarking linear solver performance and for acceptance testing of new high-performance computing systems. It demonstrates an $H(\rm{curl})$-$H(\rm{div})$ formulation for the electric and magnetic fields. It can be found in the MiniEM subpackage.
\end{itemize}

The Panzer package is intended to provide both low- and high-level tools for implicit finite elements discretizations. The high-level tools aggregate many Trilinos discretization and solver packages. While the high-level tools can be used as a rapid prototyping environment, the front-end is fairly complex to set up, as opposed to true rapid prototyping frameworks such as deal.ii~\cite{dealII95}, FEniCSx~\cite{BarattaEtal2023}, or Firedrake~\cite{FiredrakeUserManual}. Panzer is not user friendly in this regard, however, the examples and miniapps are a good starting point and can be quickly adapted to other physics applications. A number of performance portable applications are using Panzer tools at different levels of adoption; these include Albany \cite{Salinger2016}, Charon \cite{CharonUsersManual2020}, Drekar \cite{Crockatt2022,Miller2019,Shadid2016mhd}, and EMPIRE \cite{BettencourtBrownEtAl2021_EmpirePic}.

\subsection{Approximation of Linear Operators: Compadre}\
\label{sec:compadre}

The Compadre package provides tools for the approximation of linear operators (including point evaluation and derivatives), given the location of samples of a function over an unstructured cloud of points. The resulting stencils, when applied directly to samples of the function at these locations, provide an approximation of the linear operator acting on the function at the point(s) queried. This is useful for meshed and meshless data transfer applications (remap). Samples of the function at the specified locations can also be viewed as unknowns, in which case the solution returned by Compadre can be used as a stencil for meshless discretization of PDEs \cite{REBAR2024}.

The package uses generalized moving least squares (GMLS) for approximating functionals. GMLS generalizes classic moving least squares (MLS) in the sense that selecting point evaluations for the sampling functionals and the target operator in GMLS provides a traditional moving least squares reconstruction. However, GMLS offers significantly greater flexibility than MLS by allowing the evaluation of the target function at points to be replaced with \emph{sampling} functionals of a function (e.g., integrals of the function over local domains), and instead of approximating the target function, to approximate a target \emph{operator} (e.g., gradient, curl, divergence) of a function. A detailed description of GMLS can be found in \cite{mirzaei2012generalized,wendland2004scattered}.

This increased flexibility enables more advanced and unconventional remapping approaches. For instance, it is possible to use an average vector normal integral over edges (Raviart-Thomas type representation) or a cell-average integral (finite-volume representation) as the sampling functionals or the target operator \cite{gmd-15-6601-2022}. Compadre supports full space reconstruction in 1D, 2D, and 3D, and it also supports select sampling functionals and target operators on manifolds \cite{GROSS2020109340}.

Compadre's stencil generation involves independent problems to be solved in parallel at the team level with loops over the thread and vector level within each problem. This hierarchical parallelism is achieved with performance portability by using the Kokkos programming model and leveraging the batched QR with pivoting algorithm implemented in Kokkos Kernels.

Future plans include the implementation of other meshless methods like locally-supported radial basis functions.