-
Notifications
You must be signed in to change notification settings - Fork 19
/
discretization.tex
151 lines (123 loc) · 25.9 KB
/
discretization.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
This section describes Trilinos packages that provide tools for spatial and temporal discretization of integro-differential equations. Most of Trilinos discretization efforts have been devoted to implement tools for mesh-based discretizations of partial differential equations (PDEs) with a focus on high-order finite elements. A notable exception is constituted by the research package Compadre, which provides tools for meshless approximation of linear operators that can be used for the discretization of differential equations and for data transfer.
Trilinos discretization packages have been adopted by many applications addressing a wide range of physics problems, including solid mechanics, earth system modeling, semiconductor devices and electro-magnetics. These applications have taken different approaches in adopting Trilinos mesh-based discretization tools. The less intrusive approach is the adoption of Intrepid2 tools to perform local finite element assembly. The application has to manage the global assembly possibly using the DoFManager provided by Panzer and FE Crs matrix and vector structures provided by Tpetra.
A more intrusive approach is to additionally use Phalanx package for managing dependencies of field evaluations in conjunction the the Thyra Model Evaluator and Sacado algorithmic differentiations, and, possibly Tempus for time integration. This approach is particularly useful when developing complex multiphysics problems because it allows easy re-use of computational kernels and automates the computation of Jacobian and sensitivities.
The most intrusive approach is to build the application around the Panzer package, which provides all of the above, plus the handling of linear and nonlinear solvers and integrated constrained optimization capabilities.
In the following we describe some of the Trilinos discretization packages. For brevity, we do not include the description of these packages: Sierra ToolKit (STK), Krino and Percept, which provide mesh and level-set tools, and Shards, which provides tools for mesh topology. We refer to Trilinos website (\url{https://trilinos.github.io}) for a brief description of these packages.
\subsection{Intrepid2}
Intrepid2 provides interoperable tools for compatible discretizations of PDEs; it is a performance-portable re-implementation and extension of the legacy Intrepid package \cite{bochev2012}. Intrepid2 mainly focuses on local assembly of continuous and discontinuous finite elements. It also provides limited capabilities for finite volume discretization. Intrepid2 works on batches of elements (cells), and provides tools to efficiently compute discretized linear functionals (e.g., right-hand-side vectors) and differential operators (e.g., stiffness matrices) at the element level. Intrepid2 implements compatible finite element spaces of various polynomial orders for $H({\rm grad})$, $H({\rm curl})$, $H({\rm div})$ and $L^2$ function spaces on triangles, quadrilaterals, tetrahedrons, hexahedrons, wedges and pyramids. It provides both Lagrangian basis functions and hierarchical basis functions \cite{fuentes2015} and it implements performance optimizations (e.g., sum factorizations) exploiting the underlying structure of the problem (e.g., tensor-product elements or other symmetries). The degrees of freedom of $H({\rm div})$ and $H({\rm curl})$ finite elements, as well as high-order $H({\rm grad})$ finite elements, depend on the global orientation of edges and faces and Intrepid2 provides orientation tools for matching the degrees of freedom on shared edges and faces. It also provides interpolation-based projection tools for projecting functions in $H({\rm grad})$, $H({\rm curl})$, $H({\rm div})$ and $L^2$ to the respective discrete spaces. Intrepid2 implements these capabilities through these classes:
\begin{itemize}
\item \emph{CellTools:} This class provides geometric operations on the reference and physical frame. Includes computation of tangents and normals to edges/faces in the physical frame, computation of Jacobian of the reference-to-physical frame maps, and other metric computations.
\item \emph{CubatureFactory:} This class provides several quadrature rules (called \emph{cubatures} in Intrepid2) of various degrees of accuracy for approximating integrals over the elements and their boundaries.
\item \emph{Basis:} This is the base class for a variety of basis functions for compatible finite element spaces. Each class includes a \texttt{getValues()} method that computes the value of the basis functions or their derivatives (e.g., gradient for $H({\rm grad})$ functions, curl for $H({\rm curl})$ functions) at a set of input points. The implementation of \texttt{getValues()} can be very different depending on the basis. Specific optimizations are available for tensor-product elements. Additionally, there is a \texttt{BasisFamily} class with a convenience method, \texttt{getBasis()}, which constructs a basis depending on a template argument specifying the type of basis (hierarchical or nodal, e.g.), the cell topology and function space on which it is defined, and its polynomial degree.
\item \emph{OrientationTools:} This class provides methods to orient the basis functions based on the global orientation of edges and faces, determined by the global numbering of the cell vertices. This is achieved by building a linear operator (a permutation for tensor-product elements) that encodes the orientation of a particular cell, and applying that operator to the reference basis functions.
\item \emph{ProjectionTools:} This class provides methods for interpolation-based projections of a given function into a compatible finite element space or between compatible finite element spaces \cite{demkowicz2007}. The provided projections commute with the corresponding differential operators if the quadrature rules can exactly integrate the functions being projected. As an example, projecting an $H({\rm grad})$ function into the $H({\rm grad})$ finite-element space and then taking its gradient gives the same result as taking the gradient of the function first, and then projecting the gradient into the $H({\rm curl})$ finite-element space.
\item \emph{FunctionSpaceTools:} This class provides transformations of fields from reference to physical frame and back, computation of measures on edges, faces and cells, scalar/vector/tensor multiplications and contractions for computing integrals.
\item \emph{IntegrationTools:} This class provides integration methods that can take advantage of tensor product structures in basis values, providing mechanisms for performance-portable, \emph{sum-factorized} assembly across $H({\rm grad})$, $H({\rm curl})$, $H({\rm div})$ and $L^2$ function spaces. In the future, we plan to provide similar interfaces to support matrix-free discretizations.
\end{itemize}
Intrepid2 makes use of Kokkos containers to enable memory layouts that are adapted to the computational platform. Intrepid2 also uses Kokkos for its core computational kernels, enabling threaded execution across a variety of architectures. The data types used by Intrepid2 are templated; it is therefore possible to propagate Sacado types through Intrepid2 to perform automatic differentiation. Current development of Intrepid2 focuses on providing efficient matrix-free discretizations to enhance efficiency on GPU architectures.
\subsection{Phalanx}
Phalanx is a local field evaluation library designed for equation assembly in partial differential equation (PDE) applications. The goal of Phalanx is to decompose a complex problem into a number of simpler problems with managed dependencies to support rapid development and extensibility of PDE codes \cite{Notz2012,pawlowski2012automating,pawlowski2012automatingpart2}. The data structures use Kokkos \cite{trott2021kokkos} for performance portability. Through the use of template metaprogramming, Phalanx supports arbitrary user defined data types and evaluation types. This feature allows for simple integration of automatic differentiation tools via operator overloaded scalar types from the Sacado library \cite{phipps2022automatic}. From a simple equation definition, quantities such as Jacobians, Jacobian-vector products, Hessians and parameter sensitivities can be evaluated to machine precision. These quantities can be used by other Trilinos packages for operations including Newton-based nonlinear solves, gradient-based optimization, constraint enforcement and bifurcation analysis \cite{pawlowski2012automating,pawlowski2012automatingpart2}.
Phalanx uses a graph-based design to manage data dependencies. The runtime defined directed acyclic graph allows for rapid prototyping in a production environment where simple interfaces for analysts, flexible models/data structures and integration of non-trivial Third-party libraries are paramount. Phalanx is used in a number of large scale parallel codes including Albany \cite{Salinger2016}, Charon \cite{CharonUsersManual2020}, Drekar \cite{Crockatt2022,Miller2019,Shadid2016mhd} and EMPIRE \cite{BettencourtBrownEtAl2021_EmpirePic}.
In recent years, Phalanx has been extended to provide utilities for performance portability under automatic differentiation. For example, Phalanx provides tools for building and managing a Kokkos view-of-views on device without the use of unified virtual memory and provides utilities for running virtual functions on device. In the future, these utilities may be split out into a separate package.
\subsection{Tempus}
Tempus (Latin meaning time as in “tempus fugit” $\rightarrow$ “time flies”)
is the Trilinos time-integration package for advanced transient
analysis. It includes various time integrators and embedded
sensitivity analysis for next-generation code architectures. Tempus
provides “out-of-the-box” time-integration capabilities, which
allows users to quickly and easily incorporate time-integration
capabilities to their applications and switch between various time
integrators depending on the simulation needs. Additionally, Tempus
provides “build-your-own” capabilities, which allows applications
to incorporate various Tempus components to augment or replace
application transient capabilities. Other capabilities include
embedded error analysis, sensitivity analysis, transient optimization
with ROL.
Tempus provides a general infrastructure for the time evolution of
solutions through a variety of general integration schemes. Tempus
provides time integrators for explicit and implicit methods and for
first- and second-order ODEs. It can be used from small systems of
equations (e.g., single ODEs for the time evolution of plasticity
models, and multiple ODEs for coupled chemical reactions) to
large-scale transient simulations requiring exascale computing
(e.g., flow fields around reentry vehicles and magneto-hydrodynamics).
Tempus has several components that can be used in concert or
individually, depending on the needs of the application.
\begin{itemize}
\item Integrators are the time-loop structure for time integration
and provide several features, e.g., control the advancement of
the solution, selection of the next timestep size and solution
output.
\item Time Steppers are individual methods that advance the
solution from one step to the next. A variety of time steppers
are available:
\begin{itemize}
\item Classic one-step methods (e.g., Forward Euler and Trapezoidal
Method)
\item Explicit Runge-Kutta methods (e.g., RK Explicit 4 Stage)
\item Diagonally Implicit Runge-Kutta (DIRK) Methods (e.g.,
general tableau DIRK and many specific DIRK/SDIRK methods)
\item Implicit-Explicit (IMEX) Runge-Kutta Methods (e.g., IMEX
RK SSP2, IMEX RK SSP3, and general tableau IMEX RK methods)
\item Multi-Step Methods (i.e., BDF2)
\item Second-order ODE Methods (e.g., Leapfrog, Newmark methods
and HHT-Alpha)
\item Steppers with subSteppers (e.g., operator-split and
subcycling methods)
\end{itemize}
\item Solution History is used to maintain the solution during
time-step failure, solution restart/output, interpolation of
solution between time steps, and to provide the solution for
transient adjoint sensitivities.
\item Timestep Control and Strategies provide methods to select
the time-step size based on user input and/or temporal error
control (e.g., bounding min/max time-step size, relative/absolute
maximum error, and timestep adjustments for output and checkpointing)
\end{itemize}
Additionally, Tempus has several mechanisms which allow users to
insert application-specific algorithms into Tempus components (e.g.,
through observers and creating derived classes).
\subsection{Panzer}
The Panzer package provides global tools for finite element analysis. The package is targeted for large-scale parallel implicit PDEs using continuous and discontinuous compatible finite elements as well as control volume finite elements. Panzer enables the solution of nonlinear problems, by interfacing with several Trilinos linear and nonlinear solvers. It uses the Intrepid2 package for arbitrary order finite element bases. It computes derivatives and sensitivities with the Sacado automatic differentiation (AD) package. Panzer supports both Epetra and Tpetra data structures and achieves performance portability through the Kokkos programming model.
Panzer is designed for multiphysics systems. The assembly engine allows for different equation sets in each element block of the mesh and allows for mixed bases for degrees of freedom (DOF) within each element block (e.g. mixed HDiv and HCurl system for electromagnetics). The assembly tools can create a single fully coupled Jacobian matrix with all off-block dependencies (as a single Tpetra::CrsMatrix) or it can create a blocked system, grouping sets of DOFs into separate explicit matrices. This allows for physics-based block preconditioning strategies using the Teko block preconditioner package in Trilinos \cite{Bonilla2023,Cyr2016a}. The assembly process relies on the Phalanx package for efficient assembly of the multiphysics systems. Panzer additionally can wrap the assembly in a Thyra::ModelEvaluator, providing direct interfaces to the linear and nonlinear analysis packages in Trilinos.
Panzer can provide low level utilities for application codes to build on, or can be used as a high level application framework. Important capabilities include the following:
\begin{itemize}
\item \emph{DOF Manager:} Panzer provides a stand-alone DOF manager class in the dof-mgr subpackage. Given a list of DOFs and their corresponding basis and element blocks, the DOF manager can provide the mapping from DOFs on the mesh entities to the entries in linear algebra objects such as a residual vector or Jacobian matrix. The DOF manager can return the objects required to build distributed Tpetra and Epetra maps and graphs for both uniquely owned global indices and ghosted indices used during assembly. It provides the local indexing used during assembly as well. The DOF manager contains a mesh abstraction called a connection manager that provides information about mesh connectivity, global numbering of mesh topological entities and element block groups. It is designed to support any underlying mesh database, allowing applications to use the DOF manager with any finite element application.
\item \emph{STK\_Interface:} Panzer contains a concrete implementation of the connection manager API for the STK mesh database package in Trilinos. The STK\_Interface object wraps a STK mesh database. It provides a simple interface for accessing global indices on the mesh and can be used to read/write associated solution data to the database. It additionally supports SEACAS use for writing mesh data to disk. This STK\_Interface also includes support for periodic boundary conditions. This capability can match topological entities on periodic parallel distributed faces. Once matched, the DOFs are unified to enforce periodicity for the DOF manager. This capability is in the adapters-stk subpackage.
\item \emph{Linear Object Factory:} Panzer provides a linear object factory and linear object container designed to support parallel distributed assembly. It supports both Epetra and Tpetra objects. The returned containers hold the linear algebra objects for either a uniquely owned DOF map used for solving the linear system or a ghosted version of the linear objects that can be used for assembly. The containers support export and import operations between the unique and ghosted containers. These are used to simplify the assembly process and abstract the underlying linear objects (i.e. Epetra or Tpetra). The linear object factory can also create DOF gather and scatter functors for the corresponding (and possibly blocked) matrices. The gather and scatter operations are specialized on the assembly types, including residuals, Jacobians, Tangents and Hessians. Hessians are only supported for Epetra at the moment but will be expanded to Tpetra as needed. The capability is found in the disc-fe subpackage.
\item \emph{Worksets:} The Panzer assembly process provides workset containers to hold static data that can be reused in finite element computations. For example, when using a static mesh, these containers could hold the Jacobian, Jacobian inverse and all finite element basis values at the quadrature points. The workset concept breaks the local elements on an MPI process into smaller blocks of uniform computations that can be dispatched to a CPU or GPU. The workset helps to control total memory use for temporaries in the Phalanx directed acyclic graph. This capability is in the disc-fe subpackage.
\item \emph{Assembly Kernels:} Panzer provides a number of assembly kernels that are optimized for use with Sacado AD data types. When assembling on GPUs, the assembly kernels were found to be an order of magnitude faster if the AD evaluation was parallelized over the internal derivative dimension \cite{phipps2022automatic}. This required using kernels with a Kokkos::TeamPolicy for finite element assembly where the lowest level of parallelism is used internally by the Sacado AD type. When building Panzer for GPUs, the DFad hierarchic parallelism flag in Sacado should be enabled for kernel performance. The kernels can be found in the disc-fe subpackage.
\item Panzer provides an example miniapp for implicit electromagnetics that is used for benchmarking linear solver performance and for acceptance testing of new high performance computing systems. It demonstrates an HCurl-HDiv formulation for the electric and magnetic fields. This is found in the MiniEM subpackage.
\end{itemize}
The Panzer package is intended to provide both low and high level tools for implicit finite elements discretizations. The high level tools aggregate many Trilinos discretization and solver packages. While the high level tools can be used as a rapid prototyping environment, the front end is fairly complex to set up as opposed to true rapid prototyping frameworks such as DealII \cite{dealII95}, FEniCSx \cite{BarattaEtal2023} and Firedrake \cite{FiredrakeUserManual}. Panzer is not user friendly in this regard, however the examples and miniapps are a good starting point and can be quickly adapted to other physics. A number of performance portable applications are using Panzer tools at different levels of adoption, these include Albany \cite{Salinger2016}, Charon \cite{CharonUsersManual2020}, Drekar \cite{Crockatt2022,Miller2019,Shadid2016mhd} and EMPIRE \cite{BettencourtBrownEtAl2021_EmpirePic}.
\subsection{Compadre}
The Compadre package provides tools for the approximation of linear operators (including point evaluation and derivatives) given the location of samples of a function over an unstructured cloud of points. The resulting stencils, when applied directly to samples of the function at these locations, provides an approximation of the linear operator acting on the function at the point(s) queried. This is useful for meshed and meshless data transfer applications. Samples of the function at the specified locations can also be viewed as unknowns, in which case the solution returned by Compadre can be used as a stencil for meshless discretization of PDEs.
The package uses generalized moving least squares (GMLS) for approximating functionals.
Classic moving least squares consists of approximating continuous target functions avaluated over a cloud of points by solving weigthed least square problems to locally (within a neighborood of the evaluation point) approximate the function with polynomials. The generalized approach implemented in Compadre allows to possibly replace the evaluation of the target function at points with \emph{sampling} functionals of such function (e.g. integrals of the function over local domains), and, instead of approximating the target function, to approximate a \emph{target} operator (e.g. gradient, curl, divergence) of such function. Another generalization involves using basis functions other than polynomials. Detailed description of GMLS can be found in \cite{mirzaei2012generalized,wendland2004scattered}. Future plans include the implementation of other meshless methods like radial basis functions.
%Consider $\phi$ of function class $\mathbf{V}$. Consider a collection of samples $\Lambda = \left\{\lambda_i(\phi)\right\}_{i=1}^{N}$ corresponding to a quasi-uniform\cite{wendland2004scattered} collection of data sites $\mathbf{X}_h = \left\{\mathbf{x}_i\right\} \subset \mathbb{R}^d$ characterized by fill distance $h$. To approximate a given linear target functional $\tau_{\tilde{x}}$ associated with a target site $\tilde{x}$, we seek a reconstruction $p \in \mathbf{V}_h$, where $\mathbf{V}_h \subset \mathbf{V}$ is a finite dimensional space chosen to provide good approximation properties, with basis $\mathbf{P}=\left\{P\right\}_{i=1}^{dim(V_h)}$. We perform this reconstruction in the following weighted $\ell_2$ sense:
%\begin{equation}
%\label{gmls}
%p = \underset{{q \in \mathbf{V}_h}}\argmin \sum_{i=1}^N \left( \lambda_i(\phi) -\lambda_i(q) \right)^2 \omega(\lambda_i,\tau_{\tilde{x}})
%\end{equation}
%where $\omega$ is a locally supported positive function. Compadre offers several choices for the weighting kernel $\omega = \Phi(|\tilde{x}-\mathbf{x}_i|)$, where $|\cdot|$ denotes the Euclidean norm.
%With the optimal reconstruction $p$ computed, the target functional is approximated via $\tau_{\tilde{x}} (\phi) \approx \tau^h_{\tilde{x}} (\phi) := \tau_{\tilde{x}} (p)$. As an unconstrained $\ell_2$-optimization problem, this process admits the explicit form
%\begin{equation}
%\label{discreteTarget}
%\tau^h_{\tilde{x}}(\phi) = \tau_{\tilde{x}}(\mathbf{P})^\intercal \left(\Lambda(\mathbf{P})^\intercal \mathbf{W} \Lambda(\mathbf{P})\right)^{-1} \Lambda(\mathbf{P})^\intercal \mathbf{W} \Lambda(\phi),
%\end{equation}
%where we denote:
%\begin{itemize}
% \item $\tau_{\tilde{x}}(\mathbf{P}) \in \mathbb{R}^{dim(V_h)}$ is a vector with components consisting of the target functional applied to each basis function.
% \item $\mathbf{W} \in \mathbb{R}^{N \times N}$ is a diagonal matrix with diagonal entries consisting of $\left\{\omega(\lambda_i,\tau_{\tilde{x}})\right\}_{i=1,...,N}$.
% \item $\Lambda(\mathbf{P}) \in \mathbb{R}^{N \times dim(V_h)}$ is a rectangular matrix whose $(i,j)$ entry corresponds to the application of the $i^{th}$ sampling functional applied to the $j^{th}$ basis function.
% \item $\Lambda(\phi) \in \mathbb{R}^N$ is a vector consisting of the $N$ samples of the function $\phi$.
%\end{itemize}
%We note that by taking the contraction of the tensors appearing in \eqref{discreteTarget} and exploiting the compact support of $\omega$, we may interpret the output of the GMLS process as a finite difference-like stencil of the form
%\begin{equation}
%\tau^h_{\tilde{x}}(\phi) = \sum_{\mathbf{x}_i \in B^\epsilon(\tilde{x})} \alpha_i \lambda_i(\phi),
%\end{equation}
%where $B^\epsilon(\tilde{x})$ denotes the $\epsilon$-ball neighborhood of the target site $\tilde{x}$. Therefore, GMLS admits an interpretation as an automated process for generating generalized finite difference methods on unstructured point clouds. The computational cost of solving the GMLS problem amounts to inverting a small linear system which may be assembled using only information from neighbors within the support of $\omega$, and construction of such stencils across the entire domain is embarrassingly parallel.
Compadre allows users to control the weighting kernel, degree of basis functions, the sampling functionals, and the target operator; this allows control over smoothness of the reconstruction, locality of the resulting stencil, order of accuracy of the reconstruction (assuming regularity of the function embedded in the point cloud data), choice of what the sampled data or degrees of freedom represent, and linear operator action, respectively.
Selecting point evaluations for the sampling functionals and the target operator provides a traditional moving least squares reconstruction. As an example of a more exotic choice, it is possible to use an average vector normal integral over edges as the sampling functionals and a cell average integral as the target operator, enabling recovery of functions embedded in a Raviart-Thomas type representation and transferring them to a basis consistent with a finite-volume scheme.
While Compadre supports full space reconstruction in 1-3D, there is also additional support for select sampling functionals and target operators on 1D smooth manifolds embedded in 2D or 2D smooth manifolds embedded in 3D. Reconstruction on a manifold is done through an on-the-fly Principal Component Analysis calculation to determine principal directions tangent to the manifold. The curvature of the manifold is calculated through a reconstruction of the tangent and normal components to the calculated tangent plane, and the final function reconstruction is performed in the local chart. Utilities in the package handle mappings between local computed charts and the ambient higher-dimensional space.
Compadre's stencil generation involves independent problems to be solved in parallel at the team level with loops over the thread and vector level within each problem. This hierarchical parallelism is achieved with performance portability by using the Kokkos programming model and leveraging the batched QR with pivoting algorithm implemented in Kokkos Kernels.