data_services.tex

\todo{Describe performance portability}

\subsection{Kokkos Kernels}\label{subsec:kk}
Kokkos Kernels~\cite{rajamanickam2021kokkoskernels} is part of the Kokkos ecosystem
~\cite{trott2021kokkos} and provides node local implementations of mathematical kernels
widely used across packages in Trilinos. As a member of the Kokkos
ecosystem, Kokkos Kernels is tightly integrated on Kokkos features and aims at
delivering performance portable algorithms across major CPU and GPU based HPC systems.
Due to its node local nature, Kokkos Kernels does not rely on MPI or other communication
libraries unlike numerous other packages in Trilinos.

The implementation of Kokkos Kernels algorithms leverage the hierarchical parallelism
exposed by the Kokkos library~\cite{kim2017designing} and increasingly provides coverage
for stream callable kernels. To ensure flexibility for the distributed libraries that
might call its algorithms, Kokkos Kernels provides thread safe and asynchronous
implementations for most of its kernels. Kokkos Kernels also serves as a major point of
integration for vendor optimized libraries such as cuBLAS, cuSPARSE, rocBLAS, rocSPARSE,
MKL, ARMpl and others.

The capabilities that Kokkos Kernels provides can be divided in four major categories:
1. BLAS algorithms, 2. sparse linear algebra and preconditioners, 3. graph algorithms, and
4. batched dense and sparse linear algebra~\cite{liegeois2023performance}. The main
points of integration of Kokkos Kernels in Trilinos are Tpetra for the dense and sparse
linear algebra capabilities, Ifpack2 for the preconditioners and batched algorithms,
the multigrid package MueLu that relies on these features both directly and indirectly
as well as on some specialized algorithms such as graph
coloring/coarsening~\cite{kelley2022parallel} and fused Jacobi-SpGEMM (generalized sparse matrix matrix multiply) kernels.

Similarly to the Kokkos library, Kokkos Kernels is developed in its own GitHub
repository\footnote{https://github.com/kokkos/kokkos-kernels} outside of the Trilinos
GitHub repository. Every version of the library is integrated and tested in Trilinos
as part of the Kokkos ecosystem release process. Additional information on Kokkos
Kernels capabilities can be found here~\cite{deveci2018multithreaded,wolf2017fast}.


\subsection{Tpetra}\label{subsec:tpetra}
Tpetra \cite{hoemmen2015tpetra} provides the distributed-memory
infrastructure for sparse linear algebra computations.  It implements
distributed-memory linear algebra objects, such as sparse graphs,
sparse matrices and dense vectors, using Kokkos for local data
storage.  Distributed-memory sparse linear algebra operations, such as
a sparse matrix-vector product, are implemented through on-node calls
to Kokkos Kernels and inter-node MPI communication.   Tpetra features
include:
\begin{itemize}
\item \textit{Maps} --- distributions of objects over MPI ranks.
\item \textit{(Multi)Vectors} --- storage of dense vectors or collections of
vectors (multivectors) and associated BLAS-1 like kernels (e.g., dot
products, norms, scaling, vector addition, pointwise vector
multiplication) as well as tall skinny QR (TSQR)  factorization for multivectors.
\item \textit{Import/export} --- moving vector, graph and matrix data
between different distributions (maps).  This is key for performing
halo/boundary exchanges, as well as other kernels such as
sparse matrix-matrix multiplication.
\item \textit{Sparse graphs} --- in compressed sparse row (CSR)
format.  Graphs also include import/export objects for use in
halo/boundary exchanges associate with the graph.
\item \textit{Sparse matrices} --- in compressed sparse row (CSR) and
block compressed sparse row (BSR) formats.  Associated kernels include
sparse matrix vector product (SPMV), sparse matrix-matrix
multiplication (SPGEMM) and triple-product, sparse matrix-matrix addition, sparse matrix transpose, diagonal extraction,
Frobenius norm calculation, and row/column scaling.
\end{itemize}