From 4e673d5817429cd730096fa231e0195af03e9db1 Mon Sep 17 00:00:00 2001 From: "Cosmin G. Petra" Date: Sun, 22 Sep 2024 16:44:09 -0700 Subject: [PATCH] update user manual with checkpointing --- doc/src/sections/solver_options.tex | 10 +++++----- doc/src/techrep_main.tex | 27 +++++++++++++++++++++++++-- 2 files changed, 30 insertions(+), 7 deletions(-) diff --git a/doc/src/sections/solver_options.tex b/doc/src/sections/solver_options.tex index c34e1e29..a2a73be4 100755 --- a/doc/src/sections/solver_options.tex +++ b/doc/src/sections/solver_options.tex @@ -423,16 +423,16 @@ \subsubsection{Problem preprocessing} \medskip -\subsubsection{Checkpointing of the solver state and restarting} -\Hi can save/load its internal state to/from disk. This can be helphul when running a job on a cluster that enforces limits on the job's running time. This functionality is currently available only for the quasi-Newton algorithm. The checkpointing is done using Axom's scalable Sidre data manager and IO (see \url{https://axom.readthedocs.io/en/develop/axom/sidre/docs/sphinx/index.html}) and requires an Axom-enabled build (use ``-DHIOP_USE_AXOM=ON'' with cmake). +\subsubsection{Checkpointing of the solver state and restarting}\label{sec:checkpoint} +As detailed in Section~\ref{sec:checkpoint_API}, \Hi can save/load its internal state to/from disk. All the options in this section require an Axom-enabled build (use ``-DHIOP\_USE\_AXOM=ON'' with cmake) and are supported only by the quasi-Newton IPM solver (\texttt{hiopAlgFilterIPMQuasiNewton} class) for the \texttt{hiopInterfaceDenseConstraints} NLP formulation/interface. \noindent \textbf{checkpoint\_save}: Save state of NLP solver to file indicated by value of option ``checkpoint\_file''. String values ``yes'' or ``no'', default ``no''. -\noindent \textbf{checkpoint\_load\_on\_start} On (re)start the NLP solver will load checkpoint file specified by ``checkpoint_file`` option. String values ``yes'' or ``no'', default ``no''. 
+\noindent \textbf{checkpoint\_load\_on\_start} On (re)start, the NLP solver will load the checkpoint file specified by the ``checkpoint\_file'' option. String values ``yes'' or ``no'', default ``no''. -\noindent \textbf{checkpoint\_file} Path to checkpoint file to load from or save to. If present, the character ``\#'' is replaced with the iteration number at which the checkpointing is saved (but \textit{not} when loaded). \Hi adds a ``.root'' extension internally if the value of the option is a directory. If this option is not specified and loading or saving checkpoints is enabled, \Hi will use a file named ``hiop_state_chk''. +\noindent \textbf{checkpoint\_file} Path to the checkpoint file to load from or save to. If present, the character ``\#'' is replaced with the iteration number at which the checkpoint is saved (but \textit{not} when it is loaded). \Hi adds a ``.root'' extension internally if the value of the option is a directory. If this option is not specified and loading or saving checkpoints is enabled, \Hi will use a file named ``hiop\_state\_chk''. -\noindent \textbf{checkpoint\_save\_every\_N\_iter} Iteration frequency of saving checkpoints to disk if ``checkpoint_save'' is ``yes''. Takes positive integer values with a default value $10$. +\noindent \textbf{checkpoint\_save\_every\_N\_iter} Iteration frequency of saving checkpoints to disk if ``checkpoint\_save'' is ``yes''. Takes positive integer values, with a default value of $10$. 
\subsubsection{Miscellaneous options} diff --git a/doc/src/techrep_main.tex b/doc/src/techrep_main.tex index 2edb3f11..f16f36fe 100755 --- a/doc/src/techrep_main.tex +++ b/doc/src/techrep_main.tex @@ -133,7 +133,7 @@ \vspace{3cm} {\huge\bfseries \Hi\ -- User Guide} \\[14pt] - {\large\bfseries version 1.03} + {\large\bfseries version 1.1.0} \vspace{3cm} @@ -155,7 +155,7 @@ \vspace{4.75cm} \textcolor{violet}{{\large\bfseries Oct 15, 2017} \\ -{\large\bfseries Updated Feb 5, 2024}} +{\large\bfseries Updated Sept 22, 2024}} \vspace{0.75cm} @@ -474,6 +474,29 @@ \subsubsection{Calling \Hi for a \texttt{hiopInterfaceDenseConstraints} formulat \end{lstlisting} The standalone drivers \texttt{NlpDenseConsEx1}, \texttt{NlpDenseConsEx2}, and \texttt{NlpDenseConsEx3} inside directory \texttt{src/Drivers/} under the \Hi's root directory contain more detailed examples of the use of \Hi. +\subsubsection{Checkpointing}\label{sec:checkpoint_API} +File checkpointing is available for \Hi's quasi-Newton IPM solver, which is used exclusively to solve the \texttt{hiopInterfaceDenseConstraints} formulation. This can be helpful when running a job on +a cluster that enforces limits on the job's running time. +Later, this feature will also be provided for other solvers, such as the Newton IPM (used exclusively with sparse NLPs) and HiOp-PriDec. + +The checkpointing I/O is based on Axom's scalable Sidre data manager (see \url{https://axom.readthedocs.io/en/develop/axom/sidre/docs/sphinx/index.html} for more information) and, thus, requires an Axom-enabled build (use ``-DHIOP\_USE\_AXOM=ON'' with cmake). + +There are two ways to use \Hi's checkpointing. The first is via the quasi-Newton solver's API, namely, the methods +\begin{lstlisting} +void load_state_from_sidre_group(const ::axom::sidre::Group& group); +void save_state_to_sidre_group(::axom::sidre::Group& group); +\end{lstlisting} +of the \texttt{hiopAlgFilterIPMQuasiNewton} solver class. 
+New Sidre views will be created (or reused) within the group passed as an argument to load or save the state variables of the quasi-Newton solver. Alternatively, the \texttt{hiopAlgFilterIPMQuasiNewton} solver class offers similar methods to work directly with a file, namely, +\begin{lstlisting} +bool load_state_from_file(const ::std::string& path) noexcept; +bool save_state_to_file(const ::std::string& path) noexcept; +\end{lstlisting} +These two methods will create the Sidre group internally and checkpoint to/from it using the first two methods. + +A second avenue to checkpoint is via user options. This is detailed in Section~\ref{sec:checkpoint}. + + \warningcp{Note:} A few particularities stemming from the use of Sidre must be acknowledged. First, a checkpoint file should be loaded using HiOp with the same number of MPI ranks as when it was saved. Second, checkpointing is not available for non-MPI builds due to Axom having MPI as a dependency. Finally, when loading from or saving to a checkpoint file, the sizes of the file's variables (Sidre views) must match the sizes of the HiOp variables to which the data is loaded or saved, meaning \Hi will throw an exception if an existing file is (re)used to load or save an algorithm state for a problem that changed sizes since the file was created. + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% NLP Sparse
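The option-based route described in the checkpointing options text of this patch can be sketched as an options-file fragment. This is only an illustration under stated assumptions: it assumes the options are supplied through \Hi's options file with one name-value pair per line, and the file name \texttt{my\_run\_chk} is a hypothetical choice, not a name used by \Hi.
\begin{lstlisting}
checkpoint_save yes
checkpoint_save_every_N_iter 10
checkpoint_file my_run_chk
\end{lstlisting}
To resume a run, ``checkpoint\_load\_on\_start'' would be set to ``yes'' with ``checkpoint\_file'' pointing to an existing checkpoint. Since ``\#'' is not replaced with an iteration number when loading, the value used for a restart should name the checkpoint file explicitly.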