From 335392870d34713f347160d3ccc5b0febeeda322 Mon Sep 17 00:00:00 2001
From: kanekosh <shugok@umich.edu>
Date: Fri, 20 Oct 2023 16:46:25 -0400
Subject: [PATCH 1/5] added poem 93

---
 POEM_093.md                  | 314 +++++++++++++++++++++
 POEM_093/direct_custom.py    | 511 ++++++++++++++++++++++++++++++++++
 POEM_093/example1.py         | 338 ++++++++++++++++++++++
 POEM_093/example2.py         | 312 +++++++++++++++++++++
 POEM_093/petsc_ksp_custom.py | 524 +++++++++++++++++++++++++++++++++++
 5 files changed, 1999 insertions(+)
 create mode 100644 POEM_093.md
 create mode 100644 POEM_093/direct_custom.py
 create mode 100644 POEM_093/example1.py
 create mode 100644 POEM_093/example2.py
 create mode 100644 POEM_093/petsc_ksp_custom.py

diff --git a/POEM_093.md b/POEM_093.md
new file mode 100644
index 00000000..a95fa931
--- /dev/null
+++ b/POEM_093.md
@@ -0,0 +1,314 @@
+POEM ID: 093  
+Title: Linear solution caching  
+authors: kanekosh (Shugo Kaneko) and anilyil (Anil Yildirim)  
+Competing POEMs:   
+Related POEMs:  
+Associated implementation PR:  
+
+Status:
+
+- [x] Active
+- [ ] Requesting decision
+- [ ] Accepted
+- [ ] Rejected
+- [ ] Integrated
+
+## Motivation
+When computing derivatives with a hierarchical linear solver, OpenMDAO may re-solve the same subsystem-level linear system multiple times, which is redundant.
+Examples of such problems will be provided later.  
+This POEM proposes to remove this redundancy and accelerate the total derivatives computation by caching the subsystem-level linear solutions.
+
+## Description
+We suggest caching the right-hand side (RHS) vector `b` and the corresponding linear system solution `x` within the subsystem's linear solver.
+The caching can be optional; by default, it would not cache the solution to  avoid unnecessary overhead of saving/checking the cache.  
+When the user enabled caching, the linear solver checks if the given RHS vector `b` is parallel to a cached vector `b_cache` before solving the linear system.
+If they are parallel, it returns the linear system solution by `x = x_cache * (|b| / |b_cache|)` without solving the linear system.
+Otherwise, it solves the linear solver and add the solution and corresponding RHS vector to the cache.
+
+## API change proposal
+
+**1. Caching all inputs or outputs of a subsystem**  
+A user may enable the caching when adding a linear solver to a subsystem.  
+For example,
+
+```python
+subsystem1.linear_solver = om.PETScKrylov(use_cache=True)
+```
+would enable caching for `subsystem1`.
+This does not change the behavior of the current models if we set the default to `use_cache=False`.
+
+**2. Caching specific inputs or outputs**   
+Furthermore, enabling the caching only for the user-specified inputs (in forward mode) or outputs (in reverse mode) can further reduce the computational overhead of the caching.  
+For example,
+```python
+subsystem.use_cache(mode='rev', cache_inputs=None, cache_outputs=['Lift'])
+```
+would enable caching only for the `Lift` output in the reverse mode, but not for other outputs.
+
+## Prototype implementation
+We implemented a prototype of the linear solution caching in a custom PETScKrylov solver and Direct solver, which are available [here](POEM_093/).
+
+Note that this prototype caches all the inputs or outputs of a subsystem (i.e., it implements the proposal 1 above).
+The implementation of the caching for specific inputs or outputs (proposal 2) would require a different approach (which we don't have a good idea of how it can be done).  
+Also note that this prototype does not implement the cache resetting, which should be done every optimization iterations.
+
+Here, we explain how the caching would be implemented for a Krylov solver.
+It works mostly the same for a Direct solver.
+The complete custom solver file is available [here](POEM_093/petsc_ksp_custom.py).
+
+First, we add a new option for caching:
+```python
+def _declare_options(self):
+    """
+    Declare options before kwargs are processed in the init method.
+    """
+    super()._declare_options()
+
+    self.options.declare('ksp_type', default='fgmres', values=KSP_TYPES,
+                            desc="KSP algorithm to use. Default is 'fgmres'.")
+
+    self.options.declare('restart', default=1000, types=int,
+                            desc='Number of iterations between restarts. Larger values increase '
+                            'iteration cost, but may be necessary for convergence')
+
+    self.options.declare('precon_side', default='right', values=['left', 'right'],
+                            desc='Preconditioner side, default is right.')
+
+    # --- option for caching. Default: no cache ---
+    self.options.declare('use_cache', types=bool, default=False)
+
+    # changing the default maxiter from the base class
+    self.options['maxiter'] = 100
+```
+
+Then, the solver initializes the cache in `_linearize` (or in `__init__`):
+```python
+def _linearize(self):
+    """
+    Perform any required linearization operations such as matrix factorization.
+    """
+    print('*** linearize in Krylov ***')
+    if self.precon is not None:
+        self.precon._linearize()
+
+    # --- initialize cache ---
+    if self.options['use_cache']:
+        self._rhs_cache_list = []
+        self._sol_cache_list = []
+```
+
+In the `solve` method, it checks and reuses the cached solution if applicable.
+Otherwise, it adds the RHS vector and the solution to the cache at the end.
+```python
+def solve(self, mode, rel_systems=None):
+    """
+    Solve the linear system for the problem in self._system().
+
+    The full solution vector is returned.
+
+    Parameters
+    ----------
+    mode : str
+        Derivative mode, can be 'fwd' or 'rev'.
+    rel_systems : set of str
+        Names of systems relevant to the current solve.
+    """
+    self._rel_systems = rel_systems
+    self._mode = mode
+
+    system = self._system()
+    options = self.options
+
+    if system.under_complex_step:
+        raise RuntimeError('{}: PETScKrylov solver is not supported under '
+                            'complex step.'.format(self.msginfo))
+
+    maxiter = options['maxiter']
+    atol = options['atol']
+    rtol = options['rtol']
+
+    # assign x and b vectors based on mode
+    if self._mode == 'fwd':
+        x_vec = system._doutputs
+        b_vec = system._dresiduals
+    else:  # rev
+        x_vec = system._dresiduals
+        b_vec = system._doutputs
+
+    # create numpy arrays to interface with PETSc
+    sol_array = x_vec.asarray(copy=True)
+    rhs_array = b_vec.asarray(copy=True)
+
+    # --- check if we can reuse the cached solutions ---
+    if self.options['use_cache']:
+        for i, rhs_cache in enumerate(self._rhs_cache_list):   # loop over cached RHS vectors. NOTE: reverse order loop would be faster?
+            # Check if the RHS vector is the same as a cached vector. This part is not necessary, but is less expensive than checking if two vectors are parallel.
+            if np.allclose(rhs_array, rhs_cache, rtol=1e-12, atol=1e-12):
+                x_vec.set_val(self._sol_cache_list[i])
+                print('*** use caching in Krylov with scaler = 1', 'time =', time.time() - time0, '***')
+                return
+            # Check if the RHS vector is equal to -1 * cached vector.
+            elif np.allclose(rhs_array, -rhs_cache, rtol=1e-12, atol=1e-12):
+                x_vec.set_val(-self._sol_cache_list[i])
+                print('*** use caching in Krylov with scaler = -1', 'time =', time.time() - time0, '***')
+                return
+            
+            # Check if the RHS vector and a cached vector are parallel
+            # NOTE: the following parallel vector check may be inaccurate for some cases. maybe should tighten the tolerance?
+            dot_product = np.dot(rhs_array, rhs_cache)
+            norm1 = np.linalg.norm(rhs_array)
+            norm2 = np.linalg.norm(rhs_cache)
+            if np.isclose(abs(dot_product), norm1 * norm2, rtol=1e-12, atol=1e-12):
+                # two vectors are parallel, thus we can use the cache.
+                scaler = dot_product / norm2**2
+                x_vec.set_val(self._sol_cache_list[i] * scaler)
+                print('*** use caching in Krylov with scaler =', scaler, ' | time =', time.time() - time0, '***')
+                return
+    
+    # create PETSc vectors from numpy arrays
+    sol_petsc_vec = PETSc.Vec().createWithArray(sol_array, comm=system.comm)
+    rhs_petsc_vec = PETSc.Vec().createWithArray(rhs_array, comm=system.comm)
+
+    # run PETSc solver
+    self._iter_count = 0
+    ksp = self._get_ksp_solver(system)
+    ksp.setTolerances(max_it=maxiter, atol=atol, rtol=rtol)
+    ksp.solve(rhs_petsc_vec, sol_petsc_vec)
+
+    # stuff the result into the x vector
+    x_vec.set_val(sol_array)
+
+    sol_petsc_vec = rhs_petsc_vec = None
+
+    # --- append the current solution to the cache ---
+    if self.options['use_cache']:
+        self._rhs_cache_list.append(rhs_array.copy())
+        self._sol_cache_list.append(sol_array.copy())
+```
+
+## Examples of Caching: 
+### 2-point Aerostructural optimization
+Here is an [OpenAeroStruct multipoint optimization example](POEM_093/example1.py) to demonstrate the solution caching.
+We consider the following optimization problem:
+```
+minimize:   CD0 + CD1
+subject to: CL0 = 0.6
+            CL1 - CL0 = 0.2
+```
+
+For the second operating point, we impose the CL constraint in a way that it also depends on the first point's CL.
+To compute the total derivatives, we use PETScKrylov solver for aerostructural coupling.
+The solver outputs of `compute_totals()` look as follows:
+```
+# dCD0/dx
+LN: PETScKrylov 0 ; 0.00299022306 1
+LN: PETScKrylov 1 ; 8.49989958e-08 2.84256372e-05
+LN: PETScKrylov 2 ; 7.30820654e-11 2.44403391e-08
+
+# dCD1/dx
+LN: PETScKrylov 0 ; 0.00333369362 1
+LN: PETScKrylov 1 ; 8.50541382e-08 2.55134838e-05
+LN: PETScKrylov 2 ; 8.31048387e-11 2.49287572e-08
+
+# dCL0/dx
+LN: PETScKrylov 0 ; 0.00951495633 1
+LN: PETScKrylov 1 ; 2.65013995e-06 0.000278523607
+LN: PETScKrylov 2 ; 1.22310119e-09 1.28545118e-07
+LN: PETScKrylov 3 ; 1.82688692e-11 1.92001608e-09
+
+# dCL1/dx
+LN: PETScKrylov 0 ; 0.00952340656 1
+LN: PETScKrylov 1 ; 2.64776928e-06 0.000278027538
+LN: PETScKrylov 2 ; 1.04735578e-09 1.0997701e-07
+LN: PETScKrylov 3 ; 1.19985619e-11 1.25990231e-09
+
+# dCL0/dx again - identical history to the previous dCL0/dx solve.
+LN: PETScKrylov 0 ; 0.00951495633 1
+LN: PETScKrylov 1 ; 2.65013995e-06 0.000278523607
+LN: PETScKrylov 2 ; 1.22310119e-09 1.28545118e-07
+LN: PETScKrylov 3 ; 1.82688692e-11 1.92001608e-09
+
+# verification print
+Derivatives of CL_diff w.r.t. twist_cp = [[-2.17863550e-05 -3.79972743e-05 -4.59107509e-05 -4.77229851e-05]]
+```
+We can observe that there were 5 aerostructural adjoint solves: 2 for objective, 1 for the first CL constraint, and 2 for the second CL constraint.
+However, there are duplicated adjoint solve for `dCL0/dx` because `CL0` is used for both the first and second CL constraints.
+
+This duplicated adjoint solves can be avoided using the solution caching.
+Using [the prototype solver](POEM_093/petsc_ksp_custom.py), we get the following solver outputs:
+```
+# dCD0/dx
+LN: PETScKrylov 0 ; 0.00299022306 1
+LN: PETScKrylov 1 ; 8.49989958e-08 2.84256372e-05
+LN: PETScKrylov 2 ; 7.30820654e-11 2.44403391e-08
+*** solved in Krylov | time = 8.14E-03 s ***
+
+# dCD1/dx
+LN: PETScKrylov 0 ; 0.00333369362 1
+LN: PETScKrylov 1 ; 8.50541382e-08 2.55134838e-05
+LN: PETScKrylov 2 ; 8.31048387e-11 2.49287572e-08
+*** solved in Krylov | time = 7.97E-03 s ***
+
+# dCL0/dx
+LN: PETScKrylov 0 ; 0.00951495633 1
+LN: PETScKrylov 1 ; 2.65013995e-06 0.000278523607
+LN: PETScKrylov 2 ; 1.22310119e-09 1.28545118e-07
+LN: PETScKrylov 3 ; 1.82688692e-11 1.92001608e-09
+*** solved in Krylov | time = 1.00E-02 s ***
+
+# dCL1/dx
+LN: PETScKrylov 0 ; 0.00952340656 1
+LN: PETScKrylov 1 ; 2.64776928e-06 0.000278027538
+LN: PETScKrylov 2 ; 1.04735578e-09 1.0997701e-07
+LN: PETScKrylov 3 ; 1.19985619e-11 1.25990231e-09
+*** solved in Krylov | time = 1.01E-02 s ***
+
+# dCL0/dx again - now use cached solution without running the Krylov solver
+*** use caching in Krylov with scaler = -1 | time = 6.11E-04 s ***
+
+# verification print
+Derivatives of CL_diff w.r.t. twist_cp = [[-2.17863550e-05 -3.79972743e-05 -4.59107509e-05 -4.77229851e-05]]
+```
+The caching removed the redundant `dCL0/dx` solve but still gave the identical total derivatives (verified by printed CL_diff w.r.t. twist_cp).
+
+### More practical multipoint problem
+In the above example, we formulate the second CL constraint in a weird way to demonstrate the caching.
+However, the above model structure represents practical problems.
+
+As an example, we again consider [another multipoint problem](POEM_093/example2.py) (cruise point + manuever point), but now with more practical objective and constraints:  
+```
+minimize:   fuel burn  
+subject to: L = W at cruise point
+            L = W at maneuver point
+```
+
+The `L=W` constraints depend on the fuel burn, which is computed using the cruise drag.
+Therefore, `L=W` constraint at the maneuver point depends on the fuel burn (or cruise drag) of the cruise point.
+This introduces the redundant adjoint solves, i.e., the adjoint for maneuver point `L=W` calls the adjoint for cruise point.
+
+The linear solution caching can avoid this redundancy.
+The following is the linear solver outputs with caching:
+```
+# adjoint for fuelburn (cruise point OAS adjoint)
+LN: PETScKrylov 0 ; 244.940409 1
+LN: PETScKrylov 1 ; 0.00367094295 1.49870859e-05
+LN: PETScKrylov 2 ; 5.5435276e-06 2.2632148e-08
+LN: PETScKrylov 3 ; 2.08827096e-08 8.5256286e-11
+*** solved in Krylov | time = 1.22E-02 s ***
+
+# adjoint for L=W at cruise point
+LN: PETScKrylov 0 ; 0.00787944105 1
+LN: PETScKrylov 1 ; 4.18719356e-06 0.000531407436
+LN: PETScKrylov 2 ; 2.89301257e-09 3.67159618e-07
+LN: PETScKrylov 3 ; 1.68153368e-12 2.13407736e-10
+*** solved in Krylov | time = 9.81E-03 s ***
+
+# adjoint for L=W at maneuver point. This uses cache from the cruise point adjoint
+LN: PETScKrylov 0 ; 1.70385305e-05 1
+LN: PETScKrylov 1 ; 1.50491383e-06 0.0883241564
+LN: PETScKrylov 2 ; 9.45843606e-10 5.55120411e-05
+LN: PETScKrylov 3 ; 5.54498891e-13 3.25438213e-08
+*** solved in Krylov | time = 1.17E-02 s ***
+*** use caching in Krylov with scaler = 1.19E-05 | time = 4.44E-04 s ***
+```
+
diff --git a/POEM_093/direct_custom.py b/POEM_093/direct_custom.py
new file mode 100644
index 00000000..7439435f
--- /dev/null
+++ b/POEM_093/direct_custom.py
@@ -0,0 +1,511 @@
+"""LinearSolver that uses linalg.solve or LU factor/solve.
+Modified to use solution caching.
+The modifications can be found by searching for "EDIT-CACHE" comments in this file.
+"""
+
+import warnings
+
+import numpy as np
+import scipy.linalg
+import scipy.sparse.linalg
+from scipy.sparse import csc_matrix
+
+from openmdao.solvers.solver import LinearSolver
+from openmdao.matrices.dense_matrix import DenseMatrix
+from openmdao.utils.array_utils import identity_column_iter
+
+import time
+
+
+def index_to_varname(system, loc):
+    """
+    Given a matrix location, return the name of the variable associated with that index.
+
+    Parameters
+    ----------
+    system : <System>
+        System containing the Directsolver.
+    loc : int
+        Index of row or column.
+
+    Returns
+    -------
+    str
+        String containing variable absolute name (and promoted name if there is one) and index.
+    """
+    start = end = 0
+    varsizes = np.sum(system._owned_sizes, axis=0)
+    for i, name in enumerate(system._var_allprocs_abs2meta['output']):
+        end += varsizes[i]
+        if loc < end:
+            varname = system._var_allprocs_abs2prom['output'][name]
+            break
+        start = end
+
+    if varname == name:
+        name_string = "'{}' index {}.".format(varname, loc - start)
+    else:
+        name_string = "'{}' ('{}') index {}.".format(varname, name, loc - start)
+
+    return name_string
+
+
+def loc_to_error_msg(system, loc_txt, loc):
+    """
+    Given a matrix location, format a coherent error message when matrix is singular.
+
+    Parameters
+    ----------
+    system : <System>
+        System containing the Directsolver.
+    loc_txt : str
+        Either 'row' or 'col'.
+    loc : int
+        Index of row or column.
+
+    Returns
+    -------
+    str
+        New error string.
+    """
+    names = index_to_varname(system, loc)
+    msg = "Singular entry found in {} for {} associated with state/residual " + names
+    return msg.format(system.msginfo, loc_txt)
+
+
+def format_singular_error(system, matrix):
+    """
+    Format a coherent error message for any ill-conditioned mmatrix.
+
+    Parameters
+    ----------
+    system : <System>
+        System containing the Directsolver.
+    matrix : ndarray
+        Matrix of interest.
+
+    Returns
+    -------
+    str
+        New error string.
+    """
+    if scipy.sparse.issparse(matrix):
+        matrix = matrix.toarray()
+
+    if np.any(np.isnan(matrix)):
+        # There is a nan in the matrix.
+        return format_nan_error(system, matrix)
+
+    zero_rows = np.where(~matrix.any(axis=1))[0]
+    zero_cols = np.where(~matrix.any(axis=0))[0]
+    if zero_cols.size <= zero_rows.size:
+
+        if zero_rows.size == 0:
+            # In this case, some row is a linear combination of the other rows.
+
+            # SVD gives us some information that may help locate the source of the problem.
+            try:
+                u, _, _ = np.linalg.svd(matrix)
+
+            except Exception as err:
+                msg = f"Jacobian in '{system.pathname}' is not full rank, but OpenMDAO was " + \
+                    "not able to determine which rows or columns."
+                return msg
+
+            # Nonzero elements in the left singular vector show the rows that contribute strongly to
+            # the singular subspace. Note that sometimes extra rows/cols are included in the set,
+            # currently don't have a good way to pare them down.
+            tol = 1e-15
+            u_sing = np.abs(u[:, -1])
+            left_idx = np.where(u_sing > tol)[0]
+
+            msg = "Jacobian in '{}' is not full rank. The following set of states/residuals " + \
+                  "contains one or more equations that is a linear combination of the others: \n"
+
+            for loc in left_idx:
+                name = index_to_varname(system, loc)
+                msg += ' ' + name + '\n'
+
+            if len(left_idx) > 2:
+                msg += "Note that the problem may be in a single Component."
+
+            return msg.format(system.pathname)
+
+        loc_txt = "row"
+        loc = zero_rows[0]
+    else:
+        loc_txt = "column"
+        loc = zero_cols[0]
+
+    return loc_to_error_msg(system, loc_txt, loc)
+
+
+def format_nan_error(system, matrix):
+    """
+    Format a coherent error message when the matrix contains NaN.
+
+    Parameters
+    ----------
+    system : <System>
+        System containing the Directsolver.
+    matrix : ndarray
+        Matrix of interest.
+
+    Returns
+    -------
+    str
+        New error string.
+    """
+    # Because of how we built the matrix, a NaN in a comp causes the whole row to be NaN, so we
+    # need to associate each index with a variable.
+    varsizes = np.sum(system._owned_sizes, axis=0)
+
+    nanrows = np.zeros(matrix.shape[0], dtype=bool)
+    nanrows[np.where(np.isnan(matrix))[0]] = True
+
+    varnames = []
+    start = end = 0
+    for i, name in enumerate(system._var_allprocs_abs2meta['output']):
+        end += varsizes[i]
+        if np.any(nanrows[start:end]):
+            varnames.append("'%s'" % system._var_allprocs_abs2prom['output'][name])
+        start = end
+
+    msg = "NaN entries found in {} for rows associated with states/residuals [{}]."
+    return msg.format(system.msginfo, ', '.join(varnames))
+
+
+class DirectSolverCustom(LinearSolver):
+    """
+    LinearSolver that uses linalg.solve or LU factor/solve.
+
+    Parameters
+    ----------
+    **kwargs : dict
+        Options dictionary.
+    """
+
+    SOLVER = 'LN: Direct'
+
+    def _declare_options(self):
+        """
+        Declare options before kwargs are processed in the init method.
+        """
+        super()._declare_options()
+
+        self.options.declare('err_on_singular', types=bool, default=True,
+                             desc="Raise an error if LU decomposition is singular.")
+
+        # EDIT-CACHE
+        # --- option for caching. Default: no cache ---
+        self.options.declare('use_cache', types=bool, default=False)
+
+        # this solver does not iterate
+        self.options.undeclare("maxiter")
+        self.options.undeclare("err_on_non_converge")
+
+        self.options.undeclare("atol")
+        self.options.undeclare("rtol")
+
+        # Use an assembled jacobian by default.
+        self.options['assemble_jac'] = True
+
+    def _setup_solvers(self, system, depth):
+        """
+        Assign system instance, set depth, and optionally perform setup.
+
+        Parameters
+        ----------
+        system : <System>
+            pointer to the owning system.
+        depth : int
+            depth of the current system (already incremented).
+        """
+        super()._setup_solvers(system, depth)
+        self._disallow_distrib_solve()
+
+    def _linearize_children(self):
+        """
+        Return a flag that is True when we need to call linearize on our subsystems' solvers.
+
+        Returns
+        -------
+        boolean
+            Flag for indicating child linearization.
+        """
+        return False
+
+    def _build_mtx(self):
+        """
+        Assemble a Jacobian matrix by matrix-vector-product with columns of identity.
+
+        Returns
+        -------
+        ndarray
+            Jacobian matrix.
+        """
+        system = self._system()
+        bvec = system._dresiduals
+        xvec = system._doutputs
+
+        # First make a backup of the vectors
+        b_data = bvec.asarray(copy=True)
+        x_data = xvec.asarray(copy=True)
+
+        nmtx = x_data.size
+        seed = np.zeros(x_data.size)
+        mtx = np.empty((nmtx, nmtx), dtype=b_data.dtype)
+        scope_out, scope_in = system._get_matvec_scope()
+
+        # Assemble the Jacobian by running the identity matrix through apply_linear
+        for i, seed in enumerate(identity_column_iter(seed)):
+            # set value of x vector to provided value
+            xvec.set_val(seed)
+
+            # apply linear
+            system._apply_linear(self._assembled_jac, self._rel_systems, 'fwd',
+                                 scope_out, scope_in)
+
+            # put new value in out_vec
+            mtx[:, i] = bvec.asarray()
+
+        # Restore the backed-up vectors
+        bvec.set_val(b_data)
+        xvec.set_val(x_data)
+
+        return mtx
+
+    def _linearize(self):
+        """
+        Perform factorization.
+        """
+        time0 = time.time()
+
+        system = self._system()
+        nproc = system.comm.size
+
+        if self._assembled_jac is not None:
+            matrix = self._assembled_jac._int_mtx._matrix
+
+            if matrix is None:
+                # this happens if we're not rank 0 when using owned_sizes
+                self._lu = self._lup = None
+
+            # Perform dense or sparse lu factorization.
+            elif isinstance(matrix, csc_matrix):
+                try:
+                    self._lu = scipy.sparse.linalg.splu(matrix)
+                except RuntimeError as err:
+                    raise RuntimeError(format_singular_error(system, matrix))
+
+            elif isinstance(matrix, np.ndarray):  # dense
+                # During LU decomposition, detect singularities and warn user.
+                with warnings.catch_warnings():
+                    if self.options['err_on_singular']:
+                        warnings.simplefilter('error', RuntimeWarning)
+                    try:
+                        self._lup = scipy.linalg.lu_factor(matrix)
+                    except RuntimeWarning as err:
+                        raise RuntimeError(format_singular_error(system, matrix))
+
+                    # NaN in matrix.
+                    except ValueError as err:
+                        raise RuntimeError(format_nan_error(system, matrix))
+
+            # Note: calling scipy.sparse.linalg.splu on a COO actually transposes
+            # the matrix during conversion to csc prior to LU decomp, so we can't use COO.
+            else:
+                raise RuntimeError("Direct solver not implemented for matrix type %s"
+                                   " in %s." % (type(self._assembled_jac._int_mtx),
+                                                system.msginfo))
+        else:
+            if nproc > 1:
+                raise RuntimeError("DirectSolvers without an assembled jacobian are not supported "
+                                   "when running under MPI if comm.size > 1.")
+
+            mtx = self._build_mtx()
+
+            # During LU decomposition, detect singularities and warn user.
+            with warnings.catch_warnings():
+
+                if self.options['err_on_singular']:
+                    warnings.simplefilter('error', RuntimeWarning)
+
+                try:
+                    self._lup = scipy.linalg.lu_factor(mtx)
+
+                except RuntimeWarning as err:
+                    raise RuntimeError(format_singular_error(system, mtx))
+
+                # NaN in matrix.
+                except ValueError as err:
+                    raise RuntimeError(format_nan_error(system, mtx))
+
+        # EDIT-CACHE
+        # --- initialize cache ---
+        if self.options['use_cache']:
+            self._rhs_cache_list = []
+            self._sol_cache_list = []
+        
+    def _inverse(self):
+        """
+        Return the inverse Jacobian.
+
+        This is only used by the Broyden solver when calculating a full model Jacobian. Since it
+        is only done for a single RHS, no need for LU.
+
+        Returns
+        -------
+        ndarray
+            Inverse Jacobian.
+        """
+        system = self._system()
+        iproc = system.comm.rank
+        nproc = system.comm.size
+
+        if self._assembled_jac is not None:
+
+            matrix = self._assembled_jac._int_mtx._matrix
+
+            if matrix is None:
+                # This happens if we're not rank 0 and owned_sizes are being used
+                sz = np.sum(system._owned_sizes)
+                inv_jac = np.zeros((sz, sz))
+
+            # Dense and Sparse matrices have their own inverse method.
+            elif isinstance(matrix, np.ndarray):
+                # Detect singularities and warn user.
+                with warnings.catch_warnings():
+                    if self.options['err_on_singular']:
+                        warnings.simplefilter('error', RuntimeWarning)
+                    try:
+                        inv_jac = scipy.linalg.inv(matrix)
+                    except RuntimeWarning as err:
+                        raise RuntimeError(format_singular_error(system, matrix))
+
+                    # NaN in matrix.
+                    except ValueError as err:
+                        raise RuntimeError(format_nan_error(system, matrix))
+
+            elif isinstance(matrix, csc_matrix):
+                try:
+                    inv_jac = scipy.sparse.linalg.inv(matrix)
+                except RuntimeError as err:
+                    raise RuntimeError(format_singular_error(system, matrix))
+
+                # to prevent broadcasting errors later, make sure inv_jac is 2D
+                # scipy.sparse.linalg.inv returns a shape (1,) array if matrix is shape (1,1)
+                if inv_jac.size == 1:
+                    inv_jac = inv_jac.reshape((1, 1))
+            else:
+                raise RuntimeError("Direct solver not implemented for matrix type %s"
+                                   " in %s." % (type(matrix), system.msginfo))
+
+        else:
+            if nproc > 1:
+                raise RuntimeError("BroydenSolvers without an assembled jacobian are not supported "
+                                   "when running under MPI if comm.size > 1.")
+            mtx = self._build_mtx()
+
+            # During inversion detect singularities and warn user.
+            with warnings.catch_warnings():
+
+                if self.options['err_on_singular']:
+                    warnings.simplefilter('error', RuntimeWarning)
+
+                try:
+                    inv_jac = scipy.linalg.inv(mtx)
+
+                except RuntimeWarning as err:
+                    raise RuntimeError(format_singular_error(system, mtx))
+
+                # NaN in matrix.
+                except ValueError as err:
+                    raise RuntimeError(format_nan_error(system, mtx))
+
+        return inv_jac
+
+    def solve(self, mode, rel_systems=None):
+        """
+        Run the solver.
+
+        Parameters
+        ----------
+        mode : str
+            'fwd' or 'rev'.
+        rel_systems : set of str
+            Names of systems relevant to the current solve.
+        """
+        time0 = time.time()
+
+        system = self._system()
+
+        d_residuals = system._dresiduals
+        d_outputs = system._doutputs
+
+        # assign x and b vectors based on mode
+        if mode == 'fwd':
+            x_vec = d_outputs.asarray()
+            b_vec = d_residuals.asarray()
+            trans_lu = 0
+            trans_splu = 'N'
+        else:  # rev
+            x_vec = d_residuals.asarray()
+            b_vec = d_outputs.asarray()
+            trans_lu = 1
+            trans_splu = 'T'
+
+        # EDIT-CACHE
+        # --------------------------------------------------
+        # --- check if we can reuse the cached solutions ---
+        if self.options['use_cache']:
+            for i, rhs_cache in enumerate(self._rhs_cache_list):
+                # Check if the RHS vector is the same as a cached vector. This part is not necessary, but is less expensive than checking if two vectors are parallel.
+                if np.allclose(b_vec, rhs_cache, rtol=1e-12, atol=1e-12):
+                    x_vec[:] = self._sol_cache_list[i]
+                    print('*** use caching in DirectSolver with scaler = 1 | time = %.2E s ***' % (time.time() - time0))
+                    return
+                # Check if the RHS vector is equal to -1 * cached vector.
+                elif np.allclose(b_vec, -rhs_cache, rtol=1e-12, atol=1e-12):
+                    x_vec[:] = -self._sol_cache_list[i]
+                    print('*** use caching in DirectSolver with scaler = -1 | time = %.2E s ***' % (time.time() - time0))
+                    return
+
+                # Check if the RHS vector and a cached vector are parallel
+                # NOTE: the following parallel vector check may be inaccurate for some cases. maybe should tighten the tolerance?
+                dot_product = np.dot(b_vec, rhs_cache)
+                norm1 = np.linalg.norm(b_vec)
+                norm2 = np.linalg.norm(rhs_cache)
+                if np.isclose(abs(dot_product), norm1 * norm2, rtol=1e-100, atol=1e-15):
+                    # two vectors are parallel, thus we can use the cache.
+                    scaler = dot_product / norm2**2
+                    x_vec[:] = self._sol_cache_list[i] * scaler
+                    print('*** use caching in DirectSolver with scaler = %.2E | time = %.2E s ***' % (scaler, time.time() - time0))
+                    return
+
+        # TODO: reset cache optimization iteration
+        # --------------------------------------------------
+
+        # AssembledJacobians are unscaled.
+        if self._assembled_jac is not None:
+            full_b = tmp = b_vec
+
+            with system._unscaled_context(outputs=[d_outputs], residuals=[d_residuals]):
+                if isinstance(self._assembled_jac._int_mtx, DenseMatrix):
+                    arr = scipy.linalg.lu_solve(self._lup, full_b, trans=trans_lu)
+                else:
+                    arr = self._lu.solve(full_b, trans_splu)
+
+                x_vec[:] = arr
+
+        # matrix-vector-product generated jacobians are scaled.
+        else:
+            x_vec[:] = scipy.linalg.lu_solve(self._lup, b_vec, trans=trans_lu)
+
+        print('*** solved in DirectSolver | time = %.2E s ***' % (time.time() - time0))
+
+        # EDIT-CACHE
+        # --- append the current solution to the cache ---
+        if self.options['use_cache']:
+            self._rhs_cache_list.append(b_vec.copy())
+            self._sol_cache_list.append(x_vec.copy())
diff --git a/POEM_093/example1.py b/POEM_093/example1.py
new file mode 100644
index 00000000..b78d247b
--- /dev/null
+++ b/POEM_093/example1.py
@@ -0,0 +1,338 @@
+"""
+This is a two-point aerostructural optimization based on the following example on OAS docs:
+https://mdolab-openaerostruct.readthedocs-hosted.com/en/latest/advanced_features/multipoint.html#aerostructural-optimization-example-q400
+
+I changed the objective function and constraints from the example.
+The optimization problem is (each point is labeled 0 and 1)
+
+min  CD0 + CD1
+s.t. CL0 = 0.6
+     CL1 = CL0 + 0.2
+"""
+
+import matplotlib.pyplot as plt
+
+import numpy as np
+import pickle
+
+from openaerostruct.integration.aerostruct_groups import AerostructGeometry, AerostructPoint
+from openaerostruct.structures.wingbox_fuel_vol_delta import WingboxFuelVolDelta
+import openmdao.api as om
+
+# import custom linear solver with caching prototype
+from direct_custom import DirectSolverCustom
+from petsc_ksp_custom import PETScKrylovCustom
+
+# Provide coordinates for a portion of an airfoil for the wingbox cross-section as an nparray with dtype=complex (to work with the complex-step approximation for derivatives).
+# These should be for an airfoil with the chord scaled to 1.
+# We use the 10% to 60% portion of the NASA SC2-0612 airfoil for this case
+# We use the coordinates available from airfoiltools.com. Using such a large number of coordinates is not necessary.
+# The first and last x-coordinates of the upper and lower surfaces must be the same
+
+# fmt: off
+upper_x = np.array([0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6], dtype="complex128")
+lower_x = np.array([0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6], dtype="complex128")
+upper_y = np.array([ 0.0447,  0.046,  0.0472,  0.0484,  0.0495,  0.0505,  0.0514,  0.0523,  0.0531,  0.0538, 0.0545,  0.0551,  0.0557, 0.0563,  0.0568, 0.0573,  0.0577,  0.0581,  0.0585,  0.0588,  0.0591,  0.0593,  0.0595,  0.0597,  0.0599,  0.06,    0.0601,  0.0602,  0.0602,  0.0602,  0.0602,  0.0602,  0.0601,  0.06,    0.0599,  0.0598,  0.0596,  0.0594,  0.0592,  0.0589,  0.0586,  0.0583,  0.058,   0.0576,  0.0572,  0.0568,  0.0563,  0.0558,  0.0553,  0.0547,  0.0541], dtype="complex128")  # noqa: E201, E241
+lower_y = np.array([-0.0447, -0.046, -0.0473, -0.0485, -0.0496, -0.0506, -0.0515, -0.0524, -0.0532, -0.054, -0.0547, -0.0554, -0.056, -0.0565, -0.057, -0.0575, -0.0579, -0.0583, -0.0586, -0.0589, -0.0592, -0.0594, -0.0595, -0.0596, -0.0597, -0.0598, -0.0598, -0.0598, -0.0598, -0.0597, -0.0596, -0.0594, -0.0592, -0.0589, -0.0586, -0.0582, -0.0578, -0.0573, -0.0567, -0.0561, -0.0554, -0.0546, -0.0538, -0.0529, -0.0519, -0.0509, -0.0497, -0.0485, -0.0472, -0.0458, -0.0444], dtype="complex128")
+# fmt: on
+
+# Here we create a custom mesh for the wing
+# It is evenly spaced with nx chordwise nodal points and ny spanwise nodal points for the half-span
+
+span = 28.42  # wing span in m
+root_chord = 3.34  # root chord in m
+
+nx = 3  # number of chordwise nodal points (should be odd)
+ny = 21  # number of spanwise nodal points for the half-span
+
+# Initialize the 3-D mesh object. Chordwise, spanwise, then the 3D coordinates.
+mesh = np.zeros((nx, ny, 3))
+
+# Start away from the symmetry plane and approach the plane as the array indices increase.
+# The form of this 3-D array can be very confusing initially.
+# For each node we are providing the x, y, and z coordinates.
+# x is chordwise, y is spanwise, and z is up.
+# For example (for a mesh with 5 chordwise nodes and 15 spanwise nodes for the half wing), the node for the leading edge at the tip would be specified as mesh[0, 0, :] = np.array([1.1356, -14.21, 0.])
+# and the node at the trailing edge at the root would be mesh[4, 14, :] = np.array([3.34, 0., 0.]).
+# We only provide the left half of the wing because we use symmetry.
+# Print the following mesh and elements of the mesh to better understand the form.
+
+mesh[:, :, 1] = np.linspace(-span / 2, 0, ny)
+mesh[0, :, 0] = 0.34 * root_chord * np.linspace(1.0, 0.0, ny)
+mesh[2, :, 0] = root_chord * (np.linspace(0.4, 1.0, ny) + 0.34 * np.linspace(1.0, 0.0, ny))
+mesh[1, :, 0] = (mesh[2, :, 0] + mesh[0, :, 0]) / 2
+
+# print(mesh)
+
+surf_dict = {
+    # Wing definition
+    "name": "wing",  # name of the surface
+    "symmetry": True,  # if true, model one half of wing
+    "S_ref_type": "wetted",  # how we compute the wing area,
+    # can be 'wetted' or 'projected'
+    "mesh": mesh,
+    "twist_cp": np.array([6.0, 7.0, 7.0, 7.0]),
+    "fem_model_type": "wingbox",
+    "data_x_upper": upper_x,
+    "data_x_lower": lower_x,
+    "data_y_upper": upper_y,
+    "data_y_lower": lower_y,
+    "spar_thickness_cp": np.array([0.004, 0.004, 0.004, 0.004]),  # [m]
+    "skin_thickness_cp": np.array([0.003, 0.006, 0.010, 0.012]),  # [m]
+    "original_wingbox_airfoil_t_over_c": 0.12,
+    # Aerodynamic deltas.
+    # These CL0 and CD0 values are added to the CL and CD
+    # obtained from aerodynamic analysis of the surface to get
+    # the total CL and CD.
+    # These CL0 and CD0 values do not vary wrt alpha.
+    # They can be used to account for things that are not included, such as contributions from the fuselage, nacelles, tail surfaces, etc.
+    "CL0": 0.0,
+    "CD0": 0.0142,
+    "with_viscous": True,  # if true, compute viscous drag
+    "with_wave": True,  # if true, compute wave drag
+    # Airfoil properties for viscous drag calculation
+    "k_lam": 0.05,  # percentage of chord with laminar
+    # flow, used for viscous drag
+    "c_max_t": 0.38,  # chordwise location of maximum thickness
+    "t_over_c_cp": np.array([0.1, 0.1, 0.15, 0.15]),
+    # Structural values are based on aluminum 7075
+    "E": 73.1e9,  # [Pa] Young's modulus
+    "G": (73.1e9 / 2 / 1.33),  # [Pa] shear modulus (calculated using E and the Poisson's ratio here)
+    "yield": (420.0e6 / 1.5),  # [Pa] allowable yield stress
+    "mrho": 2.78e3,  # [kg/m^3] material density
+    "strength_factor_for_upper_skin": 1.0,  # the yield stress is multiplied by this factor for the upper skin
+    "wing_weight_ratio": 1.25,
+    "exact_failure_constraint": False,  # if false, use KS function
+    "struct_weight_relief": True,
+    "distributed_fuel_weight": True,
+    "fuel_density": 803.0,  # [kg/m^3] fuel density (only needed if the fuel-in-wing volume constraint is used)
+    "Wf_reserve": 500.0,  # [kg] reserve fuel mass
+}
+
+surfaces = [surf_dict]
+
+# Create the problem and assign the model group
+prob = om.Problem()
+
+# Add problem information as an independent variables component
+indep_var_comp = om.IndepVarComp()
+indep_var_comp.add_output("v", val=np.array([0.5 * 310.95, 0.3 * 340.294]), units="m/s")
+indep_var_comp.add_output("alpha", val=0.0, units="deg")
+indep_var_comp.add_output("alpha_maneuver", val=0.0, units="deg")
+indep_var_comp.add_output("Mach_number", val=np.array([0.5, 0.3]))
+indep_var_comp.add_output(
+    "re",
+    val=np.array([0.569 * 310.95 * 0.5 * 1.0 / (1.56 * 1e-5), 1.225 * 340.294 * 0.3 * 1.0 / (1.81206 * 1e-5)]),
+    units="1/m",
+)
+indep_var_comp.add_output("rho", val=np.array([0.569, 1.225]), units="kg/m**3")
+indep_var_comp.add_output("CT", val=0.43 / 3600, units="1/s")
+indep_var_comp.add_output("R", val=2e6, units="m")
+indep_var_comp.add_output("W0", val=25400 + surf_dict["Wf_reserve"], units="kg")
+indep_var_comp.add_output("speed_of_sound", val=np.array([310.95, 340.294]), units="m/s")
+indep_var_comp.add_output("load_factor", val=np.array([1.0, 2.5]))
+indep_var_comp.add_output("empty_cg", val=np.zeros((3)), units="m")
+indep_var_comp.add_output("fuel_mass", val=3000.0, units="kg")
+
+prob.model.add_subsystem("prob_vars", indep_var_comp, promotes=["*"])
+
+# Loop over each surface in the surfaces list
+for surface in surfaces:
+    # Get the surface name and create a group to contain components
+    # only for this surface
+    name = surface["name"]
+
+    aerostruct_group = AerostructGeometry(surface=surface)
+
+    # Add group to the problem with the name of the surface.
+    prob.model.add_subsystem(name, aerostruct_group)
+
+# Loop through and add a certain number of aerostruct points
+for i in range(2):
+    point_name = "AS_point_{}".format(i)
+    # Connect the parameters within the model for each aerostruct point
+
+    # Create the aero point group and add it to the model
+    AS_point = AerostructPoint(surfaces=surfaces, internally_connect_fuelburn=False)
+
+    prob.model.add_subsystem(point_name, AS_point)
+
+    # Connect flow properties to the analysis point
+    prob.model.connect("v", point_name + ".v", src_indices=[i])
+    prob.model.connect("Mach_number", point_name + ".Mach_number", src_indices=[i])
+    prob.model.connect("re", point_name + ".re", src_indices=[i])
+    prob.model.connect("rho", point_name + ".rho", src_indices=[i])
+    prob.model.connect("CT", point_name + ".CT")
+    prob.model.connect("R", point_name + ".R")
+    prob.model.connect("W0", point_name + ".W0")
+    prob.model.connect("speed_of_sound", point_name + ".speed_of_sound", src_indices=[i])
+    prob.model.connect("empty_cg", point_name + ".empty_cg")
+    prob.model.connect("load_factor", point_name + ".load_factor", src_indices=[i])
+    prob.model.connect("fuel_mass", point_name + ".total_perf.L_equals_W.fuelburn")
+    prob.model.connect("fuel_mass", point_name + ".total_perf.CG.fuelburn")
+
+    for surface in surfaces:
+        name = surface["name"]
+
+        if surf_dict["distributed_fuel_weight"]:
+            prob.model.connect("load_factor", point_name + ".coupled.load_factor", src_indices=[i])
+
+        com_name = point_name + "." + name + "_perf."
+        prob.model.connect(
+            name + ".local_stiff_transformed", point_name + ".coupled." + name + ".local_stiff_transformed"
+        )
+        prob.model.connect(name + ".nodes", point_name + ".coupled." + name + ".nodes")
+
+        # Connect aerodyamic mesh to coupled group mesh
+        prob.model.connect(name + ".mesh", point_name + ".coupled." + name + ".mesh")
+        if surf_dict["struct_weight_relief"]:
+            prob.model.connect(name + ".element_mass", point_name + ".coupled." + name + ".element_mass")
+
+        # Connect performance calculation variables
+        prob.model.connect(name + ".nodes", com_name + "nodes")
+        prob.model.connect(name + ".cg_location", point_name + "." + "total_perf." + name + "_cg_location")
+        prob.model.connect(name + ".structural_mass", point_name + "." + "total_perf." + name + "_structural_mass")
+
+        # Connect wingbox properties to von Mises stress calcs
+        prob.model.connect(name + ".Qz", com_name + "Qz")
+        prob.model.connect(name + ".J", com_name + "J")
+        prob.model.connect(name + ".A_enc", com_name + "A_enc")
+        prob.model.connect(name + ".htop", com_name + "htop")
+        prob.model.connect(name + ".hbottom", com_name + "hbottom")
+        prob.model.connect(name + ".hfront", com_name + "hfront")
+        prob.model.connect(name + ".hrear", com_name + "hrear")
+
+        prob.model.connect(name + ".spar_thickness", com_name + "spar_thickness")
+        prob.model.connect(name + ".t_over_c", com_name + "t_over_c")
+
+prob.model.connect("alpha", "AS_point_0" + ".alpha")
+prob.model.connect("alpha_maneuver", "AS_point_1" + ".alpha")
+
+# Here we add the fuel volume constraint componenet to the model
+prob.model.add_subsystem("fuel_vol_delta", WingboxFuelVolDelta(surface=surface))
+prob.model.connect("wing.struct_setup.fuel_vols", "fuel_vol_delta.fuel_vols")
+prob.model.connect("AS_point_0.fuelburn", "fuel_vol_delta.fuelburn")
+
+if surf_dict["distributed_fuel_weight"]:
+    prob.model.connect("wing.struct_setup.fuel_vols", "AS_point_0.coupled.wing.struct_states.fuel_vols")
+    prob.model.connect("fuel_mass", "AS_point_0.coupled.wing.struct_states.fuel_mass")
+
+    prob.model.connect("wing.struct_setup.fuel_vols", "AS_point_1.coupled.wing.struct_states.fuel_vols")
+    prob.model.connect("fuel_mass", "AS_point_1.coupled.wing.struct_states.fuel_mass")
+
+comp = om.ExecComp("fuel_diff = (fuel_mass - fuelburn) / fuelburn", units="kg")
+prob.model.add_subsystem("fuel_diff", comp, promotes_inputs=["fuel_mass"], promotes_outputs=["fuel_diff"])
+prob.model.connect("AS_point_0.fuelburn", "fuel_diff.fuelburn")
+
+
+## Use these settings if you do not have pyOptSparse or SNOPT
+prob.driver = om.ScipyOptimizeDriver()
+prob.driver = om.pyOptSparseDriver()
+prob.driver.options['optimizer'] = 'SNOPT'
+prob.driver.options['print_results'] = True
+prob.driver.opt_settings['Major iterations limit'] = 1
+
+# # The following are the optimizer settings used for the EngOpt conference paper
+# # Uncomment them if you can use SNOPT
+# prob.driver = om.pyOptSparseDriver()
+# prob.driver.options['optimizer'] = "SNOPT"
+# prob.driver.opt_settings['Major optimality tolerance'] = 5e-6
+# prob.driver.opt_settings['Major feasibility tolerance'] = 1e-8
+# prob.driver.opt_settings['Major iterations limit'] = 200
+
+recorder = om.SqliteRecorder("aerostruct.db")
+prob.driver.add_recorder(recorder)
+
+# We could also just use prob.driver.recording_options['includes']=['*'] here, but for large meshes the database file becomes extremely large. So we just select the variables we need.
+prob.driver.recording_options["includes"] = [
+    "alpha",
+    "rho",
+    "v",
+    "cg",
+    "AS_point_1.cg",
+    "AS_point_0.cg",
+    "AS_point_0.coupled.wing_loads.loads",
+    "AS_point_1.coupled.wing_loads.loads",
+    "AS_point_0.coupled.wing.normals",
+    "AS_point_1.coupled.wing.normals",
+    "AS_point_0.coupled.wing.widths",
+    "AS_point_1.coupled.wing.widths",
+    "AS_point_0.coupled.aero_states.wing_sec_forces",
+    "AS_point_1.coupled.aero_states.wing_sec_forces",
+    "AS_point_0.wing_perf.CL1",
+    "AS_point_1.wing_perf.CL1",
+    "AS_point_0.coupled.wing.S_ref",
+    "AS_point_1.coupled.wing.S_ref",
+    "wing.geometry.twist",
+    "wing.mesh",
+    "wing.skin_thickness",
+    "wing.spar_thickness",
+    "wing.t_over_c",
+    "wing.structural_mass",
+    "AS_point_0.wing_perf.vonmises",
+    "AS_point_1.wing_perf.vonmises",
+    "AS_point_0.coupled.wing.def_mesh",
+    "AS_point_1.coupled.wing.def_mesh",
+]
+
+prob.driver.recording_options["record_objectives"] = True
+prob.driver.recording_options["record_constraints"] = True
+prob.driver.recording_options["record_desvars"] = True
+prob.driver.recording_options["record_inputs"] = True
+
+# design variables
+prob.model.add_design_var("wing.twist_cp", lower=-15.0, upper=15.0, scaler=0.1)
+prob.model.add_design_var("wing.spar_thickness_cp", lower=0.003, upper=0.1, scaler=1e2)
+prob.model.add_design_var("wing.skin_thickness_cp", lower=0.003, upper=0.1, scaler=1e2)
+prob.model.add_design_var("wing.geometry.t_over_c_cp", lower=0.07, upper=0.2, scaler=10.0)
+prob.model.add_design_var("fuel_mass", lower=0.0, upper=2e5, scaler=1e-5)
+prob.model.add_design_var("alpha_maneuver", lower=-15.0, upper=15)
+
+# minimize sum of CD
+prob.model.add_subsystem('CD_sum_comp', om.AddSubtractComp('CD_sum', input_names=['CD0', 'CD1']), promotes_outputs=['CD_sum'])
+prob.model.connect('AS_point_0.CD', 'CD_sum_comp.CD0')
+prob.model.connect('AS_point_1.CD', 'CD_sum_comp.CD1')
+prob.model.add_objective('CD_sum')
+
+# --- CL constraints ---
+# CL0 = 0.6
+prob.model.add_constraint("AS_point_0.CL", equals=0.6)   # NOTE: derivatives order: <CD0, CD1>, CL1, -CL0, CL0)
+
+# CL1 - CLO = 0.2
+cl_diff_comp = om.ExecComp('CL_diff = CL1 - CL0 - 0.2', units=None)
+prob.model.add_subsystem('CL_diff_comp', cl_diff_comp, promotes_outputs=['CL_diff'])
+prob.model.connect('AS_point_0.CL', 'CL_diff_comp.CL0')
+prob.model.connect('AS_point_1.CL', 'CL_diff_comp.CL1')
+prob.model.add_constraint("CL_diff", equals=0.0)   # instead of CL1 = 0.8, impose CL1 = CL0 + 0.2
+
+# Set up the problem
+prob.setup(mode='rev', check=True)
+
+# ==============================
+#  Set linear solvers here
+# ==============================
+# --- Use Krylov solver for aerostructural coupled adjoint ---
+# prob.model.AS_point_0.coupled.linear_solver = om.PETScKrylov(iprint=2, assemble_jac=True)
+# prob.model.AS_point_0.coupled.linear_solver.precon = om.LinearRunOnce(iprint=-1)
+# prob.model.AS_point_1.coupled.linear_solver = om.PETScKrylov(iprint=2, assemble_jac=True)
+# prob.model.AS_point_1.coupled.linear_solver.precon = om.LinearRunOnce(iprint=-1)
+
+# --- Use Custom Krylov solver with solution caching ---
+prob.model.AS_point_0.coupled.linear_solver = PETScKrylovCustom(iprint=2, assemble_jac=True, use_cache=True)
+prob.model.AS_point_0.coupled.linear_solver.precon = om.LinearRunOnce(iprint=-1)
+prob.model.AS_point_1.coupled.linear_solver = PETScKrylovCustom(iprint=2, assemble_jac=True, use_cache=True)
+prob.model.AS_point_1.coupled.linear_solver.precon = om.LinearRunOnce(iprint=-1)
+
+# --- Use Custom Direct solver with solution caching ---
+# prob.model.AS_point_0.coupled.linear_solver = DirectSolverCustom(use_cache=True)
+# prob.model.AS_point_0.coupled.linear_solver.precon = om.LinearRunOnce(iprint=-1)
+# prob.model.AS_point_1.coupled.linear_solver = DirectSolverCustom(use_cache=True)
+# prob.model.AS_point_1.coupled.linear_solver.precon = om.LinearRunOnce(iprint=-1)
+
+prob.run_model()
+
+print("\n --- compute totals --- \n")
+totals = prob.compute_totals()
+print('Derivatives of CL_diff w.r.t. twist_cp =', totals[('CL_diff_comp.CL_diff', 'wing.twist_cp')])
+
+# om.n2(prob)
\ No newline at end of file
diff --git a/POEM_093/example2.py b/POEM_093/example2.py
new file mode 100644
index 00000000..c8bddd83
--- /dev/null
+++ b/POEM_093/example2.py
@@ -0,0 +1,312 @@
+"""
+This is a two-point aerostructural optimization based on the following example on OAS docs:
+https://mdolab-openaerostruct.readthedocs-hosted.com/en/latest/advanced_features/multipoint.html#aerostructural-optimization-example-q400
+
+I changed the objective function and constraints from the example.
+The optimization problem is (each point is labeled 0 and 1)
+
+min  fuel burn
+s.t. L=W at point 0 (cruise)
+     L=W at point 1 (maneuver)
+"""
+
+import matplotlib.pyplot as plt
+
+import numpy as np
+import pickle
+
+from openaerostruct.integration.aerostruct_groups import AerostructGeometry, AerostructPoint
+from openaerostruct.structures.wingbox_fuel_vol_delta import WingboxFuelVolDelta
+import openmdao.api as om
+
+# import custom linear solver with caching prototype
+from direct_custom import DirectSolverCustom
+from petsc_ksp_custom import PETScKrylovCustom
+
+# Provide coordinates for a portion of an airfoil for the wingbox cross-section as an nparray with dtype=complex (to work with the complex-step approximation for derivatives).
+# These should be for an airfoil with the chord scaled to 1.
+# We use the 10% to 60% portion of the NASA SC2-0612 airfoil for this case
+# We use the coordinates available from airfoiltools.com. Using such a large number of coordinates is not necessary.
+# The first and last x-coordinates of the upper and lower surfaces must be the same
+
+# fmt: off
+upper_x = np.array([0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6], dtype="complex128")
+lower_x = np.array([0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6], dtype="complex128")
+upper_y = np.array([ 0.0447,  0.046,  0.0472,  0.0484,  0.0495,  0.0505,  0.0514,  0.0523,  0.0531,  0.0538, 0.0545,  0.0551,  0.0557, 0.0563,  0.0568, 0.0573,  0.0577,  0.0581,  0.0585,  0.0588,  0.0591,  0.0593,  0.0595,  0.0597,  0.0599,  0.06,    0.0601,  0.0602,  0.0602,  0.0602,  0.0602,  0.0602,  0.0601,  0.06,    0.0599,  0.0598,  0.0596,  0.0594,  0.0592,  0.0589,  0.0586,  0.0583,  0.058,   0.0576,  0.0572,  0.0568,  0.0563,  0.0558,  0.0553,  0.0547,  0.0541], dtype="complex128")  # noqa: E201, E241
+lower_y = np.array([-0.0447, -0.046, -0.0473, -0.0485, -0.0496, -0.0506, -0.0515, -0.0524, -0.0532, -0.054, -0.0547, -0.0554, -0.056, -0.0565, -0.057, -0.0575, -0.0579, -0.0583, -0.0586, -0.0589, -0.0592, -0.0594, -0.0595, -0.0596, -0.0597, -0.0598, -0.0598, -0.0598, -0.0598, -0.0597, -0.0596, -0.0594, -0.0592, -0.0589, -0.0586, -0.0582, -0.0578, -0.0573, -0.0567, -0.0561, -0.0554, -0.0546, -0.0538, -0.0529, -0.0519, -0.0509, -0.0497, -0.0485, -0.0472, -0.0458, -0.0444], dtype="complex128")
+# fmt: on
+
+# Here we create a custom mesh for the wing
+# It is evenly spaced with nx chordwise nodal points and ny spanwise nodal points for the half-span
+
+span = 28.42  # wing span in m
+root_chord = 3.34  # root chord in m
+
+nx = 3  # number of chordwise nodal points (should be odd)
+ny = 21  # number of spanwise nodal points for the half-span
+
+# Initialize the 3-D mesh object. Chordwise, spanwise, then the 3D coordinates.
+mesh = np.zeros((nx, ny, 3))
+
+# Start away from the symmetry plane and approach the plane as the array indices increase.
+# The form of this 3-D array can be very confusing initially.
+# For each node we are providing the x, y, and z coordinates.
+# x is chordwise, y is spanwise, and z is up.
+# For example (for a mesh with 5 chordwise nodes and 15 spanwise nodes for the half wing), the node for the leading edge at the tip would be specified as mesh[0, 0, :] = np.array([1.1356, -14.21, 0.])
+# and the node at the trailing edge at the root would be mesh[4, 14, :] = np.array([3.34, 0., 0.]).
+# We only provide the left half of the wing because we use symmetry.
+# Print the following mesh and elements of the mesh to better understand the form.
+
+mesh[:, :, 1] = np.linspace(-span / 2, 0, ny)
+mesh[0, :, 0] = 0.34 * root_chord * np.linspace(1.0, 0.0, ny)
+mesh[2, :, 0] = root_chord * (np.linspace(0.4, 1.0, ny) + 0.34 * np.linspace(1.0, 0.0, ny))
+mesh[1, :, 0] = (mesh[2, :, 0] + mesh[0, :, 0]) / 2
+
+# print(mesh)
+
+surf_dict = {
+    # Wing definition
+    "name": "wing",  # name of the surface
+    "symmetry": True,  # if true, model one half of wing
+    "S_ref_type": "wetted",  # how we compute the wing area,
+    # can be 'wetted' or 'projected'
+    "mesh": mesh,
+    "twist_cp": np.array([6.0, 7.0, 7.0, 7.0]),
+    "fem_model_type": "wingbox",
+    "data_x_upper": upper_x,
+    "data_x_lower": lower_x,
+    "data_y_upper": upper_y,
+    "data_y_lower": lower_y,
+    "spar_thickness_cp": np.array([0.004, 0.004, 0.004, 0.004]),  # [m]
+    "skin_thickness_cp": np.array([0.003, 0.006, 0.010, 0.012]),  # [m]
+    "original_wingbox_airfoil_t_over_c": 0.12,
+    # Aerodynamic deltas.
+    # These CL0 and CD0 values are added to the CL and CD
+    # obtained from aerodynamic analysis of the surface to get
+    # the total CL and CD.
+    # These CL0 and CD0 values do not vary wrt alpha.
+    # They can be used to account for things that are not included, such as contributions from the fuselage, nacelles, tail surfaces, etc.
+    "CL0": 0.0,
+    "CD0": 0.0142,
+    "with_viscous": True,  # if true, compute viscous drag
+    "with_wave": True,  # if true, compute wave drag
+    # Airfoil properties for viscous drag calculation
+    "k_lam": 0.05,  # percentage of chord with laminar
+    # flow, used for viscous drag
+    "c_max_t": 0.38,  # chordwise location of maximum thickness
+    "t_over_c_cp": np.array([0.1, 0.1, 0.15, 0.15]),
+    # Structural values are based on aluminum 7075
+    "E": 73.1e9,  # [Pa] Young's modulus
+    "G": (73.1e9 / 2 / 1.33),  # [Pa] shear modulus (calculated using E and the Poisson's ratio here)
+    "yield": (420.0e6 / 1.5),  # [Pa] allowable yield stress
+    "mrho": 2.78e3,  # [kg/m^3] material density
+    "strength_factor_for_upper_skin": 1.0,  # the yield stress is multiplied by this factor for the upper skin
+    "wing_weight_ratio": 1.25,
+    "exact_failure_constraint": False,  # if false, use KS function
+    "struct_weight_relief": True,
+    "distributed_fuel_weight": False,
+    "fuel_density": 803.0,  # [kg/m^3] fuel density (only needed if the fuel-in-wing volume constraint is used)
+    "Wf_reserve": 500.0,  # [kg] reserve fuel mass
+}
+
+surfaces = [surf_dict]
+
+# Create the problem and assign the model group
+prob = om.Problem()
+
+# Add problem information as an independent variables component
+indep_var_comp = om.IndepVarComp()
+indep_var_comp.add_output("v", val=np.array([0.5 * 310.95, 0.3 * 340.294]), units="m/s")
+indep_var_comp.add_output("alpha", val=0.0, units="deg")
+indep_var_comp.add_output("alpha_maneuver", val=0.0, units="deg")
+indep_var_comp.add_output("Mach_number", val=np.array([0.5, 0.3]))
+indep_var_comp.add_output(
+    "re",
+    val=np.array([0.569 * 310.95 * 0.5 * 1.0 / (1.56 * 1e-5), 1.225 * 340.294 * 0.3 * 1.0 / (1.81206 * 1e-5)]),
+    units="1/m",
+)
+indep_var_comp.add_output("rho", val=np.array([0.569, 1.225]), units="kg/m**3")
+indep_var_comp.add_output("CT", val=0.43 / 3600, units="1/s")
+indep_var_comp.add_output("R", val=2e6, units="m")
+indep_var_comp.add_output("W0", val=25400 + surf_dict["Wf_reserve"], units="kg")
+indep_var_comp.add_output("speed_of_sound", val=np.array([310.95, 340.294]), units="m/s")
+indep_var_comp.add_output("load_factor", val=np.array([1.0, 2.5]))
+indep_var_comp.add_output("empty_cg", val=np.zeros((3)), units="m")
+
+prob.model.add_subsystem("prob_vars", indep_var_comp, promotes=["*"])
+
+# Loop over each surface in the surfaces list
+for surface in surfaces:
+    # Get the surface name and create a group to contain components
+    # only for this surface
+    name = surface["name"]
+
+    aerostruct_group = AerostructGeometry(surface=surface)
+
+    # Add group to the problem with the name of the surface.
+    prob.model.add_subsystem(name, aerostruct_group)
+
+# Loop through and add a certain number of aerostruct points
+for i in range(2):
+    point_name = "AS_point_{}".format(i)
+    # Connect the parameters within the model for each aerostruct point
+
+    # Create the aero point group and add it to the model
+    AS_point = AerostructPoint(surfaces=surfaces, internally_connect_fuelburn=False)
+
+    prob.model.add_subsystem(point_name, AS_point)
+
+    # Connect flow properties to the analysis point
+    prob.model.connect("v", point_name + ".v", src_indices=[i])
+    prob.model.connect("Mach_number", point_name + ".Mach_number", src_indices=[i])
+    prob.model.connect("re", point_name + ".re", src_indices=[i])
+    prob.model.connect("rho", point_name + ".rho", src_indices=[i])
+    prob.model.connect("CT", point_name + ".CT")
+    prob.model.connect("R", point_name + ".R")
+    prob.model.connect("W0", point_name + ".W0")
+    prob.model.connect("speed_of_sound", point_name + ".speed_of_sound", src_indices=[i])
+    prob.model.connect("empty_cg", point_name + ".empty_cg")
+    prob.model.connect("load_factor", point_name + ".load_factor", src_indices=[i])
+
+    # connect cruise fuelburn for L=W constraints
+    prob.model.connect('AS_point_0.fuelburn', point_name + '.total_perf.L_equals_W.fuelburn')
+    prob.model.connect('AS_point_0.fuelburn', point_name + '.total_perf.CG.fuelburn')
+    
+    for surface in surfaces:
+        name = surface["name"]
+
+        if surf_dict["distributed_fuel_weight"]:
+            prob.model.connect("load_factor", point_name + ".coupled.load_factor", src_indices=[i])
+
+        com_name = point_name + "." + name + "_perf."
+        prob.model.connect(
+            name + ".local_stiff_transformed", point_name + ".coupled." + name + ".local_stiff_transformed"
+        )
+        prob.model.connect(name + ".nodes", point_name + ".coupled." + name + ".nodes")
+
+        # Connect aerodyamic mesh to coupled group mesh
+        prob.model.connect(name + ".mesh", point_name + ".coupled." + name + ".mesh")
+        if surf_dict["struct_weight_relief"]:
+            prob.model.connect(name + ".element_mass", point_name + ".coupled." + name + ".element_mass")
+
+        # Connect performance calculation variables
+        prob.model.connect(name + ".nodes", com_name + "nodes")
+        prob.model.connect(name + ".cg_location", point_name + "." + "total_perf." + name + "_cg_location")
+        prob.model.connect(name + ".structural_mass", point_name + "." + "total_perf." + name + "_structural_mass")
+
+        # Connect wingbox properties to von Mises stress calcs
+        prob.model.connect(name + ".Qz", com_name + "Qz")
+        prob.model.connect(name + ".J", com_name + "J")
+        prob.model.connect(name + ".A_enc", com_name + "A_enc")
+        prob.model.connect(name + ".htop", com_name + "htop")
+        prob.model.connect(name + ".hbottom", com_name + "hbottom")
+        prob.model.connect(name + ".hfront", com_name + "hfront")
+        prob.model.connect(name + ".hrear", com_name + "hrear")
+
+        prob.model.connect(name + ".spar_thickness", com_name + "spar_thickness")
+        prob.model.connect(name + ".t_over_c", com_name + "t_over_c")
+
+prob.model.connect("alpha", "AS_point_0" + ".alpha")
+prob.model.connect("alpha_maneuver", "AS_point_1" + ".alpha")
+
+## Use these settings if you do not have pyOptSparse or SNOPT
+prob.driver = om.ScipyOptimizeDriver()
+prob.driver = om.pyOptSparseDriver()
+prob.driver.options['optimizer'] = 'SNOPT'
+prob.driver.options['print_results'] = True
+prob.driver.opt_settings['Major iterations limit'] = 1
+
+# # The following are the optimizer settings used for the EngOpt conference paper
+# # Uncomment them if you can use SNOPT
+# prob.driver = om.pyOptSparseDriver()
+# prob.driver.options['optimizer'] = "SNOPT"
+# prob.driver.opt_settings['Major optimality tolerance'] = 5e-6
+# prob.driver.opt_settings['Major feasibility tolerance'] = 1e-8
+# prob.driver.opt_settings['Major iterations limit'] = 200
+
+recorder = om.SqliteRecorder("aerostruct.db")
+prob.driver.add_recorder(recorder)
+
+# We could also just use prob.driver.recording_options['includes']=['*'] here, but for large meshes the database file becomes extremely large. So we just select the variables we need.
+prob.driver.recording_options["includes"] = [
+    "alpha",
+    "rho",
+    "v",
+    "cg",
+    "AS_point_1.cg",
+    "AS_point_0.cg",
+    "AS_point_0.coupled.wing_loads.loads",
+    "AS_point_1.coupled.wing_loads.loads",
+    "AS_point_0.coupled.wing.normals",
+    "AS_point_1.coupled.wing.normals",
+    "AS_point_0.coupled.wing.widths",
+    "AS_point_1.coupled.wing.widths",
+    "AS_point_0.coupled.aero_states.wing_sec_forces",
+    "AS_point_1.coupled.aero_states.wing_sec_forces",
+    "AS_point_0.wing_perf.CL1",
+    "AS_point_1.wing_perf.CL1",
+    "AS_point_0.coupled.wing.S_ref",
+    "AS_point_1.coupled.wing.S_ref",
+    "wing.geometry.twist",
+    "wing.mesh",
+    "wing.skin_thickness",
+    "wing.spar_thickness",
+    "wing.t_over_c",
+    "wing.structural_mass",
+    "AS_point_0.wing_perf.vonmises",
+    "AS_point_1.wing_perf.vonmises",
+    "AS_point_0.coupled.wing.def_mesh",
+    "AS_point_1.coupled.wing.def_mesh",
+]
+
+prob.driver.recording_options["record_objectives"] = True
+prob.driver.recording_options["record_constraints"] = True
+prob.driver.recording_options["record_desvars"] = True
+prob.driver.recording_options["record_inputs"] = True
+
+# design variables
+prob.model.add_design_var("wing.twist_cp", lower=-15.0, upper=15.0, scaler=0.1)
+prob.model.add_design_var("wing.spar_thickness_cp", lower=0.003, upper=0.1, scaler=1e2)
+prob.model.add_design_var("wing.skin_thickness_cp", lower=0.003, upper=0.1, scaler=1e2)
+prob.model.add_design_var("wing.geometry.t_over_c_cp", lower=0.07, upper=0.2, scaler=10.0)
+prob.model.add_design_var("alpha_maneuver", lower=-15.0, upper=15)
+
+# minimize fuel burn
+prob.model.add_objective("AS_point_0.fuelburn", ref=10000, units='kg')
+
+# L = W constraint for each point
+# cruise point
+prob.model.add_constraint("AS_point_0.L_equals_W", equals=0.)
+# maneuver point
+prob.model.add_constraint("AS_point_1.L_equals_W", equals=0.)
+
+# Set up the problem
+prob.setup(mode='rev', check=True)
+
+# ==============================
+#  Set linear solvers here
+# ==============================
+# --- Use Krylov solver for aerostructural coupled adjoint ---
+# prob.model.AS_point_0.coupled.linear_solver = om.PETScKrylov(iprint=2, assemble_jac=True)
+# prob.model.AS_point_0.coupled.linear_solver.precon = om.LinearRunOnce(iprint=-1)
+# prob.model.AS_point_1.coupled.linear_solver = om.PETScKrylov(iprint=2, assemble_jac=True)
+# prob.model.AS_point_1.coupled.linear_solver.precon = om.LinearRunOnce(iprint=-1)
+
+# --- Use Custom Krylov solver with solution caching ---
+prob.model.AS_point_0.coupled.linear_solver = PETScKrylovCustom(iprint=2, assemble_jac=True, use_cache=True)
+prob.model.AS_point_0.coupled.linear_solver.precon = om.LinearRunOnce(iprint=-1)
+prob.model.AS_point_1.coupled.linear_solver = PETScKrylovCustom(iprint=2, assemble_jac=True, use_cache=True)
+prob.model.AS_point_1.coupled.linear_solver.precon = om.LinearRunOnce(iprint=-1)
+
+# --- Use Custom Direct solver with solution caching ---
+# prob.model.AS_point_0.coupled.linear_solver = DirectSolverCustom(use_cache=True)
+# prob.model.AS_point_0.coupled.linear_solver.precon = om.LinearRunOnce(iprint=-1)
+# prob.model.AS_point_1.coupled.linear_solver = DirectSolverCustom(use_cache=True)
+# prob.model.AS_point_1.coupled.linear_solver.precon = om.LinearRunOnce(iprint=-1)
+
+prob.run_model()
+
+print("\n --- compute totals --- \n")
+totals = prob.compute_totals()
+
+# om.n2(prob)
\ No newline at end of file
diff --git a/POEM_093/petsc_ksp_custom.py b/POEM_093/petsc_ksp_custom.py
new file mode 100644
index 00000000..af99ca3d
--- /dev/null
+++ b/POEM_093/petsc_ksp_custom.py
@@ -0,0 +1,524 @@
+"""LinearSolver that uses PetSC KSP to solve for a system's derivatives.
+Modified to use solution caching.
+The modifications can be found by searching for "EDIT-CACHE" comments in this file.
+"""
+
+import numpy as np
+import os
+import sys
+import time
+
+from openmdao.solvers.solver import LinearSolver
+from openmdao.utils.mpi import check_mpi_env
+
+use_mpi = check_mpi_env()
+if use_mpi is not False:
+    try:
+        import petsc4py
+        from petsc4py import PETSc
+    except ImportError:
+        PETSc = None
+        if use_mpi is True:
+            raise ImportError("Importing petsc4py failed and OPENMDAO_USE_MPI is true.")
+else:
+    PETSc = None
+
+KSP_TYPES = [
+    "richardson",
+    "chebyshev",
+    "cg",
+    "groppcg",
+    "pipecg",
+    "pipecgrr",
+    "cgne",
+    "nash",
+    "stcg",
+    "gltr",
+    "fcg",
+    "pipefcg",
+    "gmres",
+    "pipefgmres",
+    "fgmres",
+    "lgmres",
+    "dgmres",
+    "pgmres",
+    "tcqmr",
+    "bcgs",
+    "ibcgs",
+    "fbcgs",
+    "fbcgsr",
+    "bcgsl",
+    "cgs",
+    "tfqmr",
+    "cr",
+    "pipecr",
+    "lsqr",
+    "preonly",
+    "qcg",
+    "bicg",
+    "minres",
+    "symmlq",
+    "lcd",
+    "python",
+    "gcr",
+    "pipegcr",
+    "tsirm",
+    "cgls"
+]
+
+
+def _get_petsc_vec_array_new(vec):
+    """
+    Get the array of values for the given PETSc vector.
+
+    Helper function to handle a petsc backwards incompatibility.
+
+    Parameters
+    ----------
+    vec : petsc vector
+        Vector whose data is being requested.
+
+    Returns
+    -------
+    ndarray
+        A readonly copy of the array of values from vec.
+    """
+    return vec.getArray(readonly=True)
+
+
+def _get_petsc_vec_array_old(vec):
+    """
+    Get the array of values for the given PETSc vector.
+
+    Helper function to handle a petsc backwards incompatibility.
+
+    Parameters
+    ----------
+    vec : petsc vector
+        Vector whose data is being requested.
+
+    Returns
+    -------
+    ndarray
+        An array of values from vec.
+    """
+    return vec.getArray()
+
+
+if PETSc:
+    try:
+        petsc_version = petsc4py.__version__
+    except AttributeError:  # hack to fix doc-tests
+        petsc_version = "3.5"
+
+
+if PETSc and int((petsc_version).split('.')[1]) >= 6:
+    _get_petsc_vec_array = _get_petsc_vec_array_new
+else:
+    _get_petsc_vec_array = _get_petsc_vec_array_old
+
+
+class Monitor(object):
+    """
+    Prints output from PETSc's KSP solvers.
+
+    Callable object given to KSP as a callback for printing the residual.
+
+    Parameters
+    ----------
+    solver : object
+        The openmdao solver.
+
+    Attributes
+    ----------
+    _solver : _solver
+        The openmdao solver.
+    _norm : float
+        The current norm.
+    _norm0 : float
+        The norm for the first iteration.
+    """
+
+    def __init__(self, solver):
+        """
+        Store pointer to the openmdao solver and initialize norms.
+        """
+        self._solver = solver
+        self._norm = 1.0
+        self._norm0 = 1.0
+
+    def __call__(self, ksp, counter, norm):
+        """
+        Store norm if first iteration, and print norm.
+
+        Parameters
+        ----------
+        ksp : object
+            the KSP solver.
+        counter : int
+            the counter.
+        norm : float
+            the norm.
+        """
+        if counter == 0 and norm != 0.0:
+            self._norm0 = norm
+        self._norm = norm
+
+        self._solver._mpi_print(counter, norm, norm / self._norm0)
+        self._solver._iter_count += 1
+
+
+class PETScKrylovCustom(LinearSolver):
+    """
+    LinearSolver that uses PetSC KSP to solve for a system's derivatives.
+
+    Parameters
+    ----------
+    **kwargs : dict
+        Dictionary of options set by the instantiating class/script.
+
+    Attributes
+    ----------
+    precon : Solver
+        Preconditioner for linear solve. Default is None for no preconditioner.
+    _ksp : dist
+        Dictionary of KSP instances (keyed on vector name).
+    """
+
+    SOLVER = 'LN: PETScKrylov'
+
+    def __init__(self, **kwargs):
+        """
+        Declare the solver options.
+        """
+        super().__init__(**kwargs)
+
+        if PETSc is None:
+            raise RuntimeError(f"{self.msginfo}: PETSc is not available. "
+                               "Set shell variable OPENMDAO_USE_MPI=1 to detect earlier.")
+
+        # initialize dictionary of KSP instances (keyed on vector name)
+        self._ksp = None
+
+        # initialize preconditioner to None
+        self.precon = None
+
+    def _declare_options(self):
+        """
+        Declare options before kwargs are processed in the init method.
+        """
+        super()._declare_options()
+
+        self.options.declare('ksp_type', default='fgmres', values=KSP_TYPES,
+                             desc="KSP algorithm to use. Default is 'fgmres'.")
+
+        self.options.declare('restart', default=1000, types=int,
+                             desc='Number of iterations between restarts. Larger values increase '
+                             'iteration cost, but may be necessary for convergence')
+
+        self.options.declare('precon_side', default='right', values=['left', 'right'],
+                             desc='Preconditioner side, default is right.')
+
+        # EDIT-CACHE
+        # --- option for caching. Default: no cache ---
+        self.options.declare('use_cache', types=bool, default=False)
+
+        # changing the default maxiter from the base class
+        self.options['maxiter'] = 100
+
+    def _assembled_jac_solver_iter(self):
+        """
+        Return a generator of linear solvers using assembled jacs.
+        """
+        if self.options['assemble_jac']:
+            yield self
+        if self.precon is not None:
+            for s in self.precon._assembled_jac_solver_iter():
+                yield s
+
+    def _setup_solvers(self, system, depth):
+        """
+        Assign system instance, set depth, and optionally perform setup.
+
+        Parameters
+        ----------
+        system : <System>
+            pointer to the owning system.
+        depth : int
+            depth of the current system (already incremented).
+        """
+        super()._setup_solvers(system, depth)
+
+        if self.precon is not None:
+            self.precon._setup_solvers(self._system(), self._depth + 1)
+
+    def _set_solver_print(self, level=2, type_='all'):
+        """
+        Control printing for solvers and subsolvers in the model.
+
+        Parameters
+        ----------
+        level : int
+            iprint level. Set to 2 to print residuals each iteration; set to 1
+            to print just the iteration totals; set to 0 to disable all printing
+            except for failures, and set to -1 to disable all printing including failures.
+        type_ : str
+            Type of solver to set: 'LN' for linear, 'NL' for nonlinear, or 'all' for all.
+        """
+        super()._set_solver_print(level=level, type_=type_)
+
+        if self.precon is not None and type_ != 'NL':
+            self.precon._set_solver_print(level=level, type_=type_)
+
+    def mult(self, mat, in_vec, result):
+        """
+        Apply Jacobian matrix (KSP Callback).
+
+        The following attributes must be defined when solve is called to
+        provide information used in this callback:
+
+        _system : System
+            pointer to the owning system.
+        _vec_name : str
+            the right-hand-side (RHS) vector name.
+        _mode : str
+            'fwd' or 'rev'.
+
+        Parameters
+        ----------
+        mat : PETSc.Mat
+            PETSc matrix object.
+        in_vec : PetSC Vector
+            Incoming vector.
+        result : PetSC Vector
+            Empty array into which we place the matrix-vector product.
+        """
+        # assign x and b vectors based on mode
+        system = self._system()
+
+        if self._mode == 'fwd':
+            x_vec = system._doutputs
+            b_vec = system._dresiduals
+        else:  # rev
+            x_vec = system._dresiduals
+            b_vec = system._doutputs
+
+        # set value of x vector to KSP provided value
+        x_vec.set_val(_get_petsc_vec_array(in_vec))
+
+        # apply linear
+        scope_out, scope_in = system._get_matvec_scope()
+        system._apply_linear(self._assembled_jac, self._rel_systems, self._mode,
+                             scope_out, scope_in)
+
+        # stuff resulting value of b vector into result for KSP
+        result.array[:] = b_vec.asarray()
+
+    def _linearize_children(self):
+        """
+        Return a flag that is True when we need to call linearize on our subsystems' solvers.
+
+        Returns
+        -------
+        bool
+            Flag for indicating child linerization
+        """
+        precon = self.precon
+        return (precon is not None) and (precon._linearize_children())
+
+    def _linearize(self):
+        """
+        Perform any required linearization operations such as matrix factorization.
+        """
+        if self.precon is not None:
+            self.precon._linearize()
+
+        # EDIT-CACHE
+        # --- initialize cache ---
+        if self.options['use_cache']:
+            self._rhs_cache_list = []
+            self._sol_cache_list = []
+
+    def solve(self, mode, rel_systems=None):
+        """
+        Solve the linear system for the problem in self._system().
+
+        The full solution vector is returned.
+
+        Parameters
+        ----------
+        mode : str
+            Derivative mode, can be 'fwd' or 'rev'.
+        rel_systems : set of str
+            Names of systems relevant to the current solve.
+        """
+        time0 = time.time()
+
+        self._rel_systems = rel_systems
+        self._mode = mode
+
+        system = self._system()
+        options = self.options
+
+        if system.under_complex_step:
+            raise RuntimeError('{}: PETScKrylov solver is not supported under '
+                               'complex step.'.format(self.msginfo))
+
+        maxiter = options['maxiter']
+        atol = options['atol']
+        rtol = options['rtol']
+
+        # assign x and b vectors based on mode
+        if self._mode == 'fwd':
+            x_vec = system._doutputs
+            b_vec = system._dresiduals
+        else:  # rev
+            x_vec = system._dresiduals
+            b_vec = system._doutputs
+
+        # create numpy arrays to interface with PETSc
+        sol_array = x_vec.asarray(copy=True)
+        rhs_array = b_vec.asarray(copy=True)
+
+        # EDIT-CACHE
+        # --------------------------------------------------
+        # --- check if we can reuse the cached solutions ---
+        if self.options['use_cache']:
+            for i, rhs_cache in enumerate(self._rhs_cache_list):
+                # Check if the RHS vector is the same as a cached vector. This part is not necessary, but is less expensive than checking if two vectors are parallel.
+                if np.allclose(rhs_array, rhs_cache, rtol=1e-12, atol=1e-12):
+                    x_vec.set_val(self._sol_cache_list[i])
+                    print('*** use caching in Krylov with scaler = 1 | time = %.2E s ***' % (time.time() - time0))
+                    return
+                # Check if the RHS vector is equal to -1 * cached vector.
+                elif np.allclose(rhs_array, -rhs_cache, rtol=1e-12, atol=1e-12):
+                    x_vec.set_val(-self._sol_cache_list[i])
+                    print('*** use caching in Krylov with scaler = -1 | time = %.2E s ***' % (time.time() - time0))
+                    return
+                
+                # Check if the RHS vector and a cached vector are parallel
+                # NOTE: the following parallel vector check may be inaccurate for some cases. maybe should tighten the tolerance?
+                dot_product = np.dot(rhs_array, rhs_cache)
+                norm1 = np.linalg.norm(rhs_array)
+                norm2 = np.linalg.norm(rhs_cache)
+                if np.isclose(abs(dot_product), norm1 * norm2, rtol=1e-12, atol=1e-12):
+                    # two vectors are parallel, thus we can use the cache.
+                    scaler = dot_product / norm2**2
+                    x_vec.set_val(self._sol_cache_list[i] * scaler)
+                    print('*** use caching in Krylov with scaler = %.2E | time = %.2E s ***' % (scaler, time.time() - time0))
+                    return
+        
+        # TODO: reset cache optimization iteration
+        # --------------------------------------------------
+
+        # create PETSc vectors from numpy arrays
+        sol_petsc_vec = PETSc.Vec().createWithArray(sol_array, comm=system.comm)
+        rhs_petsc_vec = PETSc.Vec().createWithArray(rhs_array, comm=system.comm)
+
+        # run PETSc solver
+        self._iter_count = 0
+        ksp = self._get_ksp_solver(system)
+        ksp.setTolerances(max_it=maxiter, atol=atol, rtol=rtol)
+        ksp.solve(rhs_petsc_vec, sol_petsc_vec)
+
+        # stuff the result into the x vector
+        x_vec.set_val(sol_array)
+
+        sol_petsc_vec = rhs_petsc_vec = None
+
+        print('*** solved in Krylov | time = %.2E s ***' % (time.time() - time0))
+
+        # EDIT-CACHE
+        # --- append the current solution to the cache ---
+        if self.options['use_cache']:
+            self._rhs_cache_list.append(rhs_array.copy())
+            self._sol_cache_list.append(sol_array.copy())
+
+    def apply(self, mat, in_vec, result):
+        """
+        Apply preconditioner.
+
+        Parameters
+        ----------
+        mat : PETSc.Mat
+            PETSc matrix object.
+        in_vec : PETSc.Vector
+            Incoming vector.
+        result : PETSc.Vector
+            Empty vector in which the preconditioned in_vec is stored.
+        """
+        if self.precon:
+            system = self._system()
+            mode = self._mode
+
+            # Need to clear out any junk from the inputs.
+            system._dinputs.set_val(0.0)
+
+            # assign x and b vectors based on mode
+            if mode == 'fwd':
+                x_vec = system._doutputs
+                b_vec = system._dresiduals
+            else:  # rev
+                x_vec = system._dresiduals
+                b_vec = system._doutputs
+
+            # set value of b vector to KSP provided value
+            b_vec.set_val(_get_petsc_vec_array(in_vec))
+
+            # call the preconditioner
+            self._solver_info.append_precon()
+            self.precon.solve(mode)
+            self._solver_info.pop()
+
+            # stuff resulting value of x vector into result for KSP
+            result.array[:] = x_vec.asarray()
+        else:
+            # no preconditioner, just pass back the incoming vector
+            result.array[:] = _get_petsc_vec_array(in_vec)
+
+    def _get_ksp_solver(self, system):
+        """
+        Get an instance of the KSP solver in `system`.
+
+        Instances will be created on first request and cached for future use.
+
+        Parameters
+        ----------
+        system : `System`
+            Parent `System` object.
+
+        Returns
+        -------
+        KSP
+            the KSP solver instance.
+        """
+        # use cached instance if available
+        if self._ksp:
+            return self._ksp
+
+        iproc = system.comm.rank
+        lsize = np.sum(system._var_sizes['output'][iproc, :])
+        size = np.sum(system._var_sizes['output'])
+
+        jac_mat = PETSc.Mat().createPython([(lsize, size), (lsize, size)],
+                                           comm=system.comm)
+        jac_mat.setPythonContext(self)
+        jac_mat.setUp()
+
+        ksp = self._ksp = PETSc.KSP().create(comm=system.comm)
+
+        ksp.setOperators(jac_mat)
+        ksp.setType(self.options['ksp_type'])
+        ksp.setGMRESRestart(self.options['restart'])
+        if self.options['precon_side'] == 'left':
+            ksp.setPCSide(PETSc.PC.Side.LEFT)
+        else:
+            ksp.setPCSide(PETSc.PC.Side.RIGHT)
+        ksp.setMonitor(Monitor(self))
+        ksp.setInitialGuessNonzero(True)
+
+        pc_mat = ksp.getPC()
+        pc_mat.setType('python')
+        pc_mat.setPythonContext(self)
+
+        return ksp

From cb7012d1abc9f1c9ff37cdf96c4b6b885f2a3492 Mon Sep 17 00:00:00 2001
From: kanekosh <shugok@umich.edu>
Date: Fri, 20 Oct 2023 17:03:38 -0400
Subject: [PATCH 2/5] minor edits

---
 POEM_093.md | 41 +++++++++++++++++++++++------------------
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/POEM_093.md b/POEM_093.md
index a95fa931..f2b72891 100644
--- a/POEM_093.md
+++ b/POEM_093.md
@@ -16,14 +16,14 @@ Status:
 ## Motivation
 When computing derivatives with a hierarchical linear solver, OpenMDAO may re-solve the same subsystem-level linear system multiple times, which is redundant.
 Examples of such problems will be provided later.  
-This POEM proposes to remove this redundancy and accelerate the total derivatives computation by caching the subsystem-level linear solutions.
+This POEM proposes to remove this redundancy and accelerate the total derivative computation by caching the subsystem-level linear solutions.
 
 ## Description
 We suggest caching the right-hand side (RHS) vector `b` and the corresponding linear system solution `x` within the subsystem's linear solver.
-The caching can be optional; by default, it would not cache the solution to  avoid unnecessary overhead of saving/checking the cache.  
-When the user enabled caching, the linear solver checks if the given RHS vector `b` is parallel to a cached vector `b_cache` before solving the linear system.
+The caching can be optional; by default, it would not cache the solution to  avoid the unnecessary overhead of saving/checking the cache.  
+When the user enables caching, the linear solver checks if the given RHS vector `b` is parallel to a cached vector `b_cache` before solving the linear system.
 If they are parallel, it returns the linear system solution by `x = x_cache * (|b| / |b_cache|)` without solving the linear system.
-Otherwise, it solves the linear solver and add the solution and corresponding RHS vector to the cache.
+Otherwise, it solves the linear solver and adds the solution and corresponding RHS vector to the cache.
 
 ## API change proposal
 
@@ -43,14 +43,14 @@ For example,
 ```python
 subsystem.use_cache(mode='rev', cache_inputs=None, cache_outputs=['Lift'])
 ```
-would enable caching only for the `Lift` output in the reverse mode, but not for other outputs.
+would enable caching only for the `Lift` output in the reverse mode but not for other outputs.
 
 ## Prototype implementation
 We implemented a prototype of the linear solution caching in a custom PETScKrylov solver and Direct solver, which are available [here](POEM_093/).
 
-Note that this prototype caches all the inputs or outputs of a subsystem (i.e., it implements the proposal 1 above).
-The implementation of the caching for specific inputs or outputs (proposal 2) would require a different approach (which we don't have a good idea of how it can be done).  
-Also note that this prototype does not implement the cache resetting, which should be done every optimization iterations.
+Note that this prototype caches all the inputs or outputs of a subsystem (i.e., it implements proposal 1 above).
+The implementation of the caching for specific inputs or outputs (proposal 2) would require a different approach (we don't have a good idea of how it can be done).  
+Also note that this prototype does not implement the cache resetting, which should be done every optimization iteration.
 
 Here, we explain how the caching would be implemented for a Krylov solver.
 It works mostly the same for a Direct solver.
@@ -97,7 +97,7 @@ def _linearize(self):
         self._sol_cache_list = []
 ```
 
-In the `solve` method, it checks and reuses the cached solution if applicable.
+The `solve` method compares the given RHS to caches RHSs, and it reuses a cached solution if applicable.
 Otherwise, it adds the RHS vector and the solution to the cache at the end.
 ```python
 def solve(self, mode, rel_systems=None):
@@ -187,7 +187,7 @@ def solve(self, mode, rel_systems=None):
 ```
 
 ## Examples of Caching: 
-### 2-point Aerostructural optimization
+### Two-point aerostructural optimization
 Here is an [OpenAeroStruct multipoint optimization example](POEM_093/example1.py) to demonstrate the solution caching.
 We consider the following optimization problem:
 ```
@@ -196,8 +196,8 @@ subject to: CL0 = 0.6
             CL1 - CL0 = 0.2
 ```
 
-For the second operating point, we impose the CL constraint in a way that it also depends on the first point's CL.
-To compute the total derivatives, we use PETScKrylov solver for aerostructural coupling.
+For the second operating point, we impose the CL constraint in a way that also depends on the first point's CL.
+To compute the total derivatives, we use the PETScKrylov solver for aerostructural coupling.
 The solver outputs of `compute_totals()` look as follows:
 ```
 # dCD0/dx
@@ -232,7 +232,7 @@ LN: PETScKrylov 3 ; 1.82688692e-11 1.92001608e-09
 Derivatives of CL_diff w.r.t. twist_cp = [[-2.17863550e-05 -3.79972743e-05 -4.59107509e-05 -4.77229851e-05]]
 ```
 We can observe that there were 5 aerostructural adjoint solves: 2 for objective, 1 for the first CL constraint, and 2 for the second CL constraint.
-However, there are duplicated adjoint solve for `dCL0/dx` because `CL0` is used for both the first and second CL constraints.
+However, there are duplicated adjoint solves for `dCL0/dx` because `CL0` is used for both the first and second CL constraints.
 
 This duplicated adjoint solves can be avoided using the solution caching.
 Using [the prototype solver](POEM_093/petsc_ksp_custom.py), we get the following solver outputs:
@@ -269,13 +269,13 @@ LN: PETScKrylov 3 ; 1.19985619e-11 1.25990231e-09
 # verification print
 Derivatives of CL_diff w.r.t. twist_cp = [[-2.17863550e-05 -3.79972743e-05 -4.59107509e-05 -4.77229851e-05]]
 ```
-The caching removed the redundant `dCL0/dx` solve but still gave the identical total derivatives (verified by printed CL_diff w.r.t. twist_cp).
+The caching removed the redundant `dCL0/dx` solve but gave identical total derivatives (verified by printed CL_diff w.r.t. twist_cp).
 
 ### More practical multipoint problem
 In the above example, we formulate the second CL constraint in a weird way to demonstrate the caching.
 However, the above model structure represents practical problems.
 
-As an example, we again consider [another multipoint problem](POEM_093/example2.py) (cruise point + manuever point), but now with more practical objective and constraints:  
+As an example, we again consider [another multipoint problem](POEM_093/example2.py) (cruise point + maneuver point), but now with more practical objective and constraints:  
 ```
 minimize:   fuel burn  
 subject to: L = W at cruise point
@@ -283,11 +283,11 @@ subject to: L = W at cruise point
 ```
 
 The `L=W` constraints depend on the fuel burn, which is computed using the cruise drag.
-Therefore, `L=W` constraint at the maneuver point depends on the fuel burn (or cruise drag) of the cruise point.
+Therefore, the `L=W` constraint at the maneuver point depends on the fuel burn (or cruise drag) of the cruise point.
 This introduces the redundant adjoint solves, i.e., the adjoint for maneuver point `L=W` calls the adjoint for cruise point.
 
 The linear solution caching can avoid this redundancy.
-The following is the linear solver outputs with caching:
+The following are the linear solver outputs with caching:
 ```
 # adjoint for fuelburn (cruise point OAS adjoint)
 LN: PETScKrylov 0 ; 244.940409 1
@@ -303,7 +303,7 @@ LN: PETScKrylov 2 ; 2.89301257e-09 3.67159618e-07
 LN: PETScKrylov 3 ; 1.68153368e-12 2.13407736e-10
 *** solved in Krylov | time = 9.81E-03 s ***
 
-# adjoint for L=W at maneuver point. This uses cache from the cruise point adjoint
+# adjoint for L=W at maneuver point. This uses the cache from the cruise point adjoint
 LN: PETScKrylov 0 ; 1.70385305e-05 1
 LN: PETScKrylov 1 ; 1.50491383e-06 0.0883241564
 LN: PETScKrylov 2 ; 9.45843606e-10 5.55120411e-05
@@ -312,3 +312,8 @@ LN: PETScKrylov 3 ; 5.54498891e-13 3.25438213e-08
 *** use caching in Krylov with scaler = 1.19E-05 | time = 4.44E-04 s ***
 ```
 
+## Package versions
+We used the following package versions for the demonstrations above.
+- openmdao 3.28.1.dev0
+- openaerostruct 2.7.0
+- petsc 3.18.5
\ No newline at end of file

From cd8093a7468a68f9714b1d254955e18982063b49 Mon Sep 17 00:00:00 2001
From: Anil Yildirim <anily@umich.edu>
Date: Tue, 24 Oct 2023 14:08:16 -0400
Subject: [PATCH 3/5] anils edits

---
 POEM_093.md | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/POEM_093.md b/POEM_093.md
index f2b72891..4c6e945d 100644
--- a/POEM_093.md
+++ b/POEM_093.md
@@ -14,17 +14,21 @@ Status:
 - [ ] Integrated
 
 ## Motivation
-When computing derivatives with a hierarchical linear solver, OpenMDAO may re-solve the same subsystem-level linear system multiple times, which is redundant.
+When computing derivatives with a hierarchical linear solver, OpenMDAO may re-solve the same subsystem-level linear system with the same right-hand-side (RHS) multiple times, which is redundant.
 Examples of such problems will be provided later.  
 This POEM proposes to remove this redundancy and accelerate the total derivative computation by caching the subsystem-level linear solutions.
 
 ## Description
-We suggest caching the right-hand side (RHS) vector `b` and the corresponding linear system solution `x` within the subsystem's linear solver.
-The caching can be optional; by default, it would not cache the solution to  avoid the unnecessary overhead of saving/checking the cache.  
+We suggest caching the RHS vector `b` and the corresponding linear system solution `x` within the subsystem's linear solver.
+The caching can be optional; by default, it would not cache the solution to avoid the unnecessary overhead of saving and checking the cache.  
 When the user enables caching, the linear solver checks if the given RHS vector `b` is parallel to a cached vector `b_cache` before solving the linear system.
 If they are parallel, it returns the linear system solution by `x = x_cache * (|b| / |b_cache|)` without solving the linear system.
 Otherwise, it solves the linear solver and adds the solution and corresponding RHS vector to the cache.
 
+In this POEM, we focus on the specific approach of storing the solutions and only re-using them when the new RHS is parallel to the previous one.
+In the more general case, deflated and augmented Krylov subspace techniques [[Chapman and Saad, 1997](https://doi.org/10.1002/(SICI)1099-1506(199701/02)4:1%3C43::AID-NLA99%3E3.0.CO;2-Z)] can be used to accelerate convergence of the solutions with multiple RHS vectors.
+
+
 ## API change proposal
 
 **1. Caching all inputs or outputs of a subsystem**  

From e68c03f88be9b1bb1d4078d1da95f704f0420570 Mon Sep 17 00:00:00 2001
From: Rob Falck <robfalck@gmail.com>
Date: Fri, 12 Apr 2024 13:33:06 -0400
Subject: [PATCH 4/5] Update POEM_093.md

Added implementation PR
---
 POEM_093.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/POEM_093.md b/POEM_093.md
index 4c6e945d..e6c34dc5 100644
--- a/POEM_093.md
+++ b/POEM_093.md
@@ -3,7 +3,7 @@ Title: Linear solution caching
 authors: kanekosh (Shugo Kaneko) and anilyil (Anil Yildirim)  
 Competing POEMs:   
 Related POEMs:  
-Associated implementation PR:  
+Associated implementation PR: [#3185](https://github.com/OpenMDAO/OpenMDAO/pull/3185)  
 
 Status:
 
@@ -320,4 +320,4 @@ LN: PETScKrylov 3 ; 5.54498891e-13 3.25438213e-08
 We used the following package versions for the demonstrations above.
 - openmdao 3.28.1.dev0
 - openaerostruct 2.7.0
-- petsc 3.18.5
\ No newline at end of file
+- petsc 3.18.5

From 095f7368bff8b570ad99560eb13127ee7880e1a8 Mon Sep 17 00:00:00 2001
From: Rob Falck <robfalck@gmail.com>
Date: Fri, 12 Apr 2024 13:43:57 -0400
Subject: [PATCH 5/5] Accepted POEM_093.md

---
 POEM_093.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/POEM_093.md b/POEM_093.md
index e6c34dc5..d79bb87b 100644
--- a/POEM_093.md
+++ b/POEM_093.md
@@ -7,9 +7,9 @@ Associated implementation PR: [#3185](https://github.com/OpenMDAO/OpenMDAO/pull/
 
 Status:
 
-- [x] Active
+- [ ] Active
 - [ ] Requesting decision
-- [ ] Accepted
+- [x] Accepted
 - [ ] Rejected
 - [ ] Integrated