-
Notifications
You must be signed in to change notification settings - Fork 27
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #190 from kanekosh/poem_093
POEM 093: linear solution caching
- Loading branch information
Showing
5 changed files
with
2,008 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,323 @@ | ||
POEM ID: 093 | ||
Title: Linear solution caching | ||
authors: kanekosh (Shugo Kaneko) and anilyil (Anil Yildirim) | ||
Competing POEMs: | ||
Related POEMs: | ||
Associated implementation PR: [#3185](https://github.com/OpenMDAO/OpenMDAO/pull/3185) | ||
|
||
Status: | ||
|
||
- [ ] Active | ||
- [ ] Requesting decision | ||
- [x] Accepted | ||
- [ ] Rejected | ||
- [ ] Integrated | ||
|
||
## Motivation | ||
When computing derivatives with a hierarchical linear solver, OpenMDAO may re-solve the same subsystem-level linear system with the same right-hand-side (RHS) multiple times, which is redundant. | ||
Examples of such problems will be provided later. | ||
This POEM proposes to remove this redundancy and accelerate the total derivative computation by caching the subsystem-level linear solutions. | ||
|
||
## Description | ||
We suggest caching the RHS vector `b` and the corresponding linear system solution `x` within the subsystem's linear solver. | ||
The caching can be optional; by default, it would not cache the solution to avoid the unnecessary overhead of saving and checking the cache. | ||
When the user enables caching, the linear solver checks if the given RHS vector `b` is parallel to a cached vector `b_cache` before solving the linear system. | ||
If they are parallel, it returns the linear system solution by `x = x_cache * (|b| / |b_cache|)` without solving the linear system. | ||
Otherwise, it solves the linear solver and adds the solution and corresponding RHS vector to the cache. | ||
|
||
In this POEM, we focus on the specific approach of storing the solutions and only re-using them when the new RHS is parallel to the previous one. | ||
In the more general case, deflated and augmented Krylov subspace techniques [[Chapman and Saad, 1997](https://doi.org/10.1002/(SICI)1099-1506(199701/02)4:1%3C43::AID-NLA99%3E3.0.CO;2-Z)] can be used to accelerate convergence of the solutions with multiple RHS vectors. | ||
|
||
|
||
## API change proposal | ||
|
||
**1. Caching all inputs or outputs of a subsystem** | ||
A user may enable the caching when adding a linear solver to a subsystem. | ||
For example, | ||
|
||
```python | ||
subsystem1.linear_solver = om.PETScKrylov(use_cache=True) | ||
``` | ||
would enable caching for `subsystem1`. | ||
This does not change the behavior of the current models if we set the default to `use_cache=False`. | ||
|
||
**2. Caching specific inputs or outputs** | ||
Furthermore, enabling the caching only for the user-specified inputs (in forward mode) or outputs (in reverse mode) can further reduce the computational overhead of the caching. | ||
For example, | ||
```python | ||
subsystem.use_cache(mode='rev', cache_inputs=None, cache_outputs=['Lift']) | ||
``` | ||
would enable caching only for the `Lift` output in the reverse mode but not for other outputs. | ||
|
||
## Prototype implementation | ||
We implemented a prototype of the linear solution caching in a custom PETScKrylov solver and Direct solver, which are available [here](POEM_093/). | ||
|
||
Note that this prototype caches all the inputs or outputs of a subsystem (i.e., it implements proposal 1 above). | ||
The implementation of the caching for specific inputs or outputs (proposal 2) would require a different approach (we don't have a good idea of how it can be done). | ||
Also note that this prototype does not implement the cache resetting, which should be done every optimization iteration. | ||
|
||
Here, we explain how the caching would be implemented for a Krylov solver. | ||
It works mostly the same for a Direct solver. | ||
The complete custom solver file is available [here](POEM_093/petsc_ksp_custom.py). | ||
|
||
First, we add a new option for caching: | ||
```python | ||
def _declare_options(self): | ||
""" | ||
Declare options before kwargs are processed in the init method. | ||
""" | ||
super()._declare_options() | ||
|
||
self.options.declare('ksp_type', default='fgmres', values=KSP_TYPES, | ||
desc="KSP algorithm to use. Default is 'fgmres'.") | ||
|
||
self.options.declare('restart', default=1000, types=int, | ||
desc='Number of iterations between restarts. Larger values increase ' | ||
'iteration cost, but may be necessary for convergence') | ||
|
||
self.options.declare('precon_side', default='right', values=['left', 'right'], | ||
desc='Preconditioner side, default is right.') | ||
|
||
# --- option for caching. Default: no cache --- | ||
self.options.declare('use_cache', types=bool, default=False) | ||
|
||
# changing the default maxiter from the base class | ||
self.options['maxiter'] = 100 | ||
``` | ||
|
||
Then, the solver initializes the cache in `_linearize` (or in `__init__`): | ||
```python | ||
def _linearize(self): | ||
""" | ||
Perform any required linearization operations such as matrix factorization. | ||
""" | ||
print('*** linearize in Krylov ***') | ||
if self.precon is not None: | ||
self.precon._linearize() | ||
|
||
# --- initialize cache --- | ||
if self.options['use_cache']: | ||
self._rhs_cache_list = [] | ||
self._sol_cache_list = [] | ||
``` | ||
|
||
The `solve` method compares the given RHS to caches RHSs, and it reuses a cached solution if applicable. | ||
Otherwise, it adds the RHS vector and the solution to the cache at the end. | ||
```python | ||
def solve(self, mode, rel_systems=None): | ||
""" | ||
Solve the linear system for the problem in self._system(). | ||
The full solution vector is returned. | ||
Parameters | ||
---------- | ||
mode : str | ||
Derivative mode, can be 'fwd' or 'rev'. | ||
rel_systems : set of str | ||
Names of systems relevant to the current solve. | ||
""" | ||
self._rel_systems = rel_systems | ||
self._mode = mode | ||
|
||
system = self._system() | ||
options = self.options | ||
|
||
if system.under_complex_step: | ||
raise RuntimeError('{}: PETScKrylov solver is not supported under ' | ||
'complex step.'.format(self.msginfo)) | ||
|
||
maxiter = options['maxiter'] | ||
atol = options['atol'] | ||
rtol = options['rtol'] | ||
|
||
# assign x and b vectors based on mode | ||
if self._mode == 'fwd': | ||
x_vec = system._doutputs | ||
b_vec = system._dresiduals | ||
else: # rev | ||
x_vec = system._dresiduals | ||
b_vec = system._doutputs | ||
|
||
# create numpy arrays to interface with PETSc | ||
sol_array = x_vec.asarray(copy=True) | ||
rhs_array = b_vec.asarray(copy=True) | ||
|
||
# --- check if we can reuse the cached solutions --- | ||
if self.options['use_cache']: | ||
for i, rhs_cache in enumerate(self._rhs_cache_list): # loop over cached RHS vectors. NOTE: reverse order loop would be faster? | ||
# Check if the RHS vector is the same as a cached vector. This part is not necessary, but is less expensive than checking if two vectors are parallel. | ||
if np.allclose(rhs_array, rhs_cache, rtol=1e-12, atol=1e-12): | ||
x_vec.set_val(self._sol_cache_list[i]) | ||
print('*** use caching in Krylov with scaler = 1', 'time =', time.time() - time0, '***') | ||
return | ||
# Check if the RHS vector is equal to -1 * cached vector. | ||
elif np.allclose(rhs_array, -rhs_cache, rtol=1e-12, atol=1e-12): | ||
x_vec.set_val(-self._sol_cache_list[i]) | ||
print('*** use caching in Krylov with scaler = -1', 'time =', time.time() - time0, '***') | ||
return | ||
|
||
# Check if the RHS vector and a cached vector are parallel | ||
# NOTE: the following parallel vector check may be inaccurate for some cases. maybe should tighten the tolerance? | ||
dot_product = np.dot(rhs_array, rhs_cache) | ||
norm1 = np.linalg.norm(rhs_array) | ||
norm2 = np.linalg.norm(rhs_cache) | ||
if np.isclose(abs(dot_product), norm1 * norm2, rtol=1e-12, atol=1e-12): | ||
# two vectors are parallel, thus we can use the cache. | ||
scaler = dot_product / norm2**2 | ||
x_vec.set_val(self._sol_cache_list[i] * scaler) | ||
print('*** use caching in Krylov with scaler =', scaler, ' | time =', time.time() - time0, '***') | ||
return | ||
|
||
# create PETSc vectors from numpy arrays | ||
sol_petsc_vec = PETSc.Vec().createWithArray(sol_array, comm=system.comm) | ||
rhs_petsc_vec = PETSc.Vec().createWithArray(rhs_array, comm=system.comm) | ||
|
||
# run PETSc solver | ||
self._iter_count = 0 | ||
ksp = self._get_ksp_solver(system) | ||
ksp.setTolerances(max_it=maxiter, atol=atol, rtol=rtol) | ||
ksp.solve(rhs_petsc_vec, sol_petsc_vec) | ||
|
||
# stuff the result into the x vector | ||
x_vec.set_val(sol_array) | ||
|
||
sol_petsc_vec = rhs_petsc_vec = None | ||
|
||
# --- append the current solution to the cache --- | ||
if self.options['use_cache']: | ||
self._rhs_cache_list.append(rhs_array.copy()) | ||
self._sol_cache_list.append(sol_array.copy()) | ||
``` | ||
|
||
## Examples of Caching: | ||
### Two-point aerostructural optimization | ||
Here is an [OpenAeroStruct multipoint optimization example](POEM_093/example1.py) to demonstrate the solution caching. | ||
We consider the following optimization problem: | ||
``` | ||
minimize: CD0 + CD1 | ||
subject to: CL0 = 0.6 | ||
CL1 - CL0 = 0.2 | ||
``` | ||
|
||
For the second operating point, we impose the CL constraint in a way that also depends on the first point's CL. | ||
To compute the total derivatives, we use the PETScKrylov solver for aerostructural coupling. | ||
The solver outputs of `compute_totals()` look as follows: | ||
``` | ||
# dCD0/dx | ||
LN: PETScKrylov 0 ; 0.00299022306 1 | ||
LN: PETScKrylov 1 ; 8.49989958e-08 2.84256372e-05 | ||
LN: PETScKrylov 2 ; 7.30820654e-11 2.44403391e-08 | ||
# dCD1/dx | ||
LN: PETScKrylov 0 ; 0.00333369362 1 | ||
LN: PETScKrylov 1 ; 8.50541382e-08 2.55134838e-05 | ||
LN: PETScKrylov 2 ; 8.31048387e-11 2.49287572e-08 | ||
# dCL0/dx | ||
LN: PETScKrylov 0 ; 0.00951495633 1 | ||
LN: PETScKrylov 1 ; 2.65013995e-06 0.000278523607 | ||
LN: PETScKrylov 2 ; 1.22310119e-09 1.28545118e-07 | ||
LN: PETScKrylov 3 ; 1.82688692e-11 1.92001608e-09 | ||
# dCL1/dx | ||
LN: PETScKrylov 0 ; 0.00952340656 1 | ||
LN: PETScKrylov 1 ; 2.64776928e-06 0.000278027538 | ||
LN: PETScKrylov 2 ; 1.04735578e-09 1.0997701e-07 | ||
LN: PETScKrylov 3 ; 1.19985619e-11 1.25990231e-09 | ||
# dCL0/dx again - identical history to the previous dCL0/dx solve. | ||
LN: PETScKrylov 0 ; 0.00951495633 1 | ||
LN: PETScKrylov 1 ; 2.65013995e-06 0.000278523607 | ||
LN: PETScKrylov 2 ; 1.22310119e-09 1.28545118e-07 | ||
LN: PETScKrylov 3 ; 1.82688692e-11 1.92001608e-09 | ||
# verification print | ||
Derivatives of CL_diff w.r.t. twist_cp = [[-2.17863550e-05 -3.79972743e-05 -4.59107509e-05 -4.77229851e-05]] | ||
``` | ||
We can observe that there were 5 aerostructural adjoint solves: 2 for objective, 1 for the first CL constraint, and 2 for the second CL constraint. | ||
However, there are duplicated adjoint solves for `dCL0/dx` because `CL0` is used for both the first and second CL constraints. | ||
|
||
This duplicated adjoint solves can be avoided using the solution caching. | ||
Using [the prototype solver](POEM_093/petsc_ksp_custom.py), we get the following solver outputs: | ||
``` | ||
# dCD0/dx | ||
LN: PETScKrylov 0 ; 0.00299022306 1 | ||
LN: PETScKrylov 1 ; 8.49989958e-08 2.84256372e-05 | ||
LN: PETScKrylov 2 ; 7.30820654e-11 2.44403391e-08 | ||
*** solved in Krylov | time = 8.14E-03 s *** | ||
# dCD1/dx | ||
LN: PETScKrylov 0 ; 0.00333369362 1 | ||
LN: PETScKrylov 1 ; 8.50541382e-08 2.55134838e-05 | ||
LN: PETScKrylov 2 ; 8.31048387e-11 2.49287572e-08 | ||
*** solved in Krylov | time = 7.97E-03 s *** | ||
# dCL0/dx | ||
LN: PETScKrylov 0 ; 0.00951495633 1 | ||
LN: PETScKrylov 1 ; 2.65013995e-06 0.000278523607 | ||
LN: PETScKrylov 2 ; 1.22310119e-09 1.28545118e-07 | ||
LN: PETScKrylov 3 ; 1.82688692e-11 1.92001608e-09 | ||
*** solved in Krylov | time = 1.00E-02 s *** | ||
# dCL1/dx | ||
LN: PETScKrylov 0 ; 0.00952340656 1 | ||
LN: PETScKrylov 1 ; 2.64776928e-06 0.000278027538 | ||
LN: PETScKrylov 2 ; 1.04735578e-09 1.0997701e-07 | ||
LN: PETScKrylov 3 ; 1.19985619e-11 1.25990231e-09 | ||
*** solved in Krylov | time = 1.01E-02 s *** | ||
# dCL0/dx again - now use cached solution without running the Krylov solver | ||
*** use caching in Krylov with scaler = -1 | time = 6.11E-04 s *** | ||
# verification print | ||
Derivatives of CL_diff w.r.t. twist_cp = [[-2.17863550e-05 -3.79972743e-05 -4.59107509e-05 -4.77229851e-05]] | ||
``` | ||
The caching removed the redundant `dCL0/dx` solve but gave identical total derivatives (verified by printed CL_diff w.r.t. twist_cp). | ||
|
||
### More practical multipoint problem | ||
In the above example, we formulate the second CL constraint in a weird way to demonstrate the caching. | ||
However, the above model structure represents practical problems. | ||
|
||
As an example, we again consider [another multipoint problem](POEM_093/example2.py) (cruise point + maneuver point), but now with more practical objective and constraints: | ||
``` | ||
minimize: fuel burn | ||
subject to: L = W at cruise point | ||
L = W at maneuver point | ||
``` | ||
|
||
The `L=W` constraints depend on the fuel burn, which is computed using the cruise drag. | ||
Therefore, the `L=W` constraint at the maneuver point depends on the fuel burn (or cruise drag) of the cruise point. | ||
This introduces the redundant adjoint solves, i.e., the adjoint for maneuver point `L=W` calls the adjoint for cruise point. | ||
|
||
The linear solution caching can avoid this redundancy. | ||
The following are the linear solver outputs with caching: | ||
``` | ||
# adjoint for fuelburn (cruise point OAS adjoint) | ||
LN: PETScKrylov 0 ; 244.940409 1 | ||
LN: PETScKrylov 1 ; 0.00367094295 1.49870859e-05 | ||
LN: PETScKrylov 2 ; 5.5435276e-06 2.2632148e-08 | ||
LN: PETScKrylov 3 ; 2.08827096e-08 8.5256286e-11 | ||
*** solved in Krylov | time = 1.22E-02 s *** | ||
# adjoint for L=W at cruise point | ||
LN: PETScKrylov 0 ; 0.00787944105 1 | ||
LN: PETScKrylov 1 ; 4.18719356e-06 0.000531407436 | ||
LN: PETScKrylov 2 ; 2.89301257e-09 3.67159618e-07 | ||
LN: PETScKrylov 3 ; 1.68153368e-12 2.13407736e-10 | ||
*** solved in Krylov | time = 9.81E-03 s *** | ||
# adjoint for L=W at maneuver point. This uses the cache from the cruise point adjoint | ||
LN: PETScKrylov 0 ; 1.70385305e-05 1 | ||
LN: PETScKrylov 1 ; 1.50491383e-06 0.0883241564 | ||
LN: PETScKrylov 2 ; 9.45843606e-10 5.55120411e-05 | ||
LN: PETScKrylov 3 ; 5.54498891e-13 3.25438213e-08 | ||
*** solved in Krylov | time = 1.17E-02 s *** | ||
*** use caching in Krylov with scaler = 1.19E-05 | time = 4.44E-04 s *** | ||
``` | ||
|
||
## Package versions | ||
We used the following package versions for the demonstrations above. | ||
- openmdao 3.28.1.dev0 | ||
- openaerostruct 2.7.0 | ||
- petsc 3.18.5 |
Oops, something went wrong.