diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 0cf5ec7..bcf9989 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -27,7 +27,7 @@ jobs:
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
-          python -m pip install pytest numpy nbval tabulate
+          python -m pip install pytest numpy nbval tabulate scipy sympy
           python -m pip install -e .
      - name: Test with pytest
        run: |
diff --git a/docs/notebooks/burger.ipynb b/docs/notebooks/burger.ipynb
new file mode 100644
index 0000000..136bfc6
--- /dev/null
+++ b/docs/notebooks/burger.ipynb
@@ -0,0 +1,752 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# *checkpoint_schedules* application: Adjoint-Based Gradient with Burgers' Equation\n",
+ "\n",
+ "This example shows an adjoint-based gradient computation using the *checkpoint_schedules* package. We first define the adjoint-based gradient problem and then present the forward and adjoint solvers driven by the *checkpoint_schedules* package.\n",
+ "\n",
+ "## Defining the application\n",
+ "\n",
+ "Let us consider a one-dimensional (1D) problem aiming to compute the gradient/sensitivity of the kinetic energy at a time $t=\\tau$ with respect to an initial condition $u_0$. Our functional can then be expressed as:\n",
+ "$$\n",
+ "I(u(x, \\tau, u_0)) = \\int_\\Omega \\frac{1}{2} u(x, \\tau, u_0)u(x, \\tau, u_0) \\, dx.\n",
+ "\\tag{1}\n",
+ "$$\n",
+ "\n",
+ "In the current case, the velocity variable $u = u(x, t)$ is governed by the 1D viscous Burgers' equation, a non-linear equation for the advection and diffusion of momentum:\n",
+ "\n",
+ "$$\n",
+ "\\frac{\\partial u}{\\partial t} + u \\frac{\\partial u}{\\partial x} - \\nu \\frac{\\partial^2 u}{\\partial x^2} = 0.\n",
+ "\\tag{2}\n",
+ "$$\n",
+ "\n",
+ "Here, $x \\in [0, L]$ is the space variable, and $t \\in \\mathbb{R}^{+}$ represents the time variable. The boundary condition is $u(0, t) = u(L, t) = 0$, where $L$ is the length of the 1D domain. The initial condition is given by $u_0 = \\sin(\\pi x)$.\n",
+ "\n",
+ "The control parameter is the initial condition $u_0$. Hence, the objective is to compute the adjoint-based gradient of the functional $I(u_0)$ with respect to $u_0$.\n",
+ "\n",
+ "This example uses the discrete adjoint formulation, meaning that the adjoint system is derived after discretisation of the forward problem."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Adjoint-based gradient\n",
+ "We seek to compute the sensitivity/gradient $\\mathrm{d}\\widehat{I}/\\mathrm{d}u_0$, where $\\widehat{I} = I(u_0)$ is the reduced functional. Applying the chain rule to the functional $I$ in Eq. (1) gives:\n",
+ "$$\\frac{\\mathrm{d}\\widehat{I}}{\\mathrm{d}u_0} = \\frac{\\partial I}{\\partial u} \\frac{\\mathrm{d}u}{\\mathrm{d}u_0}. \\tag{3}$$\n",
+ "Computing $\\partial I/\\partial u$ is straightforward since the functional (Eq. (1)) is written in terms of the solution variable $u$.\n",
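+ "For the functional in Eq. (1), this derivative can be written out explicitly: a perturbation $\\delta u$ of the solution changes the functional by\n",
+ "$$\\frac{\\partial I}{\\partial u} \\, \\delta u = \\int_\\Omega u \\, \\delta u \\, dx,$$\n",
+ "so, identifying the gradient through the $L^2$ inner product, $\\partial I/\\partial u = u$.\n",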
+ "In contrast, the Jacobian $\\mathrm{d}u/\\mathrm{d}u_0$ involves an expensive computational operation that is avoided by using the adjoint method.\n",
+ "\n",
+ "The strategy employed for the adjoint derivation is based on [1] (Section 2.3.3): taking the total derivative of the forward equation $F(u, u_0) = 0$ with respect to $u_0$ relates it to the functional $I$ and yields a relationship that provides the Jacobian $\\mathrm{d}u/\\mathrm{d}u_0$, as shown below.\n",
+ "\\begin{align*}\n",
+ "    \\frac{\\mathrm{d}}{\\mathrm{d}u_0} F(u, u_0) &= \\frac{\\partial F(u, u_0)}{\\partial u} \\frac{\\mathrm{d}u}{\\mathrm{d}u_0} + \\frac{\\partial F(u, u_0)}{\\partial u_0} = 0 \\\\\\\\\n",
+ "    \\\\\\\\\n",
+ "    \\implies \\frac{\\mathrm{d}u}{\\mathrm{d}u_0} &= - \\left(\\frac{\\partial F(u, u_0)}{\\partial u}\\right)^{-1} \\frac{\\partial F(u, u_0)}{\\partial u_0}.\n",
+ "\\end{align*}\n",
+ "Substituting $\\mathrm{d}u/\\mathrm{d}u_0$ into Eq. (3), we have:\n",
+ "$$\\frac{\\mathrm{d}\\widehat{I}}{\\mathrm{d}u_0} = - \\frac{\\partial I}{\\partial u} \\left(\\frac{\\partial F(u, u_0)}{\\partial u}\\right)^{-1} \\frac{\\partial F(u, u_0)}{\\partial u_0}.$$\n",
+ "\n",
+ "Taking the adjoint (Hermitian transpose) of the above equation:\n",
+ "$$\\frac{\\mathrm{d}\\widehat{I}}{\\mathrm{d}u_0}^* = - \\frac{\\partial F}{\\partial u_0}^* \\frac{\\partial F}{\\partial u}^{-*} \\frac{\\partial I}{\\partial u}^{*}.$$\n",
+ "\n",
+ "Next, we define:\n",
+ "$$\\lambda = \\left(\\frac{\\partial F(u, u_0)}{\\partial u}\\right)^{-*} \\frac{\\partial I}{\\partial u}^*\n",
+ "    \\implies \\left(\\frac{\\partial F(u, u_0)}{\\partial u}\\right)^{*} \\lambda = \\frac{\\partial I}{\\partial u}^*,$$\n",
+ "which is the adjoint system. Finally, we obtain the following expression for the adjoint-based gradient:\n",
+ "$$\\frac{\\mathrm{d}\\widehat{I}}{\\mathrm{d} u_0}^* = - \\frac{\\partial F}{\\partial u_0}^* \\lambda.$$\n",
+ "In the present case, Burgers' equation is written in weak form and discretised in time with the backward Euler method, giving:\n",
+ "$$ F(u^{n+1}, u^n, v) = \\int_{\\Omega} u^{n + 1} \\cdot v - u^{n} \\cdot v + \\Delta t \\left(u^{n + 1} \\frac{\\partial u^{n + 1}}{\\partial x} \\cdot v + \\nu \\frac{\\partial u^{n + 1}}{\\partial x} \\cdot \\frac{\\partial v}{\\partial x}\\right) \\, dx = 0 \\quad \\forall v \\in V, \\tag{4}$$\n",
+ "where $v \\in V$ is an arbitrary test function.\n",
+ "\n",
+ "The time sequence of the adjoint system for the functional (Eq. (1)) is given by:\n",
+ "$$\n",
+ "\\begin{align*}\n",
+ "    \\lambda^{N+1} &= \\frac{\\partial I}{\\partial u^{N + 1}}^{\\ast}\\\\\\\\\n",
+ "    \\\\\\\\\n",
+ "    \\frac{\\partial F}{\\partial u^{N + 1}}^* \\lambda^{N} &= \\lambda^{N+1}\n",
+ "    \\\\\\\\\n",
+ "    \\frac{\\partial F}{\\partial u^{N}}^* \\lambda^{N - 1} &= \\lambda^{N}\n",
+ "    \\\\\\\\\n",
+ "    \\vdots \\\\\\\\\n",
+ "    \\frac{\\partial F}{\\partial u^{1}}^* \\lambda^{0} &= \\lambda^{1}\n",
+ "\\end{align*}\n",
+ "\\tag{5}\n",
+ "$$\n",
+ "where $\\lambda^{n}$ is the adjoint variable at time step $n$."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Burgers' equation solver\n",
+ "We define a `BurgersSolver` class to compute the `forward` and `adjoint` solutions, and to handle the storage of forward and adjoint data.\n",
+ "\n",
+ "Burgers' equation is discretised using the finite element method (FEM). We use continuous first-order Lagrange basis functions to discretise the spatial domain. `BurgersSolver` uses the SymPy Python library to assemble the matrices from the weak form of Burgers' equation.\n",
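+ "\n",
+ "On the reference element $\\eta \\in [0, 1]$, these basis functions are $\\phi_0 = 1 - \\eta$ and $\\phi_1 = \\eta$, so the element integrals assembled below evaluate to\n",
+ "$$\n",
+ "M_{\\mathrm{local}} = h \\int_0^1 \\phi_i \\phi_j \\, d\\eta = \\frac{h}{6} \\begin{bmatrix} 2 & 1 \\\\\\\\ 1 & 2 \\end{bmatrix}, \\qquad K_{\\mathrm{local}} = \\frac{\\nu}{h} \\int_0^1 \\frac{d\\phi_i}{d\\eta} \\frac{d\\phi_j}{d\\eta} \\, d\\eta = \\frac{\\nu}{h} \\begin{bmatrix} 1 & -1 \\\\\\\\ -1 & 1 \\end{bmatrix},\n",
+ "$$\n",
+ "where $h$ is the mesh spacing. This provides a quick check on the SymPy assembly in `_mass_matrix` and `_stiffness_matrix` below."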
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import os\n",
+ "import tempfile\n",
+ "from scipy.sparse.linalg import spsolve\n",
+ "from scipy.sparse import lil_matrix\n",
+ "import copy\n",
+ "from sympy import diff, integrate, symbols, lambdify\n",
+ "from sympy.matrices import SparseMatrix, Matrix\n",
+ "from checkpoint_schedules import StorageType\n",
+ "\n",
+ "\n",
+ "class BurgersSolver:\n",
+ "    \"\"\"A solver for the forward and adjoint one-dimensional Burgers' equation.\n",
+ "\n",
+ "    Parameters\n",
+ "    ----------\n",
+ "    model : dict\n",
+ "        The model parameters containing the essential information to solve\n",
+ "        the Burgers' equation.\n",
+ "    forward_initial_condition : array\n",
+ "        The initial condition used to solve the forward Burgers' equation.\n",
+ "    mesh : array\n",
+ "        The spatial mesh.\n",
+ "    \"\"\"\n",
+ "    def __init__(self, model, forward_initial_condition, mesh):\n",
+ "        self.model = dict(model)\n",
+ "        self.mesh = mesh\n",
+ "        self.forward_work_memory = {StorageType.WORK: {}}\n",
+ "        self.forward_work_memory[StorageType.WORK][0] = copy.deepcopy(forward_initial_condition)\n",
+ "        self.forward_final_solution = None\n",
+ "        self.initial_condition = copy.deepcopy(forward_initial_condition)\n",
+ "        self.adjoint_work_memory = {StorageType.WORK: {}}\n",
+ "        self.restart_forward = {StorageType.RAM: {}, StorageType.DISK: {}}\n",
+ "        self.adjoint_dependency = {StorageType.WORK: {}, StorageType.RAM: {}, StorageType.DISK: {}}\n",
+ "        self.mode = \"forward\"\n",
+ "        self._initialize_matrices()\n",
+ "\n",
+ "    def forward(self, n0, n1, storage=None, write_adj_deps=False, write_ics=False):\n",
+ "        \"\"\"Advance the forward solver.\n",
+ "\n",
+ "        Parameters\n",
+ "        ----------\n",
+ "        n0 : int\n",
+ "            Initial time step.\n",
+ "        n1 : int\n",
+ "            Final time step.\n",
+ "        storage : StorageType, optional\n",
+ "            The storage type, which can be StorageType.RAM, StorageType.DISK,\n",
+ "            StorageType.WORK, or StorageType.NONE.\n",
+ "        write_adj_deps : bool, optional\n",
+ "            Whether the adjoint dependency data will be stored.\n",
+ "        write_ics : bool, optional\n",
+ "            Whether the forward restart data will be stored.\n",
+ "        \"\"\"\n",
+ "        # Get the initial condition.\n",
+ "        u = self.forward_work_memory[StorageType.WORK].pop(n0)\n",
+ "        if write_ics:\n",
+ "            self._store_data(u, n0, storage, write_adj_deps, write_ics)\n",
+ "        for step in range(n0, min(n1, self.model[\"max_n\"])):\n",
+ "            if write_adj_deps:\n",
+ "                self._store_data(u, step, storage, write_adj_deps, write_ics)\n",
+ "            def _residual(u_new):\n",
+ "                # C is a vector containing the nonlinear term.\n",
+ "                C = np.array(self._C(*u_new))\n",
+ "                residual = (self._M + self.model[\"dt\"] * self._K).dot(u_new) \\\n",
+ "                    + self.model[\"dt\"] * C - self._M.dot(u)\n",
+ "                # Enforce the homogeneous Dirichlet boundary conditions.\n",
+ "                residual[0] = u_new[0]\n",
+ "                residual[-1] = u_new[-1]\n",
+ "                return residual\n",
+ "            u_new = self._solve_newton(u, _residual)\n",
+ "            u = copy.deepcopy(u_new)\n",
+ "            # Label the state just computed (`step` is reset by `range` at the\n",
+ "            # top of the next iteration).\n",
+ "            step += 1\n",
+ "            if step == self.model[\"max_n\"]:\n",
+ "                self.forward_final_solution = copy.deepcopy(u_new)\n",
+ "            if write_adj_deps:\n",
+ "                self.adjoint_dependency[StorageType.WORK][step] = copy.deepcopy(u_new)\n",
+ "            if (not write_adj_deps\n",
+ "                    or (self.mode == \"forward\" and step < (self.model[\"max_n\"]))\n",
+ "                    ):\n",
+ "                self.forward_work_memory[StorageType.WORK][step] = copy.deepcopy(u_new)\n",
+ "\n",
+ "    def adjoint(self, n0, n1, clear_adj_deps):\n",
+ "        \"\"\"Advance the adjoint solver.\n",
+ "\n",
+ "        Parameters\n",
+ "        ----------\n",
+ "        n0 : int\n",
+ "            Initial time step.\n",
+ "        n1 : int\n",
+ "            Final time step.\n",
+ "        clear_adj_deps : bool\n",
+ "            If `True`, the adjoint dependency data will be cleared.\n",
+ "        \"\"\"\n",
+ "        self.mode = \"adjoint\"\n",
+ "        if n1 == self.model[\"max_n\"]:\n",
+ "            self._initialize_adjoint()\n",
+ "        u_adj = self.adjoint_work_memory[StorageType.WORK].pop(n1)\n",
+ "        M = copy.deepcopy(self._M)\n",
+ "        # Impose the boundary conditions on the mass matrix.\n",
+ "        M[0, 1] = M[-1, -2] = 0\n",
+ "        M[0, 0] = M[-1, -1] = 1\n",
+ "        for step in range(n1, n0, -1):\n",
+ "            J_T = self._jacobian(self.adjoint_dependency[StorageType.WORK][step]).T\n",
+ "            u_adj_coef = spsolve(J_T, u_adj)\n",
+ "            u_adj_coef[0] = u_adj_coef[-1] = 0\n",
+ "            u_adj_new = M.T.dot(u_adj_coef)\n",
+ "            u_adj = copy.deepcopy(u_adj_new)\n",
+ "            if clear_adj_deps:\n",
+ "                del self.adjoint_dependency[StorageType.WORK][step]\n",
+ "            # Label the adjoint state just computed (`step` is reset by\n",
+ "            # `range` at the top of the next iteration).\n",
+ "            step -= 1\n",
+ "        self.adjoint_work_memory[StorageType.WORK][step] = copy.deepcopy(u_adj_new)\n",
+ "\n",
+ "    def gradient(self):\n",
+ "        \"\"\"Return the adjoint-based gradient.\n",
+ "\n",
+ "        Returns\n",
+ "        -------\n",
+ "        array\n",
+ "            The adjoint-based gradient.\n",
+ "        \"\"\"\n",
+ "        return self.adjoint_work_memory[StorageType.WORK][0]\n",
+ "\n",
+ "    def functional(self):\n",
+ "        \"\"\"Compute the functional of Eq. (1) from the final forward solution.\n",
+ "        \"\"\"\n",
+ "        h = self.model[\"lx\"] / (self.model[\"nx\"] - 1)\n",
+ "        return 0.5 * np.sum(self.forward_final_solution ** 2) * h\n",
+ "\n",
+ "    def copy_data(self, step, from_storage, to_storage, move=False):\n",
+ "        \"\"\"Copy data from one storage to another.\n",
+ "\n",
+ "        Parameters\n",
+ "        ----------\n",
+ "        step : int\n",
+ "            The time step.\n",
+ "        from_storage : StorageType\n",
+ "            The storage type from which the data will be copied.\n",
+ "        to_storage : StorageType\n",
+ "            The storage type to which the data will be copied.\n",
+ "        move : bool, optional\n",
+ "            Whether the data will be moved or not. If `True`, the data will be\n",
+ "            removed from the `from_storage`.\n",
+ "        \"\"\"\n",
+ "        if from_storage == StorageType.DISK:\n",
+ "            if step in self.adjoint_dependency[StorageType.DISK]:\n",
+ "                file_name = self.adjoint_dependency[StorageType.DISK][step]\n",
+ "                with open(file_name, \"rb\") as f:\n",
+ "                    self.adjoint_dependency[to_storage][step] = np.load(f)\n",
+ "            if step in self.restart_forward[StorageType.DISK]:\n",
+ "                file_name = self.restart_forward[StorageType.DISK][step]\n",
+ "                with open(file_name, \"rb\") as f:\n",
+ "                    self.forward_work_memory[to_storage][step] = np.load(f)\n",
+ "                if move:\n",
+ "                    os.remove(file_name)\n",
+ "        elif from_storage == StorageType.RAM:\n",
+ "            self.forward_work_memory[to_storage][step] = \\\n",
+ "                copy.deepcopy(self.restart_forward[from_storage][step])\n",
+ "            if move:\n",
+ "                if step in self.adjoint_dependency[from_storage]:\n",
+ "                    del self.adjoint_dependency[from_storage][step]\n",
+ "                if step in self.restart_forward[from_storage]:\n",
+ "                    del self.restart_forward[from_storage][step]\n",
+ "        else:\n",
+ "            raise ValueError(\"This `StorageType` is not supported.\")\n",
+ "\n",
+ "    def _initialize_adjoint(self):\n",
+ "        self.adjoint_work_memory[StorageType.WORK][self.model[\"max_n\"]] = self._functional_derivative()\n",
+ "\n",
+ "    def _functional_derivative(self):\n",
+ "        phi = self._basis_function()\n",
+ "        # v = φ_0 + φ_1.\n",
+ "        v = phi[0] + phi[1]  # Test function.\n",
+ "        h = self.model[\"lx\"] / (self.model[\"nx\"] - 1)\n",
+ "        dI_du = np.zeros(self.model[\"nx\"])\n",
+ "        for i in range(1, self.model[\"nx\"] - 1):\n",
+ "            # dI_du = h * ∫ u * v dη, η = 0 to 1.\n",
+ "            dI_du[i] = h * integrate(self.forward_final_solution[i] * v, (\"eta\", 0, 1))\n",
+ "        return dI_du\n",
+ "\n",
+ "    def _initialize_matrices(self):\n",
+ "        # Initialize the mass and stiffness matrices.\n",
+ "        self._M = self._mass_matrix()\n",
+ "        self._K = self._stiffness_matrix()\n",
+ "        u = symbols(\"u_:{0}\".format(len(self.initial_condition)))\n",
+ "        # Initialize the advection matrix.\n",
+ "        self._C = lambdify(u, self._advection_matrix_action(u)[:], 'numpy')\n",
+ "        # Initialize the Jacobian of the advection matrix.\n",
+ "        self._J_C = lambdify(u, self._jacobian_advection(u), 'numpy')\n",
+ "\n",
+ "    def _jacobian(self, u):\n",
+ "        C_J = lil_matrix(self._J_C(*u))  # The Jacobian of the advection matrix.\n",
+ "        Jac = self._M + self.model[\"dt\"] * (self._K + C_J)\n",
+ "        Jac[0, 1] = Jac[-1, -2] = 0\n",
+ "        Jac[0, 0] = Jac[-1, -1] = 1\n",
+ "        return Jac\n",
+ "\n",
+ "    def _solve_newton(self, u_prev, residual, tol=1e-8, max_iter=50):\n",
+ "        i = 0\n",
+ "        u = copy.deepcopy(u_prev)\n",
+ "        u[0] = u[-1] = 0\n",
+ "        while i < max_iter:\n",
+ "            Jac = self._jacobian(u)\n",
+ "            delta_u = spsolve(Jac, residual(u))\n",
+ "            u_new = u - delta_u\n",
+ "            u = copy.deepcopy(u_new)\n",
+ "            i += 1\n",
+ "            if np.linalg.norm(delta_u) < tol:\n",
+ "                break\n",
+ "        if i == max_iter:\n",
+ "            print(\"Newton's method did not converge in {} iterations.\".format(max_iter))\n",
+ "        return u_new\n",
+ "\n",
+ "    def _basis_function(self):\n",
+ "        eta = symbols(\"eta\")\n",
+ "        # φ_0 = 1 - η, φ_1 = η.\n",
+ "        # φ_0 and φ_1 are the basis functions.\n",
+ "        # η = (x - x0) / h, where h is the mesh spacing.\n",
+ "        return [1 - eta, eta]\n",
+ "\n",
+ "    def _mass_matrix(self):\n",
+ "        h = self.model[\"lx\"] / (self.model[\"nx\"] - 1)\n",
+ "        phi = self._basis_function()\n",
+ "        M_local = np.zeros((2, 2))\n",
+ "        for i in range(2):\n",
+ "            for j in range(2):\n",
+ "                # M_(i,j) = h * ∫ φ_i * φ_j dη, η = 0 to 1.\n",
+ "                M_local[i, j] = h * integrate(phi[i] * phi[j], (\"eta\", 0, 1))\n",
+ "        return self._assemble_global_matrix(M_local)\n",
+ "\n",
+ "    def _assemble_global_matrix(self, local_matrix):\n",
+ "        num_nodes = self.model[\"nx\"]\n",
+ "        global_matrix = lil_matrix((num_nodes, num_nodes))\n",
+ "        global_matrix[0, 0] = local_matrix[0, 0]\n",
+ "        global_matrix[num_nodes - 1, num_nodes - 1] = local_matrix[1, 1]\n",
+ "        global_matrix[0, 1] = local_matrix[0, 1]\n",
+ "        global_matrix[num_nodes - 1, num_nodes - 2] = local_matrix[1, 0]\n",
+ "        for i in range(1, num_nodes - 1):\n",
+ "            global_matrix[i, i - 1] = local_matrix[1, 0]\n",
+ "            global_matrix[i, i + 1] = local_matrix[0, 1]\n",
+ "            global_matrix[i, i] = local_matrix[1, 1] + local_matrix[0, 0]\n",
+ "        return global_matrix\n",
+ "\n",
+ "    def _stiffness_matrix(self):\n",
+ "        # The 1D mesh is uniform. Thus, the mesh spacing is constant.\n",
+ "        h = self.model[\"lx\"] / (self.model[\"nx\"] - 1)\n",
+ "        b = self.model[\"nu\"] / h\n",
+ "        phi = self._basis_function()\n",
+ "        dphi_deta = [diff(phi[i], \"eta\") for i in range(2)]\n",
+ "        K_local = np.zeros((2, 2))\n",
+ "        for i in range(2):\n",
+ "            for j in range(2):\n",
+ "                # K_(i,j) = ν/h * ∫ dφ_i/dη * dφ_j/dη dη, η = 0 to 1.\n",
+ "                K_local[i, j] = b * integrate(dphi_deta[i] * dphi_deta[j], (\"eta\", 0, 1))\n",
+ "        return self._assemble_global_matrix(K_local)\n",
+ "\n",
+ "    def _advection_matrix_action(self, u):\n",
+ "        num_nodes = self.model[\"nx\"]\n",
+ "        # u0 and u1 are the coefficients of the basis functions at the nodes.\n",
+ "        u0, u1 = symbols(\"u0 u1\")\n",
+ "        coefficients = Matrix([u0, u1])\n",
+ "        phi = self._basis_function()\n",
+ "        u_ = phi[0] * u0 + phi[1] * u1\n",
+ "        dphi_deta = [diff(phi[i], \"eta\") for i in range(2)]\n",
+ "        # nonlinear_term = u * du/dη, where u = φ_0 * u0 + φ_1 * u1\n",
+ "        # and du/dη = dφ_0/dη * u0 + dφ_1/dη * u1.\n",
+ "        nonlinear_term = u_ * coefficients.dot(dphi_deta)\n",
+ "        # local_vector_i = ∫ (u * du/dη) * φ_i dη, η = 0 to 1.\n",
+ "        local_vector = Matrix([integrate(nonlinear_term * phi[i], (\"eta\", 0, 1)) for i in range(2)])\n",
+ "        local_vector_func = lambdify((u0, u1), local_vector, 'numpy')\n",
+ "        C = Matrix.zeros(1, num_nodes)\n",
+ "        C[0] = local_vector_func(u[0], u[1])[0]\n",
+ "        C[num_nodes - 1] = local_vector_func(u[num_nodes - 2], u[num_nodes - 1])[1]\n",
+ "        for i in range(1, num_nodes - 1):\n",
+ "            C[i] = local_vector_func(u[i], u[i + 1])[0] + local_vector_func(u[i - 1], u[i])[1]\n",
+ "        return C\n",
+ "\n",
+ "    def _jacobian_advection(self, u):\n",
+ "        num_nodes = self.model[\"nx\"]\n",
+ "        # u0 and u1 are the coefficients of the basis functions at the nodes.\n",
+ "        u0, u1 = symbols(\"u0 u1\")\n",
+ "        phi = self._basis_function()\n",
+ "        coefficients = Matrix([u0, u1])\n",
+ "        dphi_deta = [diff(phi[i], \"eta\") for i in range(2)]\n",
+ "        # u_ = φ_0 * u0 + φ_1 * u1.\n",
+ "        u_ = phi[0] * u0 + phi[1] * u1\n",
+ "        Jc = SparseMatrix.zeros(num_nodes, num_nodes)\n",
+ "        local_matrix = Matrix.zeros(2, 2)\n",
+ "        for i in range(2):\n",
+ "            for j in range(2):\n",
+ "                # linearised term_0 = φ_j * du/dη, where du/dη = dφ_0/dη * u0 + dφ_1/dη * u1.\n",
+ "                term_0 = phi[j] * coefficients.dot(dphi_deta)\n",
+ "                # linearised term_1 = u_ * dφ_j/dη, where u_ = φ_0 * u0 + φ_1 * u1.\n",
+ "                term_1 = u_ * dphi_deta[j]\n",
+ "                # local_matrix_(i,j) = ∫ (φ_j * du/dη + u_ * dφ_j/dη) * φ_i dη, η = 0 to 1.\n",
+ "                local_matrix[i, j] = integrate((term_0 + term_1) * phi[i], (\"eta\", 0, 1))\n",
+ "        local_matrix_func = lambdify((u0, u1), local_matrix, 'numpy')\n",
+ "        Jc[0, 0] = local_matrix_func(u[0], u[1])[0, 0]\n",
+ "        Jc[0, 1] = local_matrix_func(u[0], u[1])[0, 1]\n",
+ "        Jc[num_nodes - 1, num_nodes - 2] = local_matrix_func(u[num_nodes - 2], u[num_nodes - 1])[1, 0]\n",
+ "        Jc[num_nodes - 1, num_nodes - 1] = local_matrix_func(u[num_nodes - 2], u[num_nodes - 1])[1, 1]\n",
+ "        for i in range(1, num_nodes - 1):\n",
+ "            Jc[i, i - 1] = local_matrix_func(u[i - 1], u[i])[1, 0]\n",
+ "            Jc[i, i] = (local_matrix_func(u[i - 1], u[i])[1, 1]\n",
+ "                        + local_matrix_func(u[i], u[i + 1])[0, 0])\n",
+ "            Jc[i, i + 1] = local_matrix_func(u[i], u[i + 1])[0, 1]\n",
+ "        return Jc\n",
+ "\n",
+ "    def _store_data(self, data, step, storage, write_adj_deps, write_ics):\n",
+ "        if storage == StorageType.DISK:\n",
+ "            self._store_on_disk(data, step, write_adj_deps)\n",
+ "        elif write_adj_deps:\n",
+ "            self.adjoint_dependency[storage][step] = copy.deepcopy(data)\n",
+ "        elif write_ics:\n",
+ "            self.restart_forward[storage][step] = copy.deepcopy(data)\n",
+ "\n",
+ "    def _store_on_disk(self, data, step, adj_deps):\n",
+ "        if adj_deps:\n",
+ "            tmp_adj_deps_directory = tempfile.gettempdir()\n",
+ "            file_name = os.path.join(tmp_adj_deps_directory, \"fwd_\" + str(step) + \".npy\")\n",
+ "            self.adjoint_dependency[StorageType.DISK][step] = file_name\n",
+ "            np.save(file_name, data)\n",
+ "        else:\n",
+ "            tmp_rest_forward_directory = tempfile.gettempdir()\n",
+ "            file_name = os.path.join(tmp_rest_forward_directory, \"fwd_\" + str(step) + \".npy\")\n",
+ "            self.restart_forward[StorageType.DISK][step] = file_name\n",
+ "            np.save(file_name, data)\n",
+ "\n",
+ "    def _clean_disk(self):\n",
+ "        # The DISK dictionaries map steps to file names, so cleaning up\n",
+ "        # means deleting any checkpoint files that remain on disk.\n",
+ "        for file_name in self.adjoint_dependency[StorageType.DISK].values():\n",
+ "            if os.path.exists(file_name):\n",
+ "                os.remove(file_name)\n",
+ "        for file_name in self.restart_forward[StorageType.DISK].values():\n",
+ "            if os.path.exists(file_name):\n",
+ "                os.remove(file_name)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Checkpointing\n",
+ "The adjoint PDE is solved in reverse time order and depends on the forward data. Storing the entire forward solution requires storage linear in the number of time steps, which can exhaust the available storage. To overcome this problem, checkpointing algorithms are used to reduce the memory usage.\n",
+ "\n",
+ "To employ a checkpointing method in the adjoint-based sensitivity calculation, we define a `CheckpointingManager` to manage the execution of the forward and adjoint models. The `CheckpointingManager` defines the `execute` method, which performs each action specified in a provided checkpoint schedule (`_schedule`). Within the `execute` method, we have the single-dispatch generic `action` function, which is overloaded by particular functions. For instance, if the schedule yields a `Forward` action, the `action_forward` function is called, which advances the forward calculation. In this particular example, the forward solver is executed by calling `self.solver.forward`, where `self.solver` is an attribute of `CheckpointingManager`. Similarly, the adjoint solver is executed by calling `self.solver.adjoint` within the `action_reverse` function.\n",
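+ "\n",
+ "To make these actions concrete, a schedule can be inspected on its own. The sketch below assumes (as the manager below does) that a schedule is an iterable of action objects; it prints the actions emitted by a toy `Revolve` schedule with 5 steps and 2 checkpoints:\n",
+ "\n",
+ "```python\n",
+ "from checkpoint_schedules import Revolve, EndReverse\n",
+ "\n",
+ "# Print each action of a small schedule: 5 time steps, 2 checkpoints in RAM.\n",
+ "for cp_action in Revolve(5, 2):\n",
+ "    print(cp_action)\n",
+ "    if isinstance(cp_action, EndReverse):\n",
+ "        break\n",
+ "```"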
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import functools\n",
+ "import sys\n",
+ "from checkpoint_schedules import Forward, Reverse, Copy, Move, EndForward, EndReverse\n",
+ "\n",
+ "\n",
+ "class CheckpointingManager:\n",
+ "    \"\"\"Manage the forward and adjoint solvers.\n",
+ "\n",
+ "    Attributes\n",
+ "    ----------\n",
+ "    schedule : CheckpointSchedule\n",
+ "        The schedule created by the `checkpoint_schedules` package.\n",
+ "    solver : object\n",
+ "        A solver object used to advance the forward and adjoint models.\n",
+ "\n",
+ "    Notes\n",
+ "    -----\n",
+ "    The `solver` object contains methods to execute the forward and adjoint. In\n",
+ "    addition, it contains methods to copy data from one storage to another, and\n",
+ "    to set the initial condition for the adjoint.\n",
+ "    \"\"\"\n",
+ "    def __init__(self, schedule, solver):\n",
+ "        self.max_n = sys.maxsize\n",
+ "        self.solver = solver\n",
+ "        self.reverse_step = 0\n",
+ "        self._schedule = schedule\n",
+ "\n",
+ "    def execute(self):\n",
+ "        \"\"\"Execute the forward and adjoint solvers using checkpointing.\n",
+ "        \"\"\"\n",
+ "        @functools.singledispatch\n",
+ "        def action(cp_action):\n",
+ "            raise TypeError(\"Unexpected action\")\n",
+ "\n",
+ "        @action.register(Forward)\n",
+ "        def action_forward(cp_action):\n",
+ "            n1 = cp_action.n1\n",
+ "            self.solver.forward(cp_action.n0, n1, storage=cp_action.storage,\n",
+ "                                write_adj_deps=cp_action.write_adj_deps,\n",
+ "                                write_ics=cp_action.write_ics)\n",
+ "            if n1 >= self.solver.model[\"max_n\"]:\n",
+ "                n1 = min(n1, self.solver.model[\"max_n\"])\n",
+ "                self._schedule.finalize(n1)\n",
+ "\n",
+ "        @action.register(Reverse)\n",
+ "        def action_reverse(cp_action):\n",
+ "            self.solver.adjoint(cp_action.n0, cp_action.n1, cp_action.clear_adj_deps)\n",
+ "            self.reverse_step += cp_action.n1 - cp_action.n0\n",
+ "\n",
+ "        @action.register(Copy)\n",
+ "        def action_copy(cp_action):\n",
+ "            self.solver.copy_data(cp_action.n, cp_action.from_storage,\n",
+ "                                  cp_action.to_storage, move=False)\n",
+ "\n",
+ "        @action.register(Move)\n",
+ "        def action_move(cp_action):\n",
+ "            self.solver.copy_data(cp_action.n, cp_action.from_storage,\n",
+ "                                  cp_action.to_storage, move=True)\n",
+ "\n",
+ "        @action.register(EndForward)\n",
+ "        def action_end_forward(cp_action):\n",
+ "            if self._schedule.max_n is None:\n",
+ "                self._schedule._max_n = self.max_n\n",
+ "\n",
+ "        @action.register(EndReverse)\n",
+ "        def action_end_reverse(cp_action):\n",
+ "            self.solver._clean_disk()\n",
+ "            if self._schedule.max_n != self.reverse_step:\n",
+ "                raise ValueError(\"The number of steps in the reverse phase \"\n",
+ "                                 \"is different from the number of steps in \"\n",
+ "                                 \"the forward phase.\")\n",
+ "\n",
+ "        self.reverse_step = 0\n",
+ "        for cp_action in self._schedule:\n",
+ "            action(cp_action)\n",
+ "            if isinstance(cp_action, EndReverse):\n",
+ "                break"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Adjoint-based sensitivity computations\n",
+ "\n",
+ "The purpose of this adjoint-based sensitivity computation is to exercise different checkpointing approaches available in the `checkpoint_schedules` package and to verify the consistency of the results. We employ a fundamental tool for the verification of gradients: the *second-order Taylor remainder convergence test*.\n",
+ "Let us consider the functional $I(u_0)$, let $\\nabla_{u_0} I$ be its gradient with respect to the control parameter $u_0$, and let $\\delta u_0$ be a perturbation of $u_0$. The *Taylor remainder convergence test* is then based on the expression:\n",
+ "$$ \\left|I(u_0 + \\epsilon \\, \\delta u_0) - I(u_0) - \\epsilon \\, \\nabla_{u_0} I \\cdot \\delta u_0\\right| = O(\\epsilon^2).$$\n",
+ "\n",
+ "In what follows, we present the results of the adjoint-based sensitivity computation using the `checkpoint_schedules` package and the *Taylor remainder convergence test*. The convergence rate is expected to be approximately $2$ for every checkpointing approach.\n",
+ "\n",
+ "Below, we define the `model` dictionary containing the parameters required for the forward and adjoint solvers. The `model` dictionary is then passed to the `BurgersSolver` class. Additionally, we set up the 1D mesh and the initial condition for the forward Burgers' solver."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model = {\"lx\": 1,         # domain length\n",
+ "         \"nx\": 80,        # number of nodes\n",
+ "         \"dt\": 0.001,     # time step\n",
+ "         \"nu\": 0.01,      # viscosity\n",
+ "         \"max_n\": 200,    # total steps\n",
+ "         \"chk_ram\": 50,   # number of checkpoints in RAM\n",
+ "         \"chk_disk\": 50,  # number of checkpoints on disk\n",
+ "         }\n",
+ "mesh = np.linspace(0, model[\"lx\"], model[\"nx\"])  # create the spatial grid\n",
+ "u0 = np.sin(np.pi*mesh)  # initial condition"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def taylor_remainder_test(gradient, J, u0, delta_u):\n",
+ "    \"\"\"Compute the Taylor remainders for a sequence of step sizes.\"\"\"\n",
+ "    epsilons = [0.001 / 2**i for i in range(4)]\n",
+ "    E = []\n",
+ "    dJdm = np.dot(gradient, delta_u)\n",
+ "    for eps in epsilons:\n",
+ "        up = u0 + eps*delta_u\n",
+ "        burgers = BurgersSolver(model, up, mesh)\n",
+ "        burgers.forward(0, model[\"max_n\"])\n",
+ "        Jp = burgers.functional()\n",
+ "        E.append(abs(Jp - J - dJdm * eps))\n",
+ "        print(\"Computed residuals: {}\".format(E))\n",
+ "    return E, epsilons\n",
+ "\n",
+ "def convergence_rates(E_values, eps_values, show=True):\n",
+ "    \"\"\"Compute the convergence rates of the Taylor remainders.\"\"\"\n",
+ "    r = []\n",
+ "    for i in range(1, len(eps_values)):\n",
+ "        r.append(np.log(E_values[i] / E_values[i - 1])\n",
+ "                 / np.log(eps_values[i] / eps_values[i - 1]))\n",
+ "    if show:\n",
+ "        print(\"Computed convergence rates: {}\".format(r))\n",
+ "    return r"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We initially compute the adjoint-based gradient using the `SingleMemoryStorageSchedule` checkpointing approach. The `SingleMemoryStorageSchedule` stores the forward data of each time step in working memory. As explained in the [notebook with illustrative example](https://nbviewer.org/github/firedrakeproject/checkpoint_schedules/blob/main/docs/notebooks/tutorial.ipynb), this schedule does not require the maximal step (`model[\"max_n\"]`) in advance.\n",
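+ "\n",
+ "For this small problem, storing every forward state is cheap: the schedule keeps $200$ states of $80$ double-precision values each, that is, about $200 \\times 80 \\times 8 \\, \\mathrm{B} \\approx 125 \\, \\mathrm{KiB}$. For large-scale problems, it is precisely this linear growth in storage that the schedules used further below avoid."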
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Computed residuals: [3.9314210254935353e-07]\n",
+ "Computed residuals: [3.9314210254935353e-07, 9.829555822570522e-08]\n",
+ "Computed residuals: [3.9314210254935353e-07, 9.829555822570522e-08, 2.4575142825995593e-08]\n",
+ "Computed residuals: [3.9314210254935353e-07, 9.829555822570522e-08, 2.4575142825995593e-08, 6.143942141015931e-09]\n",
+ "Computed convergence rates: [1.9998527430492066, 1.9999264243585173, 1.9999632662271236]\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "[1.9998527430492066, 1.9999264243585173, 1.9999632662271236]"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from checkpoint_schedules import SingleMemoryStorageSchedule\n",
+ "\n",
+ "schedule = SingleMemoryStorageSchedule()\n",
+ "burger = BurgersSolver(model, u0, mesh)\n",
+ "manager = CheckpointingManager(schedule, burger)  # Create the checkpointing manager.\n",
+ "manager.execute()  # Execute the forward and adjoint solvers with this schedule.\n",
+ "delta_u = np.ones(model[\"nx\"])\n",
+ "E, epsilon = taylor_remainder_test(burger.gradient(), burger.functional(), u0, delta_u)\n",
+ "convergence_rates(E, epsilon)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The following examples use the `Revolve` [2] and `HRevolve` [3] schedules. The `Revolve` algorithm requires the definition of the maximal step `model[\"max_n\"]` before the execution of the forward solver, and also the specification of the number of checkpoints stored in memory (`model[\"chk_ram\"]`). Whereas `Revolve` only allows storage in memory, the `HRevolve` algorithm allows both memory and disk storage. The values `model[\"max_n\"]`, `model[\"chk_ram\"]` and `model[\"chk_disk\"]` are required to define the `HRevolve` schedule.\n",
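+ "\n",
+ "With $200$ steps and $50$ memory checkpoints, `Revolve` never holds more than $50$ restart states at once, at the price of re-running parts of the forward model; reference [2] shows that the resulting number of recomputations is minimal for a given number of checkpoints. As a rough sketch (assuming, as in the earlier snippet, that a schedule whose maximal step is fixed at construction can be iterated directly), the number of recomputed forward steps can be counted as follows:\n",
+ "\n",
+ "```python\n",
+ "from checkpoint_schedules import Forward, EndReverse, Revolve\n",
+ "\n",
+ "# Sum the lengths of all Forward actions; everything beyond max_n is recomputation.\n",
+ "total_forward = 0\n",
+ "for cp_action in Revolve(model[\"max_n\"], model[\"chk_ram\"]):\n",
+ "    if isinstance(cp_action, Forward):\n",
+ "        total_forward += cp_action.n1 - cp_action.n0\n",
+ "    if isinstance(cp_action, EndReverse):\n",
+ "        break\n",
+ "print(total_forward - model[\"max_n\"], \"recomputed forward steps\")\n",
+ "```"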
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Computed residuals: [3.9314210254935353e-07]\n",
+ "Computed residuals: [3.9314210254935353e-07, 9.829555822570522e-08]\n",
+ "Computed residuals: [3.9314210254935353e-07, 9.829555822570522e-08, 2.4575142825995593e-08]\n",
+ "Computed residuals: [3.9314210254935353e-07, 9.829555822570522e-08, 2.4575142825995593e-08, 6.143942141015931e-09]\n",
+ "Computed convergence rates: [1.9998527430492066, 1.9999264243585173, 1.9999632662271236]\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "[1.9998527430492066, 1.9999264243585173, 1.9999632662271236]"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from checkpoint_schedules import Revolve\n",
+ "\n",
+ "burger = BurgersSolver(model, u0, mesh)\n",
+ "schedule = Revolve(model[\"max_n\"], model[\"chk_ram\"])\n",
+ "manager = CheckpointingManager(schedule, burger)\n",
+ "manager.execute()\n",
+ "E, epsilon = taylor_remainder_test(burger.gradient(), burger.functional(), u0, delta_u)\n",
+ "convergence_rates(E, epsilon)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Computed residuals: [3.9314210254935353e-07]\n",
+ "Computed residuals: [3.9314210254935353e-07, 9.829555822570522e-08]\n",
+ "Computed residuals: [3.9314210254935353e-07, 9.829555822570522e-08, 2.4575142825995593e-08]\n",
+ "Computed residuals: [3.9314210254935353e-07, 9.829555822570522e-08, 2.4575142825995593e-08, 6.143942141015931e-09]\n",
+ "Computed convergence rates: [1.9998527430492066, 1.9999264243585173, 1.9999632662271236]\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "[1.9998527430492066, 1.9999264243585173, 1.9999632662271236]"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from checkpoint_schedules import HRevolve\n",
+ "\n",
+ "burger = BurgersSolver(model, u0, mesh)\n",
+ "schedule = HRevolve(model[\"max_n\"], model[\"chk_ram\"], model[\"chk_disk\"])\n",
+ "manager = CheckpointingManager(schedule, burger)\n",
+ "manager.execute()\n",
+ "E, epsilon = taylor_remainder_test(burger.gradient(), burger.functional(), u0, delta_u)\n",
+ "convergence_rates(E, epsilon)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Feel free to explore alternative schedules provided by the `checkpoint_schedules` package. Our [notebook with illustrative example](https://nbviewer.org/github/firedrakeproject/checkpoint_schedules/blob/main/docs/notebooks/tutorial.ipynb) offers a demonstration of their usage. You need only follow the steps outlined in the code above: instantiate a `BurgersSolver` object, define the `schedule`, create a `CheckpointingManager` object, and then execute the solvers using the `execute` method, as sketched below.\n",
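+ "\n",
+ "```python\n",
+ "# Generic recipe (sketch). Replace the ellipsis with any schedule from\n",
+ "# checkpoint_schedules, e.g. Revolve(model[\"max_n\"], model[\"chk_ram\"]).\n",
+ "burger = BurgersSolver(model, u0, mesh)\n",
+ "schedule = ...\n",
+ "manager = CheckpointingManager(schedule, burger)\n",
+ "manager.execute()\n",
+ "gradient = burger.gradient()\n",
+ "```"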
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### References\n",
+ "\n",
+ "1. Gunzburger, M. D. (2002). Perspectives in Flow Control and Optimization. Society for Industrial and Applied Mathematics. doi: https://doi.org/10.1137/1.9780898718720\n",
+ "2. Griewank, A. and Walther, A. (2000). Algorithm 799: revolve: an implementation of checkpointing for the reverse or adjoint mode of computational differentiation. ACM Transactions on Mathematical Software (TOMS), 26(1), 19-45. doi: https://doi.org/10.1145/347837.347846\n",
+ "3. Herrmann, J. and Pallez (Aupy), G. (2020). H-Revolve: a framework for adjoint computation on synchronous hierarchical platforms. ACM Transactions on Mathematical Software (TOMS), 46(2), 1-25. doi: https://doi.org/10.1145/3378672\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.7"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/docs/notebooks/tutorial.ipynb b/docs/notebooks/tutorial.ipynb
index 0e5c787..061ed40 100644
--- a/docs/notebooks/tutorial.ipynb
+++ b/docs/notebooks/tutorial.ipynb
@@ -786,7 +786,7 @@
 "source": [
 "### Two-level binomial \n",
 "\n",
- "Two-level binomial schedule was presented in reference [6], and its application was performed in the work [7]. \n",
+ "The two-level binomial schedule was presented in reference [7], and its application was demonstrated in the work [8]. \n",
 "\n",
 "The two-level binomial checkpointing stores the forward restart data based on the user-defined `period`. In this schedule, we can define the limit for additional storage of the forward restart data during the step advancing of the adjoint model. The default storage type is `'disk'`.\n",
 "\n",
@@ -884,7 +884,7 @@
 "metadata": {},
 "source": [
 "## Final remarks\n",
- "This notebook focused on a visual illustration of the employment of the schedules available in the *checkpointing_schedule* package. The specific function (e.g. `action_forward`) proposed to illustrate the step execution of the forward and adjoint models. However, the user can implement the necessary code to the step execution of the forward and adjoint models and the copy and move codes instead of the illustrative code."
+ "This notebook focused on a visual illustration of the employment of the schedules available in the *checkpointing_schedule* package. The specific function (e.g. `action_forward`) proposed to illustrate the step execution of the forward and adjoint models. However, the user can implement the necessary code to the step execution of the forward and adjoint models and the copy and move codes instead of the illustrative code. An adjoint-based gradient computation with the checkpointing schedules is presented in this [notebook](https://nbviewer.org/github/firedrakeproject/checkpoint_schedules/blob/dolci/schedules_application/docs/notebooks/burger.ipynb)."
 ]
 },
 {
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 0fae71d..d83a83c 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -20,6 +20,7 @@ If you want to quickly get up and running with *checkpoint_schedules* your adjoi
 * Familiarise with *checkpoint_schedules* by acessing the :ref:`introduction `.
 * Go through to an `illustrative example `_ that explains the usage of *checkpoint_schedules* for step-based incremental checkpointing of the adjoints.
+* See `here `_ for an adjoint problem solved using *checkpoint_schedules*.
 
 API documentation
 =================