
austin edited this page Aug 21, 2023 · 2 revisions

Biologically-Informed Neural Networks (BINNs)

What are BINNs?

  Biologically-Informed Neural Networks (BINNs) are an equation-learning method that learns the components of a governing dynamical system while simultaneously converging to an approximate solution of that system. They provide insight into the underlying mechanisms of a natural system by leveraging the universal approximation property of neural networks to estimate nonlinear components while minimizing a priori assumptions about their functional form. We refer to the learned components as "parameter networks": trained neural networks, embedded in the BINN, that each approximate an unknown parameter of the ODE system. These networks can approximate a large class of functions, linear and nonlinear, but on their own they lack interpretability. Using the learned parameter networks, we can make expert-guided inferences a posteriori about the underlying equations of the learned components.

  Because neural networks are composed of layers of activation functions whose analytic derivatives are (in almost all cases) known, we can easily obtain the analytic derivative of the output $\hat{\Psi}(\Theta)$ with respect to the input $\Theta$. This is referred to as automatic differentiation; neural network frameworks implement it with the same reverse-mode machinery used for backpropagation. We leverage automatic differentiation to obtain the analytic derivatives of the estimated solutions and components needed by the loss function.
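The idea above can be made concrete without any framework: for a one-hidden-layer tanh network, the chain rule gives the derivative of the output with respect to the input in closed form, which is exactly what automatic differentiation automates. The weights below are hypothetical and the code is an illustrative sketch, not the project's implementation.

```python
# Illustrative sketch: a 1 -> 3 -> 1 tanh network u(t) and its analytic
# derivative du/dt via the chain rule. Weights are arbitrary stand-ins.
import math

w1 = [0.5, -1.2, 2.0]
b1 = [0.1, 0.0, -0.3]
w2 = [1.0, 0.7, -0.4]
b2 = 0.2

def u(t):
    """Network output: u(t) = b2 + sum_i w2[i] * tanh(w1[i]*t + b1[i])."""
    return b2 + sum(w2[i] * math.tanh(w1[i] * t + b1[i]) for i in range(3))

def du_dt(t):
    """Analytic derivative: d/dt tanh(z) = (1 - tanh(z)^2) * dz/dt."""
    return sum(w2[i] * (1 - math.tanh(w1[i] * t + b1[i]) ** 2) * w1[i]
               for i in range(3))

# Sanity check against a central finite-difference approximation.
t0, h = 0.7, 1e-6
fd = (u(t0 + h) - u(t0 - h)) / (2 * h)
print(abs(du_dt(t0) - fd) < 1e-6)  # True
```

In a BINN, a framework's autograd performs this same computation for arbitrarily deep networks, yielding the $\frac{d\hat{u}}{dt}$ terms used in the loss.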

Our BINN

The Governing Dynamical System

  It is important first to understand what we are training toward. The neural network seeks to minimize a cost function that estimates the expected loss across all data samples. This is broadly true of many neural networks; what makes theory-informed neural networks distinct is how the loss is defined. These networks are called theory-informed because we use a priori assumptions to tell the network exactly which governing dynamical system it should converge to. That is, it should minimize the discrepancy between the output of the network and an assumed governing dynamical system. In our case, this governing dynamical system is a system of ODEs derived from a compartmental model of disease spread. For PINNs, it may be a system of partial differential equations derived from the laws of thermodynamics. In either case, what matters is understanding what the governing dynamical system is and how the loss function is defined so that the network converges to solutions that satisfy the system (or come as close as possible).

  We denote the states/solutions of the dynamical system as $u\in\mathbb{R}^d$, where $d$ is the number of dimensions (differential equations) of our mathematical model. In our case, the differential equations of interest are ordinary; that is, we are interested in describing $\frac{du}{dt}$. From our compartmental model, we assume $$\frac{du}{dt} = g(u)$$ where $g$ is some real-valued function of $u$. $g(u)$ contains parameters that we wish to describe accurately in order to obtain an accurate mathematical model of our ABM. Hence, our governing dynamical system is the system of equations described by $\frac{du}{dt} = g(u)$.
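To make the $\frac{du}{dt} = g(u)$ structure concrete, here is an illustrative stand-in for $g$ using a simple SIR compartmental model. The project's actual $g(u)$ has different compartments and rates; this sketch only shows the shape of a compartmental right-hand side.

```python
# Illustrative stand-in for g(u): the right-hand side of a basic SIR system.
# beta and gamma are placeholder rates, not the project's parameters.
def g(u, beta=0.3, gamma=0.1):
    """Return du/dt for u = (S, I, R) under standard SIR dynamics."""
    S, I, R = u
    N = S + I + R
    dS = -beta * S * I / N
    dI = beta * S * I / N - gamma * I
    dR = gamma * I
    return (dS, dI, dR)

# du/dt at a sample state; the components sum to zero because the total
# population is conserved in this closed compartmental model.
du = g((990.0, 10.0, 0.0))
print(du)
```

Any compartmental model of disease spread, including the one derived from our ABM, fits this pattern: a vector state $u$ mapped to its time derivative.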

The Networks

  • Contact Rate ($\eta$) Network: Neural network that estimates the contact rate, $\eta$, as a function of $S$, $A$, $Y$, and also $M$ if and only if the model and data include masking.

  • Tracing Rate ($\beta$) Network: Neural network that estimates the tracing rate, $\beta$, as a function of $S+A+Y$ and $\chi(t)$.

  • Quarantine Diagnosis Rate ($\tau$) Network: Neural network that estimates the quarantine diagnosis rate, $\tau$, as a function of $A$ and $Y$.
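The pattern behind these bullets is that the rates in $g(u)$ are not constants but callables, each backed by a trained network. The sketch below is a toy illustration of that composition only: the lambdas are placeholders for trained parameter networks, and the toy equations are not the project's actual ODE system.

```python
# Hedged sketch of the parameter-network pattern: g(u) evaluates learned,
# state-dependent rates instead of fixed constants. The dynamics here are
# invented for illustration; only the plug-in structure mirrors the BINN.
def g(u, eta, beta, tau):
    S, A, Y = u
    contact = eta(S, A, Y)     # contact rate network eta(S, A, Y)
    tracing = beta(S + A + Y)  # tracing rate network; the real one also takes chi(t)
    diagnose = tau(A, Y)       # quarantine diagnosis rate network tau(A, Y)
    dS = -contact * S * (A + Y)
    dA = contact * S * (A + Y) - (tracing + diagnose) * A
    dY = diagnose * A - tracing * Y
    return (dS, dA, dY)

# Placeholder "networks": any trained model with the same call signature
# slots in here unchanged.
rhs = g((0.9, 0.05, 0.05),
        eta=lambda S, A, Y: 0.3,
        beta=lambda n: 0.1,
        tau=lambda A, Y: 0.2)
print(len(rhs))  # 3
```

Because each rate is just a function evaluation inside $g$, swapping a constant for a parameter network leaves the rest of the ODE machinery untouched.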

The Loss Function

The loss function is defined as

$$L_{Total} = L_{GLS} + L_{PDE} + L_{Constr.}$$

In our application, we combine $L_{PDE}$ and $L_{Constr.}$ into one term. Formally, we treat these components separately.

  • $L_{GLS}$: Generalized Least Squares Loss.

  • $L_{PDE}$: Residual Sum of Squares between the left-hand side of the differential equations and the right-hand side. $$L_{PDE} = \sum_{i=1}^N \left[\left(\frac{d\hat{u}}{dt}\right)_i - g(\hat{u})_i\right]^2$$

  • $L_{Constr.}$: Residual Sum of Squares of learned values that fall outside a desired interval, penalizing outputs that are not biologically plausible.
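The three terms can be sketched numerically as follows. This is an assumed structure, not the project's implementation: the GLS weighting shown is one simple stand-in for the statistical error model, and the interval $[0, 1]$ is a placeholder constraint.

```python
# Hedged sketch of the total loss. u_hat / du_hat are surrogate outputs and
# their time derivatives at the sample points, g_vals are g(u_hat)
# evaluations, and u_data are the observations.
def total_loss(u_hat, du_hat, g_vals, u_data, lo=0.0, hi=1.0):
    eps = 1e-8
    # L_GLS: squared errors weighted by the model value (illustrative choice).
    L_gls = sum((uh - ud) ** 2 / max(abs(uh), eps)
                for uh, ud in zip(u_hat, u_data))
    # L_PDE: residual between the two sides of the differential equation.
    L_pde = sum((d - gv) ** 2 for d, gv in zip(du_hat, g_vals))
    # L_constr: squared distance of any value outside the interval [lo, hi].
    L_constr = sum(max(lo - uh, 0.0, uh - hi) ** 2 for uh in u_hat)
    return L_gls + L_pde + L_constr

# Here the ODE residual is zero and only the GLS term and the out-of-bounds
# value 1.2 contribute.
print(total_loss([0.5, 1.2], [0.1, -0.2], [0.1, -0.2], [0.5, 1.0]))
```

Minimizing $L_{GLS}$ fits the data, minimizing $L_{PDE}$ enforces the governing dynamical system, and minimizing $L_{Constr.}$ keeps learned quantities in plausible ranges.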

3 Different Models

  • No Masking: This BINN model does not include masking. That is, masking averages are not included in the dataset and they are not included as input into the contact rate, $\eta$, function.

  • Observed Masking: This BINN model does include masking. It includes masking averages as observed data in the dataset and as input into the contact rate, $\eta$, function.

  • Learned Masking: This BINN model does include masking. It includes masking averages as observed data in the dataset. It outputs an estimated masking average, $\hat M$, and feeds that into the contact rate, $\eta$, function. In this case, the system of ODEs has $\frac{d\hat{M}}{dt}$ as an added component.