Skip to content

A repository to store Python code for Covid-19 modelling work undertaken in the School of Maths at the University of Nottingham, UK.

License

Notifications You must be signed in to change notification settings

DGWhittaker/nottingham_covid_modelling

Repository files navigation

SItD model which uses Google mobility data

This folder contains a python implementation of a SItD (susceptible-infected-deceased) compartmental epidemiological model structured by infectious age. The code herein is associated with the paper:

"Uncertainty and error in SARS-CoV-2 epidemiological parameters inferred from population-level epidemic models". Whittaker, D. G.*, Herrera-Reyes, A. D.*, Hendrix, M., Owen, M. R., Band, L. R., Mirams, G. R., Bolton, K. J., Preston, S. P. J Theor Biol, 2023, 111337.

* joint first authors

Code overview

Installation

We strongly recommend installing and running the scripts in a virtual environment to avoid version conflicts and make sure all paths work as expected. In order to do this, follow these steps:

  • virtualenv folder_name or if you have both python 2 and 3: virtualenv --python=python3 folder_name. Should virtualenv not be recognised you may need to call it as python -m virtualenv folder_name. If that doesn't work you may need to install virtualenv first pip install virtualenv.
  • go into the virtual environment cd folder_name folder
  • and activate source ./bin/activate (or Scripts\activate on Windows)
  • now get the source code from git: git clone https://github.com/DGWhittaker/nottingham_covid_modelling.git
  • go to repository folder and install the required packages by typing pip install -r requirements.txt
  • install PINTS separately by typing pip install pints
  • Now pip install -e . to install the nottingham_covid_modelling package itself

Running SItD model

  • Now you have a number of different scripts available, which can be run from any directory once the nottingham_covid_modelling package has been installed. Each of these can be run with the -h flag to get more information about available command line arguments.

To run the basic forward model without Google mobility data, type:

forward_model.

  • In order to find good parameters of the Google mobility data forward model, type:

find_good_model -c "country_str" where "country_str" is one of {Australia,Canada,France,Germany,India,Italy,New Zealand,United Kingdom,United States,South Korea,Spain,Sweden}.

  • In order to optimise parameters of the Google mobility data forward model to UK death data using different noise models (default is negative binomial), type:

optimise_model.

  • Following this, once best fits from CMA-ES have been generated, run:

mcmc (takes a while, so using a multi-core server recommended). Note that this can ONLY be run after optimise_model, as best fits from CMA-ES are used as inputs.

  • To see diagnostic plots from the MCMC, type:

plot_mcmc or plot_mcmc_series. For all of the above, the -o argument can again be added to use UK ONS deaths data.

The data used are processed from the ONS website.

Please Note: If your folder names have spaces this might cause issues depending on the platform you're using. e.g. users/user_name/OneDrive - The University of Nottingham if things don't run check you don't have spaces in your folder names.

Running simpler models vs SItD model

  • Our code is designed to mimic the results from the SItD model using any of our simpler models to synthetic data similar to the trued death data of the first wave of infection at England and Gales.

  • To find the MAPs for all parameters of a particular model, type:

optimise_likelihood_anymodel --model_name "model_str" --informative_priors where "model_str" is one of {SIR, SIRDeltaD, SIUR, SEIUR}.

You can control which parameters to fit using the option -partial -pto "parameters_str". The possible parameters depend on the selected model and are listed in the commands help (optimise_likelihood_anymodel -h).

  • To generate the MCMC chains, run then:

mcmc_anymodel "model_and_paramter_options" where "model_and_paramter_options" are the model and parameter specifications previously run with optimise_likelihood_anymodel

  • To generate diagnostic plots for the mcmc, run:

plot_mcmc_anymodel "model_and_paramter_options_on_mcmc" where "model_and_paramter_options_on_mcmc" are the model and parameter specifications previously run with mcmc_anymodel

Folder structure

  • The nottingham_covid_modelling/lib folder contains modules used by the forward model: data.py retrieves stored Google mobility data and daily deaths, equations.py solves the SItD model difference equations, error_measures.py contains functions to calculate different error measures, likelihood.py and priors.py contain log-likelihood functions and priors, respectively, for different noise models, and settings.py contains default fixed parameter settings for the model.
  • Google mobility data are retrieved from here.
  • GOV.UK daily death data are retrieved from here.
  • Automatic tests for our code are in nottingham_covid_modelling/tests

Contributing new scripts

The way it works is:

  • You make a script as per usual
  • Except you put all code (apart from the imports) in a function
  • In setup.py in the python folder you see entry_points={ 'console_scripts': [ ... here the entry point for all console scripts are defined. e.g. The follow means the rommand run_me will wil run my_function from my_script_file.py in the nottingham_covid_modelling folder (package) 'run_me=nottingham_covid_modelling.my_script_file:my_function'
  • You may need to uninstall the module pip uninstall nottingham_covid_modelling and re-install it pip install -e . in the python/ directory for this change to take effect

Running tests

To run the tests:

  • Make sure you have the virtual environment activated (see above).
  • Navigate to the source folder inside the virtual environment (i.e. covid19-sp)
  • Run pytest

To run a specific test: pytest e.g. (covid) C:\Users\uczmh2\Desktop\covid\covid19-sp>pytest python\nottingham_covid_modelling\tests\test_csv_utils.py

Please Note: you can actually be anywhere inside the source folder, but avoid being in the main virtual environment folder (in the above example that would be C:\Users\uczmh2\Desktop\covid) as this would make pytest try to run tests of the various other installed python modules. This is both a waste of time and it will probably fail (some modules require additional packages to be installed for the tests to work).

Reproducing paper results

Running simulations

The results for Figure 4-7 rely on first obtaining an estimate of the maximum a posteriori probability (MAP) using CMA-ES optimisation, then using this as a starting point for Bayesian inference using MCMC. In each figure, that process will be explained first.

Figure 3

In order to generate the data used in Figure 3, type the following:

  • optimise_model --optimise_likelihood --ons_data --square --params_to_optimise rho Iinit1 lockdown_baseline lockdown_offset IFR for the step change in infectivity case
  • optimise_model --optimise_likelihood --ons_data --square --alpha1 --params_to_optimise rho Iinit1 IFR for the constant infectivity (herd immunity) case.

These commands will run 5 repeats of the CMA-ES optimisation routine by default, and deposit the results in nottingham_covid_modelling/cmaes_fits. To quickly visualise SItD model output using the found parameters, simply type:

  • plot_cma_fit --plot_fit --optimise_likelihood --ons_data --square --params_to_optimise rho Iinit1 lockdown_baseline lockdown_offset IFR for the step change in infectivity case
  • plot_cma_fit --plot_fit --optimise_likelihood --ons_data --square --alpha1 -pto rho Iinit1 IFR for the constant infectivity (herd immunity) case

or without the --plot_fit input argument to save a PNG figure (also in nottingham_covid_modelling/cmaes_fits). For convenience, we saved the model outputs in .npy files which are accessed in the plotting script for Figure 3.

Figure 4

In order to generate and save synthetic data for Figure 4-5, type the following:

  • SIR_SINR_AGE_model_default -travel -step --outputnumbers FILENAME_WITH_FULL_PATH

This will simulate the SItD model and some configuration of the simple models. The observation model will be simulated with a negative binomial distribution and saved as part of the SItD simulation. The exact data for Figure 3 can be found in SItRDmodel_ONSparams_noise_NB_NO-R_travel_TRUE_step_TRUE.npy in nottingham_covid_modelling/out_SIRvsAGEfits.

To fit the current synthetic data to the simple models, and the SItD model, we estimate MAPs for all models using CMA-ES optimisation first. We repeat the optimization 100 times. To generate the files run:

  • optimise_likelihood_anymodel --model_name SIR --informative_priors -r 100 for the $SIRD$ model
  • optimise_likelihood_anymodel --model_name SItD --informative_priors for the $SI_tD$ model
  • optimise_likelihood_anymodel --model_name SIRD --informative_priors -r 100 for the $SIRD_{\Delta D}$ model
  • optimise_likelihood_anymodel --model_name SUIR --informative_priors -r 100 for the $SUIRD$ model
  • optimise_likelihood_anymodel --model_name SEIUR --informative_priors -r 100 for the $SE^2I^2U^2RD$ model fiting all parameters
  • optimise_likelihood_anymodel --model_name SEIUR --informative_priors -r 100 --partial -pto rho Iinit1 theta xi --fixed_eta 0.1923 for the $SE^2I^2U^2RD$ model fixing eta as $1/5.2 \approx 0.1923$

Then, we use the MAPS obtained before as initial step for 5 MCMC chains. To obtain the chains, run:

  • mcmc_anymodel --model_name SIR --informative_priors --out_mcmc out_mcmc_table2 -n 1000000 --chains5 for the $SIRD$ model
  • mcmc_anymodel --model_name SItD --informative_priors --out_mcmc out_mcmc_table2 -n 1000000 --chains5 for the $SI_tD$ model
  • mcmc_anymodel --model_name SIRD --informative_priors --out_mcmc out_mcmc_table2 -n 1000000 --chains5 for the $SIRD_{\Delta D}$ model
  • mcmc_anymodel --model_name SUIR --informative_priors --out_mcmc out_mcmc_table2 -n 2000000 --chains5 for the $SUIRD$ model
  • mcmc_anymodel --model_name SEIUR --informative_priors --out_mcmc out_mcmc_table2 -n 8000000 --chains5 for the $SE^2I^2U^2RD$ model fiting all parameters
  • mcmc_anymodel --model_name SEIUR --informative_priors --out_mcmc out_mcmc_table2 -n 3000000 --chains5 --partial -pto rho Iinit1 theta xi --fixed_eta 0.1923 for the $SE^2I^2U^2RD$ model fixing eta as $1/5.2 \approx 0.1923$

Note that we ran some of the models for longer than others. The corresponding chain files were really big for this figure, hence we only saved in the repository the corresponding exploratory plots. Such exploratory plots can be generated running:

  • plot_mcmc_anymodel --model_name "model_str" --informative_priors --out_mcmc out_mcmc_table2 --chains5 for "model_str" each of {SIR, SItD, SIRDeltaD, SIUR, SEIUR}.
  • plot_mcmc_anymodel --model_name SEIUR --informative_priors --out_mcmc out_mcmc_table2 --chains5 --partial -pto rho Iinit1 theta xi --fixed_eta 0.1923 for the $SE^2I^2U^2RD$ model fixing eta as $1/5.2 \approx 0.1923$

We reported the derived parameters obtained from the first chain of the previous MCMC calculation. You can calculate the derived parameters by running:

  • plot_mcmc_anyModel_derivedparams --model_name "model_str" --informative_priors --out_mcmc out_mcmc_table2 --chains5 for "model_str" each of {SIR, SItD, SIRDeltaD}.
  • plot_mcmc_anyModel_derivedparams --model_name SIUR --informative_priors --out_mcmc out_mcmc_table2 --chains5 --burn_in 1000000 for the $SIURD$ model.
  • plot_mcmc_anyModel_derivedparams --model_name SEIUR --informative_priors --out_mcmc out_mcmc_table2 --chains5 --burn_in 3000000 for the $SEIURD$ model fiting all parameters.
  • plot_mcmc_anyModel_derivedparams --model_name SEIUR --informative_priors --out_mcmc out_mcmc_table2 --chains5 --burn_in 2000000 --partial -pto rho Iinit1 theta xi --fixed_eta 0.1923 for the $SE^2I^2U^2RD$ model fixing eta as $1/5.2 \approx 0.1923$

Then, figure 4 can be generated, along with prnting the first part of table 2, by running:

  • python nottingham_covid_modelling/figures/plot_figure4_table2.py

In this last code, the MAPS for each model are saved.

Figure 5

Figure 5 is generated following the same steps as Figure 4, but adding the code for the corresponding fixed parameters. The exact code for the CMA-ES optimisation is:

  • optimise_likelihood_anymodel --model_name SIR --informative_priors -r 100 --partial -pto rho Iinit1 --fixed_theta 0.1923 for the $SIRD$ model with $$\theta = 1/5.2 \approx 0.1923.$$.
  • optimise_likelihood_anymodel --model_name SItD --informative_priors for the $SI_tD$ model.
  • optimise_likelihood_anymodel --model_name SIRD --informative_priors -r 100 --partial -pto rho Iinit1 --fixed_theta 0.1923 for the $SIRD_{\Delta D}$ model with $$\theta = 1/5.2 \approx 0.1923, \quad \Delta D = 18.69 - 1 - 5.2.$$
  • optimise_likelihood_anymodel --model_name SUIR --informative_priors -r 100 --partial -pto rho Iinit1 --fixed_theta 0.1923 for the $SUIRD$ model with $$\theta = 1/5.2 \approx 0.1923 , \quad \xi = 1/ (18.69 - 1 - 5.2.)$$
  • optimise_likelihood_anymodel --model_name SEIUR --informative_priors -r 100 --partial -pto rho Iinit1 --fixed_eta 0.3362 --fixed_theta 0.3362 for the $SE^2I^2U^2RD$ model with $$\theta = \eta = 1/2.974653 \approx 0.3362, \quad \xi = 1 / 11.744308.$$

Then, we use the MAPS obtained before as initial step for 5 MCMC chains. To obtain the chains, run:

  • mcmc_anymodel --model_name SIR --informative_priors --out_mcmc out_mcmc_table2B --chains5 --partial -pto rho Iinit1 --fixed_theta 0.1923 for the $SIRD$ model
  • mcmc_anymodel --model_name SItD --informative_priors --out_mcmc out_mcmc_table2B --chains5 for the $SI_tD$ model
  • mcmc_anymodel --model_name SIRD --informative_priors --out_mcmc out_mcmc_table2B --chains5 --partial -pto rho Iinit1 --fixed_theta 0.1923 for the $SIRD_{\Delta D}$ model
  • mcmc_anymodel --model_name SUIR --informative_priors --out_mcmc out_mcmc_table2B --chains5 --partial -pto rho Iinit1 --fixed_theta 0.1923 for the $SUIRD$ model
  • mcmc_anymodel --model_name SEIUR --informative_priors --out_mcmc out_mcmc_table2B --chains5 --partial -pto rho Iinit1 --fixed_theta 0.1923 --fixed_theta 0.3362 for the $SE^2I^2U^2RD$

In this case, the chains are not as long as they were for figure 4 so the chain files are in the repository in nottingham_covid_modelling/out_mcmc_table2B. To create the eploratory plots run:

  • plot_mcmc_anymodel --model_name "model_str" --informative_priors --out_mcmc out_mcmc_table2B --chains5 --partial -pto rho Iinit1 --fixed_theta 0.1923 for "model_str" each of {SIR, SIRDeltaD, SIUR}.
  • plot_mcmc_anymodel --model_name SItD --informative_priors --out_mcmc out_mcmc_table2B --chains5 for the $SI_tD$ model`.
  • plot_mcmc_anymodel --model_name SEIUR --informative_priors --out_mcmc out_mcmc_table2 --chains5 --partial -pto rho Iinit1 --fixed_theta 0.1923 --fixed_theta 0.3362 for the $SE^2I^2U^2RD$ model.

To calculate the derived parameters, run:

  • plot_mcmc_anyModel_derivedparams --model_name SItD --informative_priors --out_mcmc out_mcmc_table2 --chains5 for $SI_tD$ model
  • plot_mcmc_anyModel_derivedparams --model_name SItD --informative_priors --out_mcmc out_mcmc_table2 --chains5 --partial -pto rho Iinit1 --fixed_theta 0.1923 for "model_str" each of {SIR, SIRDeltaD, SIUR}.
  • plot_mcmc_anyModel_derivedparams --model_name SEIUR --informative_priors --out_mcmc out_mcmc_table2 --chains5 --partial -pto rho Iinit1 --fixed_theta 0.1923 --fixed_theta 0.3362 for the $SEIURD$ model

Then, figure 5 can be generated, along with prnting the second part of table 2, by running:

  • python nottingham_covid_modelling/figures/plot_figure4_table2B.py

In this last code, the MAPS for each model are saved.

Figure 6

As mentioned before, the results for Figure 6 rely on first obtaining an estimate of MAPs using CMA-ES optimisation to be used as a starting point for Bayesian inference using MCMC. To generate the necessary CMA-ES files, type:

  • optimise_model --ons_data --square --params_to_optimise rho Iinit1 lockdown_baseline lockdown_offset IFR

followed by

  • mcmc --ons_data --square --params_to_optimise rho Iinit1 lockdown_baseline lockdown_offset IFR

to run the Markov chain Monte Carlo (MCMC) sampling. Note that each of these commands could take several hours to run, so it is advised to use an HPC resource, from which the code can take advantage of parallel capabilities. Our favoured method is to use a window manager such as screen or tmux to keep a session active even if disconnected, and to run the commands in detached mode, e.g. running

  • mcmc --ons_data --square --params_to_optimise rho Iinit1 lockdown_baseline lockdown_offset IFR &> mcmc_figure5.out &

will print the console output to mcmc_figure5.out rather than to screen. Typing jobs in the terminal will show the status, i.e. Running or Done (or Exited if there is some error).

Figure 7

Similarly to Figure 6, the results for Figure 7 rely on first obtaining the MAP estimate using CMA-ES. To generate the necessary CMA-ES files, type:

  • optimise_model --ons_data --square --params_to_optimise rho Iinit1 lockdown_baseline lockdown_offset beta_mean beta_var death_mean death_dispersion IFR --repeats 10

followed by

  • mcmc --ons_data --square --params_to_optimise rho Iinit1 lockdown_baseline lockdown_offset beta_mean beta_var death_mean death_dispersion IFR --niter 200000

to run the Markov chain Monte Carlo (MCMC) sampling. Note here that we opted to run more repeats of the CMA-ES optimisation and more iterations of the MCMC due to the higher dimensionality of the problem.

Generating figures

Whereas Figure 2 is a schematic drawing, all other figures can be generated by typing python plot_figure[figure number].py inside the nottingham_covid_modelling/figures directory. For example, running

  • python plot_figure6.py

in the aforementioned directory will plot and save separately the constituent panels of Figure 6.

Acknowledging this work

If you publish any work based on the contents of this repository please cite:

Whittaker, D. G., Herrera-Reyes, A. D., Hendrix, M., Owen, M. R., Band, L. R., Mirams, G. R., Bolton, K. J., Preston, S. P. Uncertainty and error in SARS-CoV-2 epidemiological parameters inferred from population-level epidemic models. J Theor Biol, 2023, 111337.

About

A repository to store Python code for Covid-19 modelling work undertaken in the School of Maths at the University of Nottingham, UK.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors 4

  •  
  •  
  •  
  •  

Languages