Skip to content

Normative Modelling

Andre Marquand edited this page Oct 16, 2020 · 43 revisions

Overview

Normative modelling essentially aims to predict centiles of variance in a response variable (e.g. a region of interest or other neuroimaging-derived measure) on the basis of a set of covariates (e.g. age, clinical scores, diagnosis) A conceptual overview of the approach can be found in this publication. For example, the image below shows an example of a normative model that aims to predict vertex-wise cortical thickness data.

In practice, this is done by regressing the biological response variables against a set of clinical or demographic covariates. In the instructions that follow, it is helpful to think of these as being stored in matrices:

There are many options for this, but techniques that provide a distributional form for the centiles are appealing, since they help to estimate extreme centiles more efficiently. Bayesian methods are also beneficial in this regard because they also allow separation of modelling uncertainty from variation in the data. Many applications of normative modelling use Gaussian Process Regression, which is the default method in this toolkit. Typically (but not always), each response variable is estimated independently.

Data formats

Generally the covariates are specified in text format, roughly following the FSL convention in that the text file should contain one entry (i.e. subject) per line, with columns space or tab separated and no headers. For example:

# head cov.txt
52 55 94 4.6
49 43 59 4.6
56 80 63 5.6
39 48 42 4.3
...

For the response variables, the following data formats are supported:

  • NIfTI (e.g. .nii.gz or .img/.hdr)
  • CIFTI (e.g. .dtseries.nii)
  • Pickle/pandas (e.g. .pkl)
  • ASCII text (e.g. .txt, .csv, .tsv)

For nifti/cifti formats, data should be in timeseries format with subjects along the time dimension and these images will be masked and reshaped into vectors. If no mask is specified, one will be created automatically from the image data.

Basic usage (command line)

The simplest method to estimate a normative model is using the normative.py script which can be run from the command line or imported as a python module. For example, the following command will estimate a normative model on the basis of the matrix of covariates and responses specified in cov.txt and resp.txt respectively. These are simply tab or space separated ASCII text files that contain the variables of interest, with one subject per row.

# python normative.py -c cov.txt -k 5 -a "blr" resp.txt

The argument -a tells the algorithm to use Bayesian Linear regression rather than the default Gaussian process regression model and -k 5 tells the command to run internal 5-fold cross-validation across all subjects in the covariates and responses files. Alternatively, the model can be evaluated on a separate dataset by specifying test covariates (and optionally also test responses).

Table 1: command line arguments

Argument keyword Description
-c covfunc Covariate file
-k cvfolds Number of cross-validation folds
-t testcov Test covariates
-r testresp Test responses
-m maskfile mask to apply to the response variables (nifti/cifti only)
-a alg Estimation algorithm: 'gpr' (default), 'blr', 'np', 'hbr' or 'rfa'. See below

Note that keyword arguments can also be specified from the command line to offer additional flexibility. For example, the following command will fit a normative model to the same data, but without standardizing the data first and additionally writing out model coefficients (this is not done by default because they can use a lot of disk space).

# python normative.py -c cov.txt -k 5 -a "blr" resp.txt standardize=False savemodel=True

Basic usage (scripted)

The same can be done by importing the estimate function from normative.py

from pcntoolkit.normative import estimate

# estimate a normative model
estimate("cov_train.txt", "resp_train.nii.gz", maskfile="mask.nii.gz", testresp="resp_test.nii.gz", testcov="cov_test.txt", alg="blr")


resp_file_txt = os.path.join(data_dir, 'resp.txt')
cov_file_txt = os.path.join(data_dir, 'cov.txt')

estimate(cov_file_txt, resp_file_txt, testresp = resp_file_txt, testcov=cov_file_txt ,alg=alt_alg, configparam=2)

print(test_num, "Testing again using the same data under cross-validation")
estimate(cov_file_txt, resp_file_txt, cvfolds=2 ,alg=alt_alg, configparam=1)

save_output(os.getcwd(), tdir)
test_num, tdir = update_test_counter(test_num, test_dir)

print(test_num, "Testing larger dataset (blr with pkl data)...")
print("----------------------------------------------------------------------")

cov_file_tr = os.path.join(data_dir,'cov_big_tr.txt')
cov_file_te = os.path.join(data_dir,'cov_big_te.txt')
resp_file_tr = os.path.join(data_dir,'resp_big_tr.txt')
resp_file_te = os.path.join(data_dir,'resp_big_te.txt')

Clone this wiki locally