Angela Tam edited this page May 20, 2016 · 11 revisions

Overview of the subtyping pipeline

The goal of this pipeline is to cluster subjects together based on their similarity on a given measure (e.g. functional connectivity, cortical thickness, etc.) and then perform subsequent statistical analyses.

The steps of the pipeline are as follows:

  • Preprocessing of the data to create a "stack" map per network as input for the rest of the pipeline, with the option of regressing confounds prior to subtyping. The stack map contains a Subjects x Voxels array. See niak_brick_network_stack.m for more info.
  • Generating the similarity matrix, a Subjects x Subjects matrix to illustrate how similar subjects are to one another and how subjects can be clustered together. See niak_brick_similarity_matrix.m for more info.
  • Clustering the subjects to form subtypes (or subgroups) within the dataset. See niak_brick_subtyping.m for more info.
  • Calculating the subtype "weights" for each subject, a measure of the strength of the association between each subject and a given subtype. See niak_brick_subtype_weight.m for more info.
  • Statistical tests of association, to test how subtypes may be related to variables of interest. See niak_brick_association_test.m for more info.
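
The data flow through these steps can be pictured with a toy Matlab/Octave sketch. It is not the NIAK implementation: the data are synthetic, a crude threshold stands in for the hierarchical clustering, and the weights are taken here to be correlations between each subject's map and each subtype's average map, which is an assumption made for illustration only.

```matlab
% Toy sketch of the pipeline's data flow (synthetic data, not NIAK code).
n_vox = 40;
v = 1:n_vox;
pattern_a = sin(v / 3);           % hypothetical spatial pattern shared by subjects 1-3
pattern_b = cos(v / 5);           % hypothetical spatial pattern shared by subjects 4-6
perturb = 0.1 * sin((1:6)' * v);  % small deterministic subject-specific variation

% Step 1: the "stack", a Subjects x Voxels array.
stack = [repmat(pattern_a, 3, 1); repmat(pattern_b, 3, 1)] + perturb;

% Step 2: Subjects x Subjects similarity (correlation between subject maps).
sim_matrix = corrcoef(stack');

% Step 3: group the subjects. A threshold on the similarity to subject 1
% stands in for the hierarchical clustering used by the real pipeline.
part = 1 + (sim_matrix(:, 1) < 0.5);

% Subtype maps: average map of the subjects in each subtype.
n_sbt = max(part);
sbt_map = zeros(n_sbt, n_vox);
for ss = 1:n_sbt
    sbt_map(ss, :) = mean(stack(part == ss, :), 1);
end

% Step 4: weights, taken here as the correlation between each subject's map
% and each subtype map (Subjects x Subtypes). This definition is an assumption.
n_subj = size(stack, 1);
r = corrcoef([stack' sbt_map']);
weights = r(1:n_subj, n_subj + 1:end);
```

In this sketch, a subject's weight is highest for the subtype it was assigned to, which is the intuition behind using weights as a continuous measure of subtype membership.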

The command to run the pipeline in a Matlab/Octave session is: niak_pipeline_subtype(files_in,opt) where "files_in" is a structure describing how the dataset is organized, and "opt" is a structure describing the options of the pipeline. See this test script for an example of how to write your own script to call the pipeline.

Input files

  • Individual maps (e.g. rmap_part, stability_maps, etc.).
  • A 3D binary mask
  • A model file (optional)

These inputs must be specified in a structure, with required subfields for data and mask, and optionally, model.

Individual maps

These maps can be any type of preprocessed map (for example, rmap_part or stability_maps generated from niak_brick_scores_fmri). N.B. The pipeline assumes there is only one (1) mnc.gz or nii.gz per subject.

To grab the individual maps, we will have to build a structure. For example:
files_in.data.subject1 = 'data/subject1_session1_stability_maps.mnc.gz';
files_in.data.subject2 = 'data/subject2_session1_stability_maps.mnc.gz';
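
With more than a handful of subjects, the same structure can be built in a loop. The sketch below assumes a hypothetical list of subject IDs and the file naming used in the example above; adapt both to your dataset.

```matlab
% Build files_in.data for a list of subjects (IDs and folder are hypothetical).
subject_ids = {'subject1', 'subject2', 'subject3'};
data_dir = 'data';

files_in = struct();
for ii = 1:numel(subject_ids)
    id = subject_ids{ii};
    % One map per subject; the field name is the subject ID.
    files_in.data.(id) = fullfile(data_dir, [id '_session1_stability_maps.mnc.gz']);
end

disp(fieldnames(files_in.data));
```

Because subject IDs are used as field names, they must be valid Matlab/Octave identifiers (start with a letter; letters, digits and underscores only).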

3D binary mask

The "mask" field is the name of a 3D binary volume serving as a mask for the analysis. It can be a mask of the brain common to all subjects, or a mask of a specific brain area, e.g. the thalami. It is important to make sure that this segmentation is in the same space and resolution as the fMRI datasets. If not, use SPM or MINCRESAMPLE to resample the mask at the correct resolution.

To specify the mask, add a subfield for the mask to the files_in structure. For example:
files_in.mask = '/home/pbellec/demo_niak_preproc/quality_control/group_coregistration/func_mask_group_stereonl.mnc.gz';

Model file

The model file is a .csv file containing demographic information, including variables of interest and confound variables, for each subject specified in files_in.data. This input is optional.

To specify the model file, add a subfield to the files_in structure. For example:
files_in.model = 'data/model.csv';
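
As an illustration only, the sketch below writes a tiny model file to a temporary folder. The layout (a header row of variable names, then one row per subject whose first column matches the field names in files_in.data), the variable names, and all values are assumptions made up for this example; check them against your own model file and the pipeline's help text.

```matlab
% Write a small, made-up model file (layout and values are assumptions).
model_file = fullfile(tempdir, 'model_example.csv');
fid = fopen(model_file, 'w');
fprintf(fid, ',Gender,Age,Group\n');   % header row: variable names
fprintf(fid, 'subject1,0,68,1\n');     % one row per subject in files_in.data
fprintf(fid, 'subject2,1,72,2\n');
fclose(fid);

files_in.model = model_file;
```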

Options

The different options are passed through fields in the structure "opt".

The first option is the name of the folder where the results will be stored. Note that this folder does not need to be created beforehand. Example:
opt.folder_out = 'data/subtype_results/'; % Where to store results

The second option is the scale of the networks specified in files_in.data (e.g. a brain partition of 5 networks is considered to be at scale 5). Example:
opt.scale = 5;

There is the option to regress out confounding variables during the generation of the stack maps, prior to subtyping. N.B. The confounding variables that are specified in the option must correspond to variables within the model file. For example:
opt.stack.regress_conf = {'Gender', 'Age'}; % Regress out variables gender and age from stack maps

Subtyping options

There are several options that may be specified for the subtyping part of the pipeline. These options must be specified in the structure "opt.subtype".

  • Number of subtypes to extract. For example:
    opt.subtype.nb_subtype = 5; % We will extract 5 subtypes
  • The model for the subtype map. For example:
    opt.subtype.sbt_map_type = 'median'; % We will ask the subtype volumes to be created based on the median of the data.
  • The following flag turns on/off the generation of a contingency table and subsequent calculation of Cramer's V and Chi-2 statistics:
    opt.subtype.flag_stats = true;
  • The index of the group column in files_in.model on which the contingency table is built. For example:
    opt.subtype.group_col = 3; % the variable of interest in column 3 of the csv
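
Putting the inputs and the options described so far together, a complete calling script might look like the sketch below. It uses only the fields and example values shown on this page; the check that NIAK is on the path is an addition so that the snippet can be pasted without error.

```matlab
% Inputs (example paths from this page; replace with your own).
files_in = struct();
files_in.data.subject1 = 'data/subject1_session1_stability_maps.mnc.gz';
files_in.data.subject2 = 'data/subject2_session1_stability_maps.mnc.gz';
files_in.mask = '/home/pbellec/demo_niak_preproc/quality_control/group_coregistration/func_mask_group_stereonl.mnc.gz';
files_in.model = 'data/model.csv';       % optional

% General options.
opt = struct();
opt.folder_out = 'data/subtype_results/';          % where to store results
opt.scale = 5;                                     % number of networks in the maps
opt.stack.regress_conf = {'Gender', 'Age'};        % confounds, must exist in the model file

% Subtyping options.
% Note: a real dataset needs more subjects than the number of subtypes requested.
opt.subtype.nb_subtype = 5;
opt.subtype.sbt_map_type = 'median';
opt.subtype.flag_stats = true;
opt.subtype.group_col = 3;

% Run the pipeline only if NIAK is available on the path.
if exist('niak_pipeline_subtype', 'file') == 2
    niak_pipeline_subtype(files_in, opt);
else
    disp('NIAK not found on the path: structures were built, pipeline not run.');
end
```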

Association options

There are also several options that may be specified for the association testing part of the pipeline. These options must be specified in the structure "opt.association".

  • Scale
  • qFDR
  • Type of FDR correction
  • Contrast
  • Interaction
  • Normalization

Outputs

A number of subfolders and files are created in the "opt.folder_out" directory. In the following, EXT will denote the extension associated with the file type of the functional images, e.g. ".nii" or ".nii.gz" for nifti. An exhaustive description of the outputs follows. Most of them may not be of interest.

A subfolder will be generated for each network (e.g. if 7 networks were tested, there will be 7 subfolders labeled "network_1", "network_2" ... "network_7"). Each subfolder will contain:

By default

  • network_<number>_stack.mat : a .mat file that contains two variables: (1) provenance, a structure that contains information about the subjects, model, and volume; (2) stack, a Subjects x Voxels array.
  • network_<number>_similarity_matrix.mat : a .mat file that contains four variables: (1) provenance; (2) hier, a 2D array defining a hierarchy; (3) sim_matrix, a Subjects x Subjects array; (4) subj_order, a vector defining a permutation of the subjects, as determined by "hier" when splitting the subjects backward.
  • network_<number>_subtype.mat : a .mat file that contains five variables: (1) provenance; (2) hier; (3) opt, a structure describing options that the user specified; (4) part, a vector where PART(I) = J if the object I is in the class J; (5) sub, a structure containing arrays for different maps.
  • similarity_matrix.pdf : a .pdf illustrating a Subjects x Subjects correlation matrix
  • dendrogram.pdf : a .pdf illustrating the clustering of the subjects
  • grand_mean.nii.gz : a 3D map illustrating the mean connectivity within the network across all subjects
  • grand_std.nii.gz : a 3D map illustrating the standard deviation of the connectivity within the network across all subjects
  • mean_subtype.nii.gz : a 4D map illustrating the mean connectivity within the network for each subtype
  • ttest_subtype.nii.gz : a 4D map illustrating the statistical difference in a t-test between each subtype and all other subtypes
  • eff_subtype.nii.gz : a 4D map illustrating the effect size of the difference between each subtype and all other subtypes
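
As a quick check after a run, the partition stored in the subtype file can be loaded to count the subjects in each subtype. The sketch below assumes the results folder used earlier on this page and that the file for network 1 sits in the "network_1" subfolder; if the file is not found, a made-up partition is used so that the snippet still runs.

```matlab
folder_out = 'data/subtype_results/';   % same value as opt.folder_out
sbt_file = fullfile(folder_out, 'network_1', 'network_1_subtype.mat');

if exist(sbt_file, 'file') == 2
    s = load(sbt_file, 'part');   % part(i) = j if subject i is in subtype j
    part = s.part;
else
    part = [1 1 2 3 3 3 2 1]';    % made-up partition, for illustration only
end

% Number of subjects assigned to each subtype.
n_per_subtype = accumarray(part(:), 1);
for ss = 1:numel(n_per_subtype)
    fprintf('subtype %d: %d subjects\n', ss, n_per_subtype(ss));
end
```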

Optional : The following will only be generated if opt.subtype.flag_stats = true

  • group_stats.mat : a .mat file that contains two variables: (1) model, a structure containing information about the subjects; (2) stats, a structure with results from Chi-2 and Cramer's V tests.
  • chi2_contingency_table.csv : a .csv file that contains a contingency table based on user-specified options
  • pie_chart_<number>.pdf : a .pdf illustrating the proportions of subjects within subtypes.
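
For readers unfamiliar with these statistics, the sketch below builds a Subtypes x Groups contingency table from made-up labels and computes the textbook Pearson Chi-2 statistic (no continuity correction) and Cramer's V. Whether the pipeline uses exactly this variant is not stated on this page, so treat the code as an illustration of the quantities rather than as a reproduction of group_stats.mat.

```matlab
% Made-up labels for 10 subjects (illustration only).
part  = [1 1 1 1 2 2 2 2 2 2]';   % subtype of each subject
group = [1 1 1 2 2 2 2 2 1 2]';   % group of each subject (e.g. a column of the model)

% Contingency table: rows are subtypes, columns are groups.
cont_table = accumarray([part group], 1);

% Expected counts under independence of subtype and group.
n = sum(cont_table(:));
expected = sum(cont_table, 2) * sum(cont_table, 1) / n;

% Pearson Chi-2 statistic and Cramer's V.
chi2 = sum((cont_table(:) - expected(:)) .^ 2 ./ expected(:));
cramers_v = sqrt(chi2 / (n * (min(size(cont_table)) - 1)));

fprintf('Chi-2 = %.3f, Cramer''s V = %.3f\n', chi2, cramers_v);
```

Cramer's V ranges from 0 (no association between subtype and group) to 1 (each subtype maps onto a single group).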