
Meta analysis pipeline using SMMAT

andrei-barysenka edited this page Oct 12, 2020 · 4 revisions

The steps for meta-analysis are very similar to the single-cohort workflow. The main additional step is that the variant files have to be merged across all cohorts to incorporate variants present in some cohorts but not others.

Step 1: Creating per-cohort group files

Please follow the steps outlined in [the single-cohort workflow](https://github.com/hmgu-itg/burden_testing/wiki/Single-cohort-analysis-using-SMMAT) until you reach "Turning the variant info file into a group file". Stop there; do not run that step yet.

Intermediate Step: Merging

This step requires access to the variant files produced above by every cohort. It is typically performed centrally, once each cohort has sent its per-cohort variant file. It merges all variation observed across the study populations and applies filters.

Syntax

collapse.varfiles.R [output_filename] [[file_1] ... ]

   [output_filename] : path of the merged variant file to write
   [[file_1] ... ]   : variant files from Step 1, separated by spaces
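
As an illustration, a merge of three cohorts might be launched as sketched below. The file names are hypothetical placeholders, not names produced by the pipeline, and the command is only printed so that it can be reviewed before being run.

```shell
# Hypothetical file names; substitute the variant files received from each cohort.
output="merged.variants.txt"
cohort_files="cohortA.variants.txt cohortB.variants.txt cohortC.variants.txt"

# Assemble the call following the syntax above: output file first, then the inputs.
merge_cmd="collapse.varfiles.R $output $cohort_files"

# Print the assembled command for review instead of executing it.
echo "$merge_cmd"
```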

This script discards positions with AN==0 and reports them. It also silently discards positions whose average minor allele frequency is greater than 0.05 and whose missingness is greater than 0.01. These thresholds are not exposed as command-line options; to change them, edit the script.
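
Because the thresholds live in the script itself, one quick way to locate candidate lines to edit is to search the script for the two values. The helper below is only a sketch: `find_thresholds` is a hypothetical name, and it assumes the values appear literally as 0.05 and 0.01 in the source, which may not be the case.

```shell
# find_thresholds: list the lines of a script that mention 0.05 or 0.01.
# Hypothetical helper; assumes the thresholds are written as literal numbers.
# Matches may include unrelated lines, so inspect each hit before editing.
find_thresholds() {
    grep -n -E '0\.05|0\.01' "$1"
}
```

It would be called with the path to a local copy of the script, for example `find_thresholds collapse.varfiles.R`, and prints each matching line prefixed by its line number.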

Producing group files

Please follow the paragraphs "Turning the variant info file into a group file" and "Building sets using the variant selector" in [the single-cohort workflow](https://github.com/hmgu-itg/burden_testing/wiki/Single-cohort-analysis-using-SMMAT).

Step 2

You will have received a sets file from the central analysis team, which has processed the variant files as described above. You will also need a relatedness matrix in GCTA or GEMMA format covering all analysed individuals. Please follow the pipeline outlined in "Step 2" of [the single-cohort workflow](https://github.com/hmgu-itg/burden_testing/wiki/Single-cohort-analysis-using-SMMAT).

Step 3: Meta-analysis

This is also performed centrally by a team with access to all the output files from Step 2 above.

Syntax for genome-wide meta-analysis

The script expects a common variant set file and a set of directories containing single-cohort outputs.

run.meta [group file] [[cohort1_dir] [cohort2_dir] ...]

The directories should contain .score.* and var.* files and should follow the naming convention described above. Chunking does not need to be homogeneous across cohorts. The output is standardised and filenames will contain the names of all cohorts.
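
Before launching run.meta, it can help to confirm that each cohort directory actually holds both kinds of file. The function below is a minimal sketch: `check_cohort_dir` is a hypothetical helper, not part of the pipeline, and the glob patterns `*.score.*` and `*var.*` are an assumed reading of the file names mentioned above.

```shell
# check_cohort_dir: verify that a directory contains score and var files.
# Hypothetical helper; the two glob patterns are assumptions, adjust them
# to the naming convention actually used by the cohorts.
check_cohort_dir() {
    dir="$1"
    for pattern in '*.score.*' '*var.*'; do
        # The unquoted pattern is expanded by the shell; ls fails if nothing matches.
        if ! ls "$dir"/$pattern >/dev/null 2>&1; then
            echo "MISSING $pattern in $dir"
            return 1
        fi
    done
    echo "OK $dir"
}
```

A directory that passes prints OK followed by its path. Otherwise the first missing pattern is reported and the function returns a non-zero status, so it can gate the meta-analysis call, for example `check_cohort_dir cohortA_out && check_cohort_dir cohortB_out` (hypothetical directory names).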

It is also possible to run a single chunk instead of entire directories and to specify custom output files (useful for debugging) using run.meta.unit.