Meta-analysis pipeline using SMMAT
The steps for meta-analysis are very similar to the single-cohort workflow. The main additional step is that the variant files have to be merged across all cohorts to incorporate variants present in some cohorts but not others.
Please follow the steps outlined in [the single-cohort workflow](https://github.com/hmgu-itg/burden_testing/wiki/Single-cohort-analysis-using-SMMAT) until you reach "Turning the variant info file into a group file" (do not execute that step).
This step can only be run with access to the variant files from all cohorts. It is typically performed centrally, once every cohort has sent its per-cohort variant files. It merges all variation observed across the study populations and applies filters.
collapse.varfiles.R [output_filename] [[file_1] ... ]
[output_filename] : self-explanatory
[[file_1] ... ] : variant files from Step 1, separated by spaces
This script will discard positions with AN==0 and tell you about it. It will also silently discard positions whose average minor allele frequency is greater than 0.05 and missingness greater than 0.01. These thresholds cannot be passed as parameters, unfortunately, and must be changed in the script.
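The merging and filtering described above can be sketched roughly as follows. This is an illustration only, not the code of collapse.varfiles.R: the record fields (`AC`, `AN`, `missing`), the variant key, and the function name `collapse` are hypothetical. It also assumes that a position is dropped when either threshold is exceeded, and that averages are taken over the cohorts in which the variant was observed; the actual script may differ on both points.

```python
from collections import defaultdict

MAX_MEAN_MAF = 0.05      # threshold quoted in the text
MAX_MISSINGNESS = 0.01   # threshold quoted in the text


def collapse(cohorts):
    """Merge per-cohort variant tables and apply the filters (sketch).

    `cohorts` is a list of dicts, one per cohort, mapping a variant key
    (chrom, pos, ref, alt) to a record with the hypothetical fields
    AC (alternate allele count), AN (number of called alleles) and
    missing (fraction of missing genotypes).
    """
    # Union of variants: a variant seen in any cohort is retained here.
    merged = defaultdict(list)
    for table in cohorts:
        for key, rec in table.items():
            merged[key].append(rec)

    kept, zero_an = [], []
    for key, recs in merged.items():
        if sum(r["AN"] for r in recs) == 0:
            zero_an.append(key)  # these are the positions reported to the user
            continue
        # Per-cohort minor allele frequency, skipping cohorts without calls.
        mafs = []
        for r in recs:
            if r["AN"] > 0:
                af = r["AC"] / r["AN"]
                mafs.append(min(af, 1 - af))
        mean_maf = sum(mafs) / len(mafs)
        mean_missing = sum(r["missing"] for r in recs) / len(recs)
        # Assumption: failing either threshold drops the position silently.
        if mean_maf > MAX_MEAN_MAF or mean_missing > MAX_MISSINGNESS:
            continue
        kept.append(key)
    return sorted(kept), sorted(zero_an)
```

Note that a variant present in only one cohort is still kept if it passes the filters, which is the reason for merging the variant files in the first place.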
Please follow the paragraphs "Turning the variant info file into a group file" and "Building sets using the variant selector" in [the single-cohort workflow](https://github.com/hmgu-itg/burden_testing/wiki/Single-cohort-analysis-using-SMMAT).
You will have received a sets file from the central analysis team, which will have processed the variant file as above. You will also require a relatedness matrix in GCTA or GEMMA format for all analysed individuals. Please follow the pipeline outlined in "Step 2" in [the single-cohort workflow](https://github.com/hmgu-itg/burden_testing/wiki/Single-cohort-analysis-using-SMMAT).
This is also performed centrally by a team with access to all the output files from Step 2 above.
The script expects a common variant set file and a set of directories containing single-cohort outputs.
run.meta [group file] [[cohort1_dir] [cohort2_dir] ...]
The directories should contain .score.* and var.* files and should follow the naming convention described above. Chunking does not need to be homogeneous across cohorts. The output is standardised and filenames will contain the names of all cohorts.
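Before launching the meta-analysis, it can help to check that each cohort directory actually contains both kinds of files. The sketch below is a hypothetical pre-flight helper, not part of the pipeline: the function names are invented, and the glob patterns `*.score.*` and `*var.*` are an assumption about how the naming convention maps to file names. The assembled command only mirrors the usage line shown above and is not executed here.

```python
import glob
import os

# Assumed glob patterns for the two kinds of per-cohort output files.
REQUIRED_PATTERNS = ("*.score.*", "*var.*")


def check_cohort_dir(path):
    """Return the patterns for which no matching file exists in `path`."""
    missing = []
    for pattern in REQUIRED_PATTERNS:
        if not glob.glob(os.path.join(path, pattern)):
            missing.append(pattern)
    return missing


def build_run_meta_command(group_file, cohort_dirs):
    """Assemble the run.meta argument list: group file, then cohort dirs."""
    if not cohort_dirs:
        raise ValueError("at least one cohort directory is required")
    return ["run.meta", group_file, *cohort_dirs]
```

A directory for which `check_cohort_dir` returns a non-empty list is probably incomplete or named differently from what this sketch assumes, and is worth inspecting by hand before running the meta-analysis.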
It is also possible to run a single chunk instead of entire directories, and to specify custom output files (useful for debugging), using run.meta.unit.