Our pipeline applies a Random Forest model to whole-genome sequencing (WGS) data from genetic suppressor screens to identify causative mutations without the need to generate recombinant inbred lines.
The pipeline requires the following packages: scipy, pandas, tqdm, and matplotlib.
You can install these packages from PyPI using pip:
pip install scipy pandas tqdm matplotlib
The model and the gene range file can be downloaded from Zenodo.
The Python script can be downloaded from this repository.
The gene range file can be customized to match the newest version of the genome annotation.
Option | Description |
---|---|
--model | The path to the model file |
--data | The directory where the VCF file is located |
--ref | The path to the gene range file |
--threshold | The threshold n for excluding background mutations. When the same variant is observed n times in your data, it is considered a background mutation and discarded. |
--out | The path where the output file will be saved |
--background | (Optional) The path to the background file which contains the WGS data of a pre-mutated worm |
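
The idea behind `--threshold` and `--background` can be illustrated with a minimal sketch. This is not the pipeline's actual code: the function name `remove_background`, the variant tuples, and the sample names are hypothetical, and the sketch assumes that "observed n times" means "observed in at least n samples".

```python
from collections import Counter


def remove_background(samples, threshold, background=None):
    """Drop variants that look like background mutations.

    samples:    dict mapping a sample name to a set of variants, where each
                variant is a (chromosome, position, ref, alt) tuple.
    threshold:  a variant seen in at least this many samples is discarded
                (assumed interpretation of --threshold).
    background: optional set of variants from a non-mutagenized strain
                (assumed role of --background).
    """
    # Count in how many samples each variant occurs.
    counts = Counter(v for variants in samples.values() for v in set(variants))
    recurrent = {v for v, n in counts.items() if n >= threshold}
    to_drop = recurrent | set(background or ())
    return {name: set(variants) - to_drop for name, variants in samples.items()}


# Made-up variants, for illustration only.
samples = {
    "sup1": {("I", 1200, "G", "A"), ("X", 500, "C", "T")},
    "sup2": {("I", 1200, "G", "A"), ("II", 900, "G", "A")},
    "sup3": {("I", 1200, "G", "A"), ("X", 500, "C", "T"), ("V", 42, "C", "T")},
}
filtered = remove_background(samples, threshold=3)
```

With `threshold=3`, only the variant shared by all three samples is removed; lowering the threshold removes more shared variants.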
The gene range file should be a CSV file with four columns. The first column represents the gene name, the second column denotes the chromosome ('I', 'II', ..., 'X') where the gene is located, the third column indicates the starting position, and the fourth column signifies the end position.
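
If you customize the gene range file, a quick check of its layout can catch mistakes early. The sketch below is an assumption-laden example rather than part of the pipeline: the function `parse_gene_ranges` is hypothetical, the file is assumed to have no header row, the chromosome set assumes the *C. elegans* nuclear chromosomes, and the gene names and coordinates are made up.

```python
import csv
import io

# Assumption: C. elegans nuclear chromosomes, written as Roman numerals plus X.
VALID_CHROMOSOMES = {"I", "II", "III", "IV", "V", "X"}


def parse_gene_ranges(handle):
    """Read (gene, chromosome, start, end) rows and validate each one."""
    ranges = []
    for line_no, row in enumerate(csv.reader(handle), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(row)}")
        name, chrom, start, end = (field.strip() for field in row)
        if chrom not in VALID_CHROMOSOMES:
            raise ValueError(f"line {line_no}: unknown chromosome {chrom!r}")
        start, end = int(start), int(end)
        if start > end:
            raise ValueError(f"line {line_no}: start {start} is after end {end}")
        ranges.append((name, chrom, start, end))
    return ranges


# Made-up rows in the four-column layout described above.
example = io.StringIO("geneA,I,1000,2500\ngeneB,X,40000,41200\n")
gene_ranges = parse_gene_ranges(example)
```

To check a real file, pass an open file handle instead of the `io.StringIO` example, for instance `open(path, newline="")`.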
After the run, the output folder should contain a `volcano_table.csv` file and a `volcano.jpg` file.
The CSV file contains the fold change and p-value for each gene that carries mutations in the background-removed mutation pool. To find candidate genes, sort the table by p-value.
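
Sorting by p-value can be done in a spreadsheet or with a few lines of Python. The sketch below uses only the standard library; the column names `gene`, `fold_change`, and `p_value` are assumptions, so check the header of your own `volcano_table.csv` and adjust them. The values shown are made up.

```python
import csv
import io


def rank_candidates(handle, p_column="p_value"):
    """Return the table rows sorted from smallest to largest p-value."""
    rows = list(csv.DictReader(handle))
    return sorted(rows, key=lambda row: float(row[p_column]))


# Made-up table standing in for volcano_table.csv.
table = io.StringIO(
    "gene,fold_change,p_value\n"
    "geneA,1.5,0.2\n"
    "geneB,12.0,0.0001\n"
    "geneC,3.0,0.03\n"
)
ranked = rank_candidates(table)
top_genes = [row["gene"] for row in ranked]
```

The genes at the start of `top_genes` have the smallest p-values and are the first candidates to inspect.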
The JPG file is a volcano plot drawn based on the CSV file.
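
For orientation, volcano plots conventionally place log2(fold change) on the x-axis and -log10(p-value) on the y-axis, so strong candidates appear toward the top corners. The sketch below shows that transformation only; it assumes the usual convention and may not match the exact axes used in `volcano.jpg`. The function name `volcano_point` is hypothetical.

```python
import math


def volcano_point(fold_change, p_value):
    """Map a (fold change, p-value) pair to conventional volcano-plot coordinates."""
    x = math.log2(fold_change)   # effect size on a log2 scale
    y = -math.log10(p_value)     # significance: larger means smaller p-value
    return x, y


# Made-up values: an 8-fold enrichment with p = 0.001.
x, y = volcano_point(8.0, 0.001)
```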