Clone the GitHub repository to your machine to use the EvoCov package. Before using it, check that the Python dependencies listed in requirements.txt are installed.
git clone https://github.com/ciarajudge/EvoCov.git
cd EvoCov
pip install -r requirements.txt
The prediction stage of the pipeline uses site-wise mutation rates estimated from a SARS-CoV-2 phylogenetic tree with baseml, a program in the PAML (Phylogenetic Analysis by Maximum Likelihood) package. To generate these rates, download and compile PAML and place it in the ./treecov/ directory, so that the path to the baseml executable is ./treecov/paml/bin/baseml. The folders must be named exactly as shown. You must also download a phylogenetic tree from GISAID by clicking Audacity on the platform, and place the file global.tree in the ./treecov/ directory. To run the treecov pipeline and generate the rates, navigate to the treecov directory and use the command:
python treepipe.py /absolute/path/to/GISAID/fasta/file
This starts iterative sampling and analysis of the phylogenetic tree: 100 iterations in total, run as 10 batches of 10. The batch size and the number of batches can be adjusted by changing the loop counts in subsampletree.R (batch size) and treepipe.py (number of batches).
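Because the treecov step depends on fixed locations for baseml and global.tree, a quick check before launching a long run can save time. The following is a minimal sketch and is not part of EvoCov: the function name `check_treecov_layout` is hypothetical, and the only paths it tests are the two given above.

```python
from pathlib import Path


def check_treecov_layout(treecov_dir):
    """Return a list of problems found in the treecov directory layout.

    Hypothetical helper, not part of EvoCov. It only checks the two
    locations named in this README.
    """
    root = Path(treecov_dir)
    problems = []

    # Expected location of the compiled baseml executable.
    baseml = root / "paml" / "bin" / "baseml"
    if not baseml.is_file():
        problems.append(f"missing baseml executable: {baseml}")

    # Expected location of the tree downloaded from GISAID.
    tree = root / "global.tree"
    if not tree.is_file():
        problems.append(f"missing GISAID tree: {tree}")

    return problems
```

Calling `check_treecov_layout("./treecov")` from the repository root returns an empty list when both files are in place, and one message per missing file otherwise.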
Navigate to the cloned repository and call the package with the file paths of your latest GISAID unmasked sequence file and metadata file. This starts a default run of the whole pipeline, including the handling of any exceptions or options and the final step, in which the results are written to a PDF using R.
python -m evocov /path/to/sequencefile_masked.fa /path/to/metadata.tsv
If you'd like to be notified by text when the pipeline is complete, pass a valid mobile number as a third argument, with no plus signs or brackets. For example, use 353877910680 where the country code is +353 and the phone number is 0877910680 (the leading zero is dropped).
python -m evocov /path/to/sequencefile_masked.fa /path/to/metadata.tsv 353877910680
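To illustrate the expected number format, here is a minimal sketch of a helper that builds the digits-only string from a country code and a national number. It is not part of EvoCov, and the name `to_notification_number` is hypothetical. It assumes, as in the Irish example above, that a single leading zero on the national number should be dropped, which may not hold for every country.

```python
def to_notification_number(country_code, national_number):
    """Build a digits-only mobile number (hypothetical helper, not part of EvoCov)."""
    # Keep digits only: this removes plus signs, brackets, spaces and dashes.
    cc = "".join(ch for ch in country_code if ch.isdigit())
    national = "".join(ch for ch in national_number if ch.isdigit())

    # Assumption taken from the example above: drop one leading zero
    # (0877910680 becomes 877910680) before prefixing the country code.
    if national.startswith("0"):
        national = national[1:]

    return cc + national
```

With the example above, `to_notification_number("+353", "0877910680")` gives `"353877910680"`, which is the form to pass as the third argument.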
Navigate to the cloned repository and call the package using the below command.
python -m evocov
Running the pipeline in this manner creates an interactive session in which you can select file names for the output and give the names of the variants you want included in the analysis. Following epitope scoring, you will also be given the option to use R to generate an output PDF with the key findings of the pipeline.
Below is a flowchart outlining the rough pipeline structure.
- If the text notification has stopped working, contact me at [email protected] or here on GitHub. The messages are sent through a subscription-type service, and I would just need to buy a bit more allowance.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.