This repository contains the script(s) developed during the 2022 Monkeypox outbreak to explore the mutational profiles/signatures of this virus. The script(s) can also be broadly applied to other species. It currently comprises:
- get_mutation_profile.py, which rapidly obtains the sequence context (of user-defined size) flanking SNPs of interest and determines their mutational profile according to the user's specifications (e.g. the GA>AA and TC>TT replacements associated with APOBEC3-mediated viral genome editing)
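The two steps described above, extracting the flanking context and testing a SNP against a profile such as GA>AA, can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the code of get_mutation_profile.py. The helper names get_context and matches_profile are hypothetical, positions are 1-based as in the inputs, and the bracketed output format is invented for readability. A profile is read as a dinucleotide in which exactly one base changes. GA>AA means a G followed by A mutating to A. TC>TT means a C preceded by T mutating to T.

```python
def get_context(seq: str, pos: int, before: int = 5, after: int = 5) -> str:
    """Return the bases flanking 1-based position `pos`, with the site in brackets."""
    i = pos - 1  # convert the 1-based position to a 0-based index
    left = seq[max(0, i - before):i]  # truncated at the start of the sequence
    right = seq[i + 1:i + 1 + after]  # slicing truncates at the end automatically
    return f"{left}[{seq[i]}]{right}"


def matches_profile(seq: str, pos: int, ref: str, alt: str, profile: str) -> bool:
    """Check whether a SNP fits a dinucleotide profile such as 'GA>AA' or 'TC>TT'."""
    src, dst = profile.split(">")
    if len(src) != len(dst):
        raise ValueError(f"profile {profile!r} must have equal-length sides")
    # The mutated base is the single position where source and destination differ.
    diff = [k for k in range(len(src)) if src[k] != dst[k]]
    if len(diff) != 1:
        raise ValueError(f"profile {profile!r} must change exactly one base")
    k = diff[0]
    if src[k] != ref.upper() or dst[k] != alt.upper():
        return False
    start = pos - 1 - k  # 0-based start of the dinucleotide window in seq
    if start < 0 or start + len(src) > len(seq):
        return False
    return seq[start:start + len(src)].upper() == src
```

For example, matches_profile("TTGAC", 3, "G", "A", "GA>AA") is True because the G at position 3 is followed by an A.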
OPTION 1
Inputs:
- TSV file with the columns POS REF ALT (i.e. 1-indexed reference position, reference allele and alternative allele)
- Fasta file including the reference genome
Output:
- TSV file with the mutation context and profile
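Before running Option 1, it can help to confirm that each REF allele agrees with the reference genome at the stated POS. The sketch below is a hypothetical sanity check, not part of the tool. It uses a minimal standard-library FASTA reader (read_fasta) instead of Biopython so that it stays self-contained, and check_ref_alleles is an invented helper name.

```python
import csv
import io


def read_fasta(handle) -> dict[str, str]:
    """Minimal FASTA reader: maps each record name (first word of the header) to its sequence."""
    records: dict[str, list[str]] = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def check_ref_alleles(tsv_handle, reference: str) -> list[dict]:
    """Return rows whose REF allele does not match the reference base at POS (1-based)."""
    problems = []
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        pos = int(row["POS"])
        observed = reference[pos - 1] if 0 < pos <= len(reference) else None
        if observed != row["REF"].upper():
            problems.append({**row, "reference_base": observed})
    return problems
```

With Biopython installed, read_fasta could be replaced by iterating over Bio.SeqIO.parse(handle, "fasta").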
OPTION 2
Inputs:
- TSV file with the columns ID POS REF ALT (i.e. sample ID, 1-indexed reference position, reference allele and alternative allele)
- Fasta file including the reference genome
Outputs:
- TSV file with the mutation context and profile for each sample present in the TSV input
- TSV file with a summary report for each position of interest including the different patterns observed and their respective frequency
NOTE: For options 1 and 2, the order of the columns in the TSV input is not important, but their names are (ID, POS, REF, ALT)!
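Because the columns are identified by name, a reader keyed on the header row works regardless of column order. The sketch below assumes a tab-separated file and shows one way a per-position summary of observed patterns and their frequencies could be computed for Option 2. Here a "pattern" is simply the REF>ALT change, whereas the tool's summary may also take the sequence context into account. summarize_patterns and frequencies are hypothetical names.

```python
import csv
import io
from collections import Counter, defaultdict

REQUIRED = {"ID", "POS", "REF", "ALT"}


def summarize_patterns(tsv_handle) -> dict[int, Counter]:
    """Count how often each REF>ALT pattern is observed per position across samples."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing column(s): {sorted(missing)}")
    summary: dict[int, Counter] = defaultdict(Counter)
    for row in reader:  # columns are looked up by name, so their order does not matter
        summary[int(row["POS"])][f"{row['REF']}>{row['ALT']}"] += 1
    return summary


def frequencies(counts: Counter) -> dict[str, float]:
    """Convert the raw counts for one position into relative frequencies."""
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}
```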
OPTION 3
Inputs:
- Single-column file with a list of 1-indexed reference positions of interest
- Multiple Sequence Alignment (fasta) including the reference genome
Outputs:
- TSV file with the mutation context and profile for each sample present in the alignment
- TSV file with a summary report for each position of interest including the different patterns observed and their respective frequency
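For Option 3, positions are given in reference coordinates, but the alignment may contain gaps in the reference row. A common approach, sketched here under that assumption and not necessarily how get_mutation_profile.py does it, is to map each 1-based reference position to its alignment column by counting only non-gap reference characters, then read every sample's base from that column. The helper names and the use of "-" as the gap character are assumptions.

```python
def ref_to_column(aligned_ref: str, gap: str = "-") -> dict[int, int]:
    """Map 1-based reference positions to 0-based alignment columns, skipping reference gaps."""
    mapping = {}
    pos = 0
    for col, base in enumerate(aligned_ref):
        if base != gap:
            pos += 1
            mapping[pos] = col
    return mapping


def alleles_at(alignment: dict[str, str], ref_name: str,
               positions: list[int]) -> dict[int, dict[str, str]]:
    """For each reference position, return the base every aligned sequence carries there."""
    column = ref_to_column(alignment[ref_name])
    return {
        p: {name: seq[column[p]] for name, seq in alignment.items()}
        for p in positions
    }
```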
TIP: If you do not know your positions of interest, you can run ReporTree's alignment_processing.py script, which provides a list of positions of interest according to your specifications.
To run get_mutation_profile.py you will need:
- biopython
- pandas
Installation with pip:
pip install mutation-profile
mutation-profile -h
Installation with conda:
conda create -n mutation-profile vmixao::mutation-profile
Usage:
conda activate mutation-profile # if you created the conda environment
mutation-profile -h
-h, --help show this help message and exit
Mutation profile:
Provide input/output specifications
-f FASTA, --fasta FASTA
[MANDATORY] Input sequence file (fasta)
-m MUTATION, --mutation_list MUTATION
[MANDATORY] Input mutation list that can be: 1)
single-column file with 1-based reference position
information (in this case the fasta file must be a
multiple sequence alignment of all the sequences of
interest); OR 2) tsv file with the columns POS, REF,
and ALT where POS = 1-based reference position. If you
want to include information for more than one sample
per position, add also the column 'ID' (note that the
order of the columns is not important but their name
is!)
-r REF, --reference REF
[MANDATORY] Reference sequence name
-b BEFORE, --before BEFORE
[OPTIONAL] Number of nucleotides to report BEFORE the
mutation (default = 5)
-a AFTER, --after AFTER
[OPTIONAL] Number of nucleotides to report AFTER the
mutation (default = 5)
-p PROFILES, --profiles PROFILES
[OPTIONAL] Comma-separated list of mutational profiles
of interest (upper-case!). Default = 'GA>AA,TC>TT'
-o OUTPUT, --output OUTPUT
[OPTIONAL] Tag for output file name. Default =
Mutation_profile
-v, --version Print version and exit
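The options above could be declared with Python's argparse roughly as follows. This is a hypothetical sketch reconstructed from the help text only, not the tool's source. The dest names are inferred from the metavars, -v/--version is omitted because the version string is not documented here, and parse_profiles upper-cases its input defensively even though the help text asks users to supply upper-case profiles.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare options mirroring the help text above (a sketch, not the tool's source)."""
    parser = argparse.ArgumentParser(prog="mutation-profile")
    group = parser.add_argument_group("Mutation profile", "Provide input/output specifications")
    group.add_argument("-f", "--fasta", dest="fasta", required=True)
    group.add_argument("-m", "--mutation_list", dest="mutation", required=True)
    group.add_argument("-r", "--reference", dest="ref", required=True)
    group.add_argument("-b", "--before", dest="before", type=int, default=5)
    group.add_argument("-a", "--after", dest="after", type=int, default=5)
    group.add_argument("-p", "--profiles", dest="profiles", default="GA>AA,TC>TT")
    group.add_argument("-o", "--output", dest="output", default="Mutation_profile")
    return parser


def parse_profiles(value: str) -> list[str]:
    """Split the comma-separated -p value into individual upper-case profiles."""
    return [p.strip().upper() for p in value.split(",") if p.strip()]
```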
Examples using the Monkeypox 2022 outbreak data available in examples/
OPTION 1: provide a TSV file with the columns POS REF ALT (i.e. 1-indexed reference position, reference allele and alternative allele) and a fasta file including the reference genome (this can be the alignment itself or a plain fasta sequence).
mutation-profile -f alignment_Figure1B.fasta -m positions_of_interest_POS_REF_ALT.tsv -r 'MT903344.1_Monkeypox_virus_isolate_MPXVUK_P2_complete_genome' -b 10 -a 10 -o OPTION1
Output:
- TSV file with the mutation context and profile
OPTION 2: provide a TSV file with the columns ID POS REF ALT (i.e. sample ID, 1-indexed reference position, reference allele and alternative allele) and a fasta file including the reference genome (this can be the alignment itself or a plain fasta sequence).
mutation-profile -f alignment_Figure1B.fasta -m positions_of_interest_ID_POS_REF_ALT.tsv -r 'MT903344.1_Monkeypox_virus_isolate_MPXVUK_P2_complete_genome' -b 10 -a 10 -o OPTION2
Outputs:
- TSV file with the mutation context and profile for each sample present in the TSV input
- TSV file with a summary report for each position of interest including the different patterns observed and their respective frequency
OPTION 3: provide a single-column file with a list of 1-indexed reference positions of interest and a multiple sequence alignment (fasta) including the reference genome.
mutation-profile -f alignment_Figure1B.fasta -m Monkeypox_positions_of_interest.tsv -r 'MT903344.1_Monkeypox_virus_isolate_MPXVUK_P2_complete_genome' -b 10 -a 10 -o OPTION3
Outputs:
- TSV file with the mutation context and profile for each sample present in the alignment
- TSV file with a summary report for each position of interest including the different patterns observed and their respective frequency
TIP: If you do not know your positions of interest, you can run ReporTree's alignment_processing.py script, which provides a list of positions of interest according to your specifications. Example:
python ReporTree/scripts/alignment_processing.py -align alignment_Figure1B.fasta -o Monkeypox --use-reference-coords -r 'MT903344.1_Monkeypox_virus_isolate_MPXVUK_P2_complete_genome' --keep-gaps --get-positions-interest
If you use this script, please cite the article in which it was first described:
Isidro, J., Borges, V., Pinto, M. et al. Phylogenomic characterization and signs of microevolution in the 2022 multi-country outbreak of monkeypox virus.
Nature Medicine (2022). https://doi.org/10.1038/s41591-022-01907-y