Use variant_definitions to type an MSA of SARS-CoV-2 sequences, handling insertions.
The MSA must include the reference sequence MN908947.3 so that variant positions are labelled correctly.
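Before running the tool, it can help to confirm that the reference is actually present in the alignment. The following is a minimal sketch, not part of aln2type: it assumes the MSA is in FASTA format and that a sequence ID is the first whitespace-delimited word of its header line. The helper names `read_fasta` and `check_msa` are hypothetical.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {sequence_id: sequence}.

    Assumption: the ID is the first word of the header line.
    """
    records: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def check_msa(records: Dict[str, str], ref_name: str = "MN908947.3") -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    # The reference must be present under the name passed as ref_name.
    if ref_name not in records:
        problems.append(f"reference {ref_name!r} not found in MSA")
    # Every row of an alignment should have the same length.
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) > 1:
        problems.append(f"sequences have differing lengths: {sorted(lengths)}")
    return problems
```

For a file on disk, something like `check_msa(read_fasta(open("sars_cov_2.aln")))` would return an empty list when both checks pass.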
git clone https://github.com/connor-lab/aln2type
cd aln2type
pip install .
git clone https://github.com/phe-genomics/variant_definitions
usage: aln2type [-h] [--csv_N] [--no_trim_terminal_N] [--no_gzip_json]
[--output_unclassified] [--no_call_deletion] [--gb GB]
json_outdir sample_csv_outdir summary_csv_outfile ref_name msa
typing_yaml [typing_yaml ...]
positional arguments:
json_outdir Output directory for typing JSON
sample_csv_outdir Output directory for sample variant CSVs
summary_csv_outfile Output summary CSV file
ref_name Name of reference sequence in MSA
msa Path to MSA
typing_yaml Path to Variant definition YAML files
optional arguments:
-h, --help show this help message and exit
--csv_N Include Ns in sample variant CSVs
--no_trim_terminal_N Don't trim Ns from sequence terminus. Default behaviour is to trim Ns and gaps BEFORE analysis
--no_gzip_json Don't gzip typing JSON files
--output_unclassified Retain unclassified samples in summary CSV
--no_call_deletion Allow deleted positions to be treated as no-calls not ref
--gb GB Path to annotation GenBank
aln2type sample_json_out sample_csv_out typing_summary.csv MN908947.3 sars_cov_2.aln variant_definitions/variant_yaml/*.yml
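The same invocation can be scripted from Python. The sketch below only assembles the argument list in the positional order given in the usage text and hands it to `subprocess`; the function names `build_aln2type_command` and `run_example` are hypothetical, and it assumes the `aln2type` executable is on `PATH` after installation. Because no shell is involved, the `*.yml` pattern has to be expanded explicitly with `glob`.

```python
import glob
import subprocess
from typing import List, Sequence


def build_aln2type_command(
    json_outdir: str,
    sample_csv_outdir: str,
    summary_csv_outfile: str,
    ref_name: str,
    msa: str,
    typing_yamls: Sequence[str],
    extra_flags: Sequence[str] = (),
) -> List[str]:
    """Assemble the aln2type argument list in the order shown in the usage text."""
    if not typing_yamls:
        raise ValueError("at least one variant definition YAML is required")
    return [
        "aln2type",
        *extra_flags,          # e.g. ["--output_unclassified"]
        json_outdir,
        sample_csv_outdir,
        summary_csv_outfile,
        ref_name,
        msa,
        *typing_yamls,
    ]


def run_example() -> None:
    """Mirror the example command; paths are the ones used in this README."""
    # Expand the wildcard ourselves, since subprocess does not use a shell here.
    yamls = sorted(glob.glob("variant_definitions/variant_yaml/*.yml"))
    cmd = build_aln2type_command(
        "sample_json_out",
        "sample_csv_out",
        "typing_summary.csv",
        "MN908947.3",
        "sars_cov_2.aln",
        yamls,
    )
    # check=True raises CalledProcessError if aln2type exits with an error.
    subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes the argument order easy to inspect or log before anything is run.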