Python package for ANOSPP data analysis
ANOSPP is a multiplexed amplicon sequencing assay for Anopheles mosquito species identification and Plasmodium detection. This repository contains the code for analysing sequencing results pre-processed with the nf-core ampliseq pipeline.
To install the released version:
conda install -c bioconda anospp-analysis
For the development setup, see the instructions below.
Key analysis steps are implemented as standalone scripts:
anospp-prep
takes DADA2 output files and target primer sequences, demultiplexes the amplicons and yields a haplotypes table
anospp-qc
takes the haplotypes table, DADA2 stats table and samples manifest and produces QC plots
anospp-plasm
blasts Plasmodium sequences against a reference dataset to determine species and infer sample infection status
anospp-nn
compares k-mer profiles of mosquito targets against a reference dataset and provides probabilistic species calls
anospp-vae
provides finer-scale species prediction for the An. gambiae complex with VAE projection
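After installation, one quick way to confirm that the wrapper scripts listed above are available is to look them up on PATH. The following is a minimal sketch using only the standard library; the helper name `find_anospp_scripts` is hypothetical and not part of the package.

```python
import shutil

# Script names as listed above.
ANOSPP_SCRIPTS = ["anospp-prep", "anospp-qc", "anospp-plasm", "anospp-nn", "anospp-vae"]


def find_anospp_scripts(names=ANOSPP_SCRIPTS):
    """Map each script name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_anospp_scripts().items():
        print(f"{name}: {path or 'not found'}")
```

A script reported as not found usually means the conda environment is not activated or the package is not installed in it.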
The development installation uses both conda and poetry:
git clone git@github.com:malariagen/anospp-analysis.git
cd anospp-analysis
git checkout dev
conda env create -f environment.yml
conda activate anospp_analysis_dev
poetry install
The code in this repository can be accessed via wrapper scripts:
anospp-qc \
--haplotypes test_data/haplotypes.tsv \
--samples test_data/samples.csv \
--stats test_data/stats.tsv \
--outdir test_data/qc
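The same command can be launched from Python, for example inside a notebook or a batch script. This is a sketch that only reuses the four flags shown above; the helper names `build_qc_command` and `run_qc` are hypothetical and not part of the package.

```python
import shlex
import subprocess


def build_qc_command(haplotypes, samples, stats, outdir):
    """Assemble the anospp-qc argument list using the flags shown above."""
    return [
        "anospp-qc",
        "--haplotypes", str(haplotypes),
        "--samples", str(samples),
        "--stats", str(stats),
        "--outdir", str(outdir),
    ]


def run_qc(haplotypes, samples, stats, outdir):
    """Run anospp-qc and raise CalledProcessError on a non-zero exit code."""
    cmd = build_qc_command(haplotypes, samples, stats, outdir)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=True)
```

Calling `run_qc("test_data/haplotypes.tsv", "test_data/samples.csv", "test_data/stats.tsv", "test_data/qc")` should be equivalent to the shell invocation, provided `anospp-qc` is on PATH.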
Individual components are also available through a Python API:
$ python
>>> from anospp_analysis.util import *
>>> PLASM_TARGETS
['P1', 'P2']
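As an illustration of how such a constant might be used, the sketch below selects the Plasmodium rows from a small tab-separated table. The toy data, the column names `sample_id` and `target`, and the helper `plasmodium_rows` are all assumptions made for this example; they are not taken from the package or its real file format.

```python
import csv
import io

# Mirrors the value printed above; with the package installed, use
# `from anospp_analysis.util import PLASM_TARGETS` instead.
PLASM_TARGETS = ["P1", "P2"]

# Toy table; the column names `sample_id` and `target` are assumptions.
TOY_HAPLOTYPES = "sample_id\ttarget\nS1\t0\nS1\tP1\nS2\tP2\nS2\t5\n"


def plasmodium_rows(handle, targets=PLASM_TARGETS):
    """Return the rows whose `target` value is one of the Plasmodium targets."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if row["target"] in targets]


rows = plasmodium_rows(io.StringIO(TOY_HAPLOTYPES))
print([(r["sample_id"], r["target"]) for r in rows])
# prints [('S1', 'P1'), ('S2', 'P2')]
```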
TODO Automated testing & CI
New Python dependencies should be added via poetry:
poetry add package_name
This should update both the pyproject.toml and poetry.lock files.
If the package is needed in the development environment only, use
poetry add package_name --dev
To update the environment after changes to pyproject.toml and/or poetry.lock:
poetry install
New non-Python dependencies should be added via conda: edit environment.yml, then re-create the conda environment and reinstall the poetry dependencies:
mamba env create -f environment.yml
conda activate anospp_analysis
poetry install
Changes to the conda environment might also change the Python installation, in which case the poetry lock file should be updated:
poetry lock