Delineate bacterial genera quickly and transparently with the Percentage Of Conserved Proteins (POCPu), computed by a validated nextflow workflow
consprot run --input <directory of genomes> --output <name of output directory>
consprot runs a nextflow workflow locally and computes POCPu values for all pairwise comparisons (except self-comparisons). Because the workflow runs on nextflow, adding -resume reuses cached results and only reruns what changed (e.g., additional genomes):
consprot run --input <directory of genomes> --output <name of output directory> -resume
A directory with genome files (.fa, .fna or .fasta).
Upcoming:
- Protein files if already available and/or to bypass pyrodigal
- List of files to bypass nextflow file search
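Before launching a run, it can help to check which files in the input directory carry one of the accepted extensions. The sketch below is only an illustration: the helper name `list_genomes` is hypothetical, and the non-recursive, case-sensitive matching is an assumption that may differ from how consprot's own file search behaves.

```python
from pathlib import Path

# Extensions listed in this README as accepted genome files.
GENOME_EXTENSIONS = {".fa", ".fna", ".fasta"}


def list_genomes(directory):
    """Return the sorted genome files found directly inside `directory`.

    Assumption: only the top level of the directory is inspected and the
    extension match is case-sensitive; consprot itself may behave differently.
    """
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix in GENOME_EXTENSIONS
    )
```

Calling `list_genomes("genomes/")` (with a placeholder directory name) returns the matching paths, so an empty list is an early hint that the directory or the file extensions are not what the workflow expects.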
The specified output directory will contain a pocpu.csv with the following columns:
- query: basename of the genome used in this comparison
- subject: basename of the genome used in this comparison
- pocp: value of legacy POCP [0-100]. Could exceed 100 in case of duplicated genes.
- pocpu: value of POCPu [0-100] using only unique matches.
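The columns described above can be read back with Python's standard library for downstream filtering. This is a minimal sketch, assuming that pocpu.csv is comma-separated and starts with a header row naming the four columns; the function names `load_pocpu` and `pairs_at_or_above` are hypothetical and not part of consprot, and the cutoff is left to the reader.

```python
import csv


def load_pocpu(csv_path):
    """Read pocpu.csv into a list of dicts with numeric pocp/pocpu values.

    Assumption: comma-separated file with a header row containing
    query, subject, pocp and pocpu.
    """
    rows = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            rows.append(
                {
                    "query": row["query"],
                    "subject": row["subject"],
                    "pocp": float(row["pocp"]),
                    "pocpu": float(row["pocpu"]),
                }
            )
    return rows


def pairs_at_or_above(rows, threshold):
    """Keep comparisons whose POCPu is at least `threshold` (user-chosen cutoff)."""
    return [
        (row["query"], row["subject"], row["pocpu"])
        for row in rows
        if row["pocpu"] >= threshold
    ]
```

For example, `pairs_at_or_above(load_pocpu("results/pocpu.csv"), 50.0)` would list the genome pairs at or above the given value; both the path and the number are placeholders.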
Note: the nextflow workflow produces additional files and directories in the location where it is run. For instance, the log file of the latest run is .nextflow.log, and the working directory with temporary files is work.
At the moment, install consprot by cloning the repository and following the development section.
Upcoming: bioconda
To build consprot safely, work in a separate conda environment and install with that environment's own pip:
mamba create -n consprot-dev python pip
conda activate consprot-dev
python -m pip install -e .
consprot -v