Feature/proteome annotation 20240921 (#10)
* version -> 0.10.0
* added `cayman annotate_proteome` and moved standard profiling to `cayman profile`
* updated readme
---------

Co-authored-by: karchern <[email protected]>
cschu and karchern authored Oct 27, 2024
1 parent f0089d3 commit 54f66d3
Showing 14 changed files with 577 additions and 244 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -2,4 +2,5 @@
 *.egg-info/
 *__pycache__*
 dist/
-gqlib/
+gqlib/
+build/
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,7 +1,7 @@
 FROM ubuntu:22.04

 LABEL maintainer="[email protected]"
-LABEL version="0.8.4"
+LABEL version="0.10.0"
 LABEL description="cayman - profiling carbohydrate active enzymes in metagenomic/transcriptomic wgs samples"


20 changes: 19 additions & 1 deletion README.md
@@ -19,6 +19,7 @@ Reads-Per-Kilobase-Million (RPKM) abundances for your sample. Cayman makes heavy
 - pysam
 - intervaltree
 - gqlib>=2.14.3 (which should take care of all python library requirements)
+- pyhmmer (for protein set annotation)

 You will need a `bwa` installation. One way -- if you didn't install `cayman` via bioconda or if you're not using a container -- would be to use `conda env create -f environment.yml` using the provided [environment.yml](environment.yml).

@@ -54,8 +55,10 @@ Cayman can most easily be installed via

 Cayman can be run from the command line as follows:

+<font color="#ff0000"><b>Attention: As of version 0.10.0, cayman profiling is invoked with `cayman profile` instead of `cayman`.</b></font>
+
 ```
-cayman \
+cayman profile \
 <input_options> \
 </path/to/db> \
 </path/to/bwa_index> \
@@ -122,3 +125,18 @@ The following lines contain the counts for each CAZy family present in the sampl
 - `<out_prefix>.gene_counts.txt` contains the gene profiles of the sample. The format is identical to the CAZy profiles, featuring are the detected genes from the respective gene catalogue.

 - `<out_prefix>.aln_stats.txt` contains statistics on the alignments in the sample.
+
+
+## Annotating protein sets with Cayman hmms
+
+The default `hmm_database` can be obtained from [Zenodo](https://zenodo.org/records/13998227)
+
+```
+cayman annotate_proteome \
+</path/to/cayman/hmm_database> \
+</path/to/input/proteins> \
+[ -o/--output_file </path/to/output_file>, default: cayman_annotation.csv ] \
+[ -t/--threads <int> ] \
+[ --cutoffs <path/to/cutoff_values>, default: </path/to/cayman/hmm_database/cutoffs.csv>]
+```

2 changes: 1 addition & 1 deletion Singularity.latest
29 changes: 29 additions & 0 deletions Singularity.v0.10.0
@@ -0,0 +1,29 @@
+Bootstrap: docker
+From: ubuntu:20.04
+IncludeCmd: yes
+
+%labels
+MAINTAINER cschu ([email protected])
+VERSION v0.10.0
+
+%environment
+export LC_ALL=C
+
+%post
+apt-get update
+apt-get install -y apt-transport-https apt-utils software-properties-common
+apt-get install -y add-apt-key
+export DEBIAN_FRONTEND=noninteractive
+ln -fs /usr/share/zoneinfo/America/New_York /etc/localtime
+apt-get install -y tzdata
+dpkg-reconfigure --frontend noninteractive tzdata
+
+apt-get install -y wget python3-pip git gawk bwa
+
+
+mkdir -p /opt/software && \
+cd /opt/software && \
+git clone https://github.com/zellerlab/cayman.git && \
+cd cayman && \
+pip install .
+
39 changes: 0 additions & 39 deletions Singularity.v0.8.4

This file was deleted.

2 changes: 1 addition & 1 deletion cayman/__init__.py
@@ -1,3 +1,3 @@
 """ module docstring """
-__version__ = "0.9.7"
+__version__ = "0.10.0"
 __toolname__ = "cayman"
76 changes: 5 additions & 71 deletions cayman/__main__.py
@@ -24,6 +24,8 @@
 def main():

     args = handle_args(sys.argv[1:])
+    if args.cutoffs is None:
+        args.cutoffs = os.path.join(args.hmmdb, "cutoffs.csv")
     args.aligner = "bwa"

     logger.info("Version: %s gqlib: %s", __version__, gqlib_version)
@@ -33,79 +35,11 @@ def main():
     )

     print(args)
+    args.func(args)

-    input_data = check_input_reads(
-        args.reads1, args.reads2,
-        args.singles, args.orphans,
-    )
-
-    if not os.path.exists(args.annotation_db):
-        raise ValueError(
-            f"{args.annotation_db} is not a valid annotation database"
-        )
-
-    if not check_bwa_index(args.bwa_index):
-        raise ValueError(f"{args.bwa_index} is not a valid bwa index.")
-
-    if os.path.dirname(args.out_prefix):
-        pathlib.Path(os.path.dirname(args.out_prefix)).mkdir(
-            exist_ok=True, parents=True
-        )
-
-    db_importer = SmallDatabaseImporter(
-        logger, args.annotation_db, single_category="cazy", db_format=args.db_format,
-    )
-    logger.info("Finished loading database.")
-
-    profiler = RegionQuantifier(
-        db=db_importer,
-        out_prefix=args.out_prefix,
-        ambig_mode="1overN",
-        reference_type="domain",
-    )
-
-    aln_runner = BwaMemRunner(
-        args.cpus_for_alignment,
-        args.bwa_index,
-        sample_id=os.path.basename(args.out_prefix),
-    )
-
-    for input_type, *reads in input_data:
-
-        logger.info("Running %s alignment: %s", input_type, ",".join(reads))
-        proc, call = aln_runner.run(
-            reads,
-            single_end_reads=input_type == "single",
-        )
-
-        try:
-            profiler.count_alignments(
-                proc.stdout,
-                aln_format="sam",
-                min_identity=args.min_identity,
-                min_seqlen=args.min_seqlen,
-            )
-
-        except Exception as err:
-            if isinstance(err, ValueError) and str(err).strip() == "file does not contain alignment data":
-                # pylint: disable=W1203
-                logger.error("Failed to align. This could have different reasons:")
-                logger.error(f"* Is `{args.aligner}` installed and on the path? Type `bwa mem` and see what happens.")
-                logger.error("* Syntax errors or missing files. Please try running the aligner call below manually to troubleshoot the problem.")
-                logger.error("* Alignment stream was interrupted, perhaps due to a memory issue.")
-
-                logger.error("Aligner call was:")
-                logger.error("%s", call)
-                sys.exit(1)
-
-            logger.error("Encountered problems digesting the alignment stream:")
-            logger.error("%s", err)
-            logger.error("Aligner call was:")
-            logger.error("%s", call)
-            logger.error("Shutting down.")
-            sys.exit(1)
+    return None

-    profiler.finalise(restrict_reports=("raw", "rpkm",))



 if __name__ == "__main__":
Empty file added cayman/annotate/__init__.py
Empty file.
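
Note on the `cayman/__main__.py` change: the new `args.func(args)` call suggests that `handle_args` now builds argparse subcommands and attaches a handler to each one. `handle_args` itself is not shown in this view, so the following is only a minimal sketch of that pattern, not the project's actual implementation. The handler names `run_profile` and `run_annotate`, the reduced argument set for `profile`, and the `--threads` default of 1 are assumptions made for illustration. The `annotate_proteome` options and the `cutoffs.csv` fallback follow the README and diff.

```python
import argparse
import os


def run_profile(args):
    # Hypothetical stand-in for the read-profiling workflow.
    return f"profile: db={args.annotation_db}"


def run_annotate(args):
    # Fall back to cutoffs.csv inside the HMM database directory,
    # mirroring the default shown in the diff.
    if args.cutoffs is None:
        args.cutoffs = os.path.join(args.hmmdb, "cutoffs.csv")
    return f"annotate: cutoffs={args.cutoffs}"


def build_parser():
    parser = argparse.ArgumentParser(prog="cayman")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # `cayman profile ...` (only one positional shown; the real command takes more)
    profile = subparsers.add_parser("profile")
    profile.add_argument("annotation_db")
    profile.set_defaults(func=run_profile)

    # `cayman annotate_proteome ...`
    annotate = subparsers.add_parser("annotate_proteome")
    annotate.add_argument("hmmdb")
    annotate.add_argument("proteins")
    annotate.add_argument("-o", "--output_file", default="cayman_annotation.csv")
    annotate.add_argument("-t", "--threads", type=int, default=1)  # default is assumed
    annotate.add_argument("--cutoffs", default=None)
    annotate.set_defaults(func=run_annotate)
    return parser


def main(argv):
    args = build_parser().parse_args(argv)
    # Dispatch to whichever handler the chosen subcommand registered.
    return args.func(args)
```

In this sketch the `cutoffs` fallback lives inside the `annotate_proteome` handler, so the `profile` subcommand never needs to define a `cutoffs` or `hmmdb` attribute. Whether the real `handle_args` defines those attributes for every subcommand cannot be determined from this diff.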