From ec86fa834199da7b12c9e0c473fb188faf80caae Mon Sep 17 00:00:00 2001
From: Chaim Schramm
Date: Wed, 29 May 2024 00:12:29 -0400
Subject: [PATCH] Update README.md

---
 README.md | 57 +++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 39 insertions(+), 18 deletions(-)

diff --git a/README.md b/README.md
index 99b7e43..6bc0bed 100644
--- a/README.md
+++ b/README.md
@@ -1,30 +1,51 @@
+
+
+
+
 # ALIGaToR - Annotator of Loci for IG and T-cell Receptors
 A pipeline for annotating genomic contigs from the IG and TR loci. The pipeline includes:
 - Extract: A parsing script that extracts gene, exon, and RSS name and corrdinates from reference annotations of choice of closely related species.
 - Predict: A prediction script calls submodule DnaGrep, that predicts RSS sequences based on genomic contigs.
 - Annotate: Annotator script that uses the extracted reference genome and genomic information to generate a search databse for blast. Blast hits are matched with predicted RSSs. Other scripts are called to check for start and stop codons, and splice sites.
 
-## Getting Started
-Clone the aligator repository
-git clone https://github.com/scharch/aligator.git
-
 ## Dependencies/Prerequisites
-- Python
-- Beautifulsoup 4.12.3
+- Python 3.6 or greater
 - Muscle
 - Blast+
-- pyBedTools
-
-## Usage
-aligator --help
-### Example
-    #Download BK063715 fasta file from IMGT.org
-    #extract IGH annotations from IMGT's rheMac10
-    aligator extract https://imgt.org/ligmdb/view.action?id=BK063715 BK063715
+- BedTools
+
+## Getting Started
+Clone the aligator repository:
+
+    git clone https://github.com/scharch/aligator.git
+
+Install the required Python packages:
+
+    pip install -r aligator/requirements.txt
+
+Set the environment variable:
+
+    export ALIGATOR_PATH=$(pwd)/aligator
+
+Quick help:
+
+    aligator help
+
+
+## Vignette: annotating MF989451 from Ramesh et al., Frontiers in Immunology 2017
+The sample data are in `aligator/sample_data`.
+
+First, get the reference genome from IMGT:
+
+    # Download the BK063715 FASTA file from https://imgt.org/ligmdb/view.action?format=FASTA&id=BK063715
+    # Then create a BED file with the reference annotations
+    aligator extract https://imgt.org/ligmdb/view.action?id=BK063715 BK063715
+
+Next, find possible RSS motifs in the target contig. For MF989451, the output should match `sample_data/MF989451.rss12_pred.bed` and `sample_data/MF989451.rss23_pred.bed`:
 
-    #predict RSS for MF989451 and compare to sample data
-    aligator predict /sample_data /sample_data/MF989451.fa MF989451
+    aligator predict $ALIGATOR_PATH/sample_data/MF989451.fa MF989451
+
+Finally, annotate the target contig. For MF989451, the actual annotations provided by Ramesh et al. are included as `sample_data/MF989451.ground_truth.bed`:
 
-    #annotate MF989451 and compare to sample data
-    aligator annotate /sample_data/MF989451.fa /sample_data/MF989451.rss12_pred.bed MF989451.rss23_pred.bed IGH BK063715.fasta BK063715.bed --alleledb coding.fa --outgff annotations.gff --outfasta IgGenes.fa --blast blastn
+    aligator annotate $ALIGATOR_PATH/sample_data/MF989451.fa MF989451.RSS12.bed MF989451.RSS23.bed IGH BK063715.fasta BK063715.bed
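The vignette asks readers to compare the `predict` output with the RSS files in `sample_data`, and ships the published annotations as `sample_data/MF989451.ground_truth.bed`. One way to make that comparison concrete is a small BED diff. The script below is a hypothetical sketch, not part of ALIGaToR: the name `compare_bed.py` is made up, and it assumes both inputs are plain tab-separated BED with chrom, 0-based start, and end in the first three columns and an optional name in the fourth.

```python
"""Hypothetical helper, not part of ALIGaToR: compare two BED files.

Assumes plain, tab-separated BED (chrom, 0-based start, end, optional name);
blank, comment ("#"), "track" and "browser" lines are skipped.
"""
import sys
from typing import List, Set, Tuple

# (chrom, start, end, name); BED coordinates are 0-based and half-open.
Interval = Tuple[str, int, int, str]


def read_bed(path: str) -> List[Interval]:
    """Parse the first four BED columns into Interval tuples."""
    intervals = []
    with open(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\r\n").split("\t")
            name = fields[3] if len(fields) > 3 else "."
            intervals.append((fields[0], int(fields[1]), int(fields[2]), name))
    return intervals


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same contig share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def compare(predicted: List[Interval], truth: List[Interval]
            ) -> Tuple[Set[Tuple[str, int, int]], List[Interval], List[Interval]]:
    """Return (identical coordinates, truth features missed, extra predictions)."""
    exact = {p[:3] for p in predicted} & {t[:3] for t in truth}
    missed = [t for t in truth if not any(overlaps(t, p) for p in predicted)]
    extra = [p for p in predicted if not any(overlaps(p, t) for t in truth)]
    return exact, missed, extra


def main(argv: List[str]) -> None:
    if len(argv) != 3:
        print("usage: compare_bed.py PREDICTED.bed TRUTH.bed", file=sys.stderr)
        return
    exact, missed, extra = compare(read_bed(argv[1]), read_bed(argv[2]))
    print(len(exact), "features with identical coordinates")
    print(len(missed), "truth features with no overlapping prediction")
    print(len(extra), "predictions with no overlapping truth feature")


if __name__ == "__main__":
    main(sys.argv)
```

For example, assuming `predict` writes `MF989451.RSS12.bed` (the file name the `annotate` command consumes): `python compare_bed.py MF989451.RSS12.bed $ALIGATOR_PATH/sample_data/MF989451.rss12_pred.bed`. Overlaps are reported separately from identical coordinates because a prediction can be shifted by a few bases and still describe the same feature.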