GAWN v0.3.2

WARNING!

GAWN now requires blastplus 2.7.1+
UTR region annotation has been turned off since v0.3
The dependencies (mostly versions) have been updated

Genome Annotation Without Nightmares

Developed by Eric Normandeau in Louis Bernatchez's laboratory with suggestions and important code contributions from Jérémy Le Luyer.

Description

GAWN is a genome annotation pipeline that uses an assembled transcriptome, either from the same species or from a related species, to create an evidence-based genome annotation. Its primary goal is to provide a good-enough genome annotation in a fraction of the time and effort required by traditional genome annotation pipelines. It relies on existing tools, such as GMAP, TransDecoder, blastx, and the Swissprot database, to produce the annotation. The result files are:

A GFF3 annotation file
A transcript annotation .tsv table
A genome annotation .tsv table

The .tsv tables are formatted to maximize usability by non-specialized users.
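
As a minimal sketch of how these .tsv tables could be consumed downstream, the Python snippet below loads a tab-separated file into a list of dictionaries. It assumes the table starts with a header row. The function name `read_tsv` and any path passed to it are placeholders for illustration, not part of GAWN.

```python
import csv


def read_tsv(path):
    """Load a tab-separated table into a list of dicts keyed by the header row.

    Assumption: the first line of the file holds the column names.
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

Each row can then be filtered or summarized with ordinary Python, without any bioinformatics library.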

Use cases

This approach is especially useful to annotate genomes of species for which there is a good assembled transcriptome. It will also work when a good transcriptome is available for a related species. It annotates only the genes for which transcripts are available and therefore does not depend on ab initio gene prediction models.

Overview of the analyses

During the analyses, the following steps are performed:

Index the genome (GMAP)
Annotate genes using available transcripts (GMAP)
Annotate the transcripts (blastx and the Swissprot database)
Produce a transcriptome annotation table (Python script)
Produce a genome annotation table (Python script)
TODO: add CpG island annotations
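
Between the blastx step and the annotation tables, hits typically need to be reduced to one per transcript. The sketch below shows one way to do this, assuming BLAST tabular output (`-outfmt 6`) with its default twelve columns and keeping the hit with the highest bit score. The function name `best_hits` is hypothetical, and GAWN's own scripts may use a different output format or selection rule.

```python
def best_hits(lines):
    """Return {query_id: (subject_id, evalue, bitscore)} keeping the top bit score.

    Assumption: lines follow BLAST's default tabular layout (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        # Keep the first hit seen for a query, or replace it with a better one.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

The function accepts any iterable of lines, so an open file handle can be passed directly.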

Resources needed

GAWN depends on several tools to annotate genomes, and its requirements in terms of RAM, disk space, and time depend on these tools. Here are example requirements for several eukaryote genomes. The annotations were run on a Lenovo ThinkStation D20 with 8 Xeon CPUs (16 threads, 2.40GHz) on Linux Mint 17 (Ubuntu 16.04). All of these datasets, except Salvelinus fontinalis, were run using the most recent genomes and transcriptomes available from Genbank.

Genome	Size (Gbp)	RAM (GB)	Final disk space (GB)	Time (h)
Human genome	3.29	16	37	XX
Salvelinus fontinalis	2.67	14.3	31.2	XX
Danio rerio	1.70	XX	XX	XX
Drosophila melanogaster	1.45	10.2	3.1	28
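
Since the final disk usage in the table reaches tens of gigabytes, it can help to check free space before launching a run. The following is a minimal sketch using Python's standard library. The function name `has_free_space` is hypothetical, and the threshold you pass is your own estimate (for example, a value taken from the table above).

```python
import shutil


def has_free_space(path, required_gb):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb
```

For example, `has_free_space(".", 37)` checks the current folder against the figure reported for the human genome.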

Installation

To use GAWN, you will need a local copy of its repository, which can be found at https://github.com/enormandeau/gawn.

Different releases can be accessed here. We suggest using version 0.3.1 or a more recent version.

Dependencies

You will also need to have the following programs installed on your computer. The version numbers are the ones that have been tested. It is suggested that you use these or more recent versions.

GNU Linux or OSX
bash 4+
python 2.7+ or 3.6+
cufflinks v2.2.1+
gmap (2017-10-12)
wget 1.17.1
gnu parallel 2017xxxx+
blastplus utilities (blastx) 2.7.1+
a local copy of the swissprot database (the .phr, .pin, .pnd, .pni ... files)

The relevant TransDecoder scripts are included with their license in 01_scripts/TransDecoder.
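
Before a first run, it can be useful to confirm that the required programs are reachable from your PATH. The sketch below does this with `shutil.which`. The executable names in `TOOLS` are assumptions about how the dependencies are installed on a typical system, so adjust them to your setup. The sketch does not check version numbers.

```python
import shutil


def missing_tools(names):
    """Return the executables from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Assumed executable names; adjust them to match your installation.
TOOLS = ["bash", "gmap", "blastx", "parallel", "wget", "cufflinks"]

if __name__ == "__main__":
    missing = missing_tools(TOOLS)
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```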

Running the pipeline

For each new project, get a new copy of GAWN's repository from the sources listed in the Installation section and copy your data into the 03_data folder.

Install dependencies
Download GAWN (see Installation section above)
Put your genome and transcriptome fasta files (uncompressed) in 03_data
Make a copy of 02_infos/gawn_config.sh and edit the parameters
Run the following command:

./gawn 02_infos/MY_CONFIG_FILE.sh
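
If you prefer to launch runs from a script, the steps above can be wrapped as in the sketch below. It assumes it is run from the root of the GAWN repository and that compressed inputs can be recognized by a `.gz` suffix. The function names `compressed_inputs` and `run_gawn` are hypothetical helpers, not part of GAWN.

```python
import subprocess
from pathlib import Path


def compressed_inputs(data_dir):
    """List files in `data_dir` that look gzip-compressed (GAWN expects plain FASTA)."""
    return sorted(p.name for p in Path(data_dir).iterdir() if p.suffix == ".gz")


def run_gawn(config_path, data_dir="03_data"):
    """Run ./gawn with the given config after checking the input folder."""
    blocked = compressed_inputs(data_dir)
    if blocked:
        raise ValueError("Uncompress these files first: " + ", ".join(blocked))
    # check=True raises CalledProcessError if the pipeline exits with an error.
    subprocess.run(["./gawn", str(config_path)], check=True)
```

Calling `run_gawn` with the path to your edited copy of the config file stops early with a clear message when a compressed file is found, instead of failing partway through the pipeline.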

Results

Once the pipeline has completed, all result files are found in the 05_results folder.

A valid GFF3 annotation file
A transcriptome annotation .tsv table
A genome annotation .tsv table
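
A quick sanity check on the GFF3 output is to count how many records of each feature type it contains. The sketch below relies only on the GFF3 layout of nine tab-separated columns, with the feature type in the third column. The function name `count_feature_types` is hypothetical, and the feature types that actually appear depend on the tools GAWN runs.

```python
from collections import Counter


def count_feature_types(path):
    """Count GFF3 records by feature type (third tab-separated column)."""
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments, and blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 9:  # a GFF3 feature line has exactly nine columns
                counts[fields[2]] += 1
    return counts
```

A file with no counted records, or one that lacks gene-level features entirely, would suggest that something went wrong upstream.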

Test dataset

TODO

WARNING: Not yet available

A test dataset is available as a sister repository on GitHub.

Download the repository and then move the data into GAWN's 03_data folder. Follow the normal pipeline procedure to analyse this small dataset. It should run in one to ten minutes, depending on your computer.

License

GAWN by Eric Normandeau is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://github.com/enormandeau/gawn.
