Computational Genomics Gene prediction pipeline

This pipeline was designed by team2-group1 to predict genes in the samples from team1. It runs several gene prediction tools and generates a merged result from their outputs.

Gene Prediction Pipeline

Requirements

python3
Latest Perl
bedtools
samtools
Latest Prodigal
Latest GeneMarkS-2
Latest Aragorn
Latest Barrnap
Latest Biopython (If running Bedtools)

bedtools and samtools are required to build the union of the GeneMarkS-2 and Prodigal results.
All required tools must be installed properly and added to $PATH.

Quick Start

-f : Path to the input file directory (required)
-p : Run Prodigal, a prokaryotic protein-coding (mRNA) gene prediction tool
-g : Run GeneMarkS-2, a prokaryotic protein-coding (mRNA) gene prediction tool
-nc : Run Aragorn and Barrnap to predict tRNA/tmRNA and rRNA, respectively (optional)
-ncs : Separate the Aragorn and Barrnap results into two distinct sets of nucleotide fasta files

-f is always required. By default, the pipeline runs both Prodigal and GeneMarkS-2 and combines their results with bedtools.
bedtools runs whenever both Prodigal and GeneMarkS-2 are run, and writes a union folder containing the results of both tools.
Example usage: ./geneprediction_pipeline_t1.py -f <input_dir>
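
The flag behavior described above can be sketched with argparse. This is only an illustration of the documented defaults, not the actual code in geneprediction_pipeline_t1.py; the function names build_parser and resolve_tools and the dest names are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Gene prediction pipeline (sketch)")
    parser.add_argument("-f", dest="input_dir", required=True,
                        help="path to the input file directory")
    parser.add_argument("-p", dest="prodigal", action="store_true",
                        help="run Prodigal")
    parser.add_argument("-g", dest="gms2", action="store_true",
                        help="run GeneMarkS-2")
    parser.add_argument("-nc", dest="noncoding", action="store_true",
                        help="run Aragorn and Barrnap")
    parser.add_argument("-ncs", dest="split_noncoding", action="store_true",
                        help="write Aragorn and Barrnap results separately")
    return parser


def resolve_tools(args):
    """Apply the documented default: with neither -p nor -g, run both tools."""
    run_prodigal, run_gms2 = args.prodigal, args.gms2
    if not (run_prodigal or run_gms2):
        run_prodigal = run_gms2 = True
    # The bedtools union step only makes sense when both predictors ran.
    return {
        "prodigal": run_prodigal,
        "gms2": run_gms2,
        "union": run_prodigal and run_gms2,
    }
```

For example, parsing `["-f", "genomes"]` and passing the result to `resolve_tools` selects both predictors and the union step, while `["-f", "genomes", "-p"]` selects Prodigal alone.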

Output Description

Results from Prodigal and GeneMarkS-2 are written to their respective folders, ./prodigalresults and ./gms2results.
Output files are split into three folders: one each for GFF, fna, and faa.
If Prodigal and GeneMarkS-2 are both run, the combined output is also written to ./prodigal-genemark.
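
As a rough illustration of what a union of two prediction sets means, the sketch below combines genes from two GFF files in plain Python. It assumes that two predictions are the same gene only when the sequence ID, start, end, and strand all match exactly. The pipeline itself uses bedtools, and its overlap criteria are not stated here, so treat this as a conceptual sketch with hypothetical function names rather than the pipeline's method.

```python
def read_gff_genes(lines):
    """Collect (seqid, start, end, strand) tuples from GFF lines."""
    genes = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF comments/directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column GFF feature line
        seqid, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        genes.add((seqid, start, end, strand))
    return genes


def union_predictions(prodigal_genes, gms2_genes):
    """Split two gene sets into shared, tool-specific, and union lists.

    Assumption: exact coordinate and strand identity defines "the same gene".
    """
    return {
        "shared": sorted(prodigal_genes & gms2_genes),
        "prodigal_only": sorted(prodigal_genes - gms2_genes),
        "gms2_only": sorted(gms2_genes - prodigal_genes),
        "union": sorted(prodigal_genes | gms2_genes),
    }
```

Exact matching is the strictest choice; an overlap-based comparison would treat predictions with different start codons as the same gene.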

By default, Aragorn and Barrnap results are joined into a single .fna file per assembly, located in ./arabarr.

The nucleotide and amino acid fasta files can be used for BLAST homology validation as described below.

BLAST (for validation)

Requirements

Version_5 database (required)
taxonomic_id list (required)
EDirect command-line utility (required)
Latest Perl (required)
python3 (required)
blast+ (required)

The validation step assumes that all requirements are installed and that the tools have been added to $PATH.

Quick start

To download a database:
Use ./update_blastdb.pl --blastdb_version 5 --showall to list the available databases.
Use ./update_blastdb.pl --blastdb_version 5 [Database] --decompress to download and decompress one.

To get the taxonomy_idlist:
Use get_species_taxids.sh -n [organism]

For blastp (amino acid sequences):
./blastp.py -d [queried_fold] -t [taxonomy_idlist] -o [outputfolder]
For blastx (nucleotide sequences):
./blastx.py -d [queried_fold] -t [taxonomy_idlist] -o [outputfolder]
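
To make the three arguments concrete, the sketch below builds one blastp command per fasta file in the query folder, restricted to a taxonomy ID list. It is not the code in blastp.py. The function name, the database name "nr", the tabular output format (-outfmt 6), and the output file naming are all assumptions made for illustration; -db, -query, -taxidlist, -outfmt, and -out are standard BLAST+ options.

```python
from pathlib import Path


def build_blastp_commands(query_dir, taxid_list, output_dir, db="nr"):
    """Return one blastp argument list per file in query_dir.

    Assumptions: db="nr", tabular output, and '<name>.blastp.tsv' file names
    are illustrative choices, not taken from the pipeline.
    """
    commands = []
    for fasta in sorted(Path(query_dir).iterdir()):
        if not fasta.is_file():
            continue
        out_file = Path(output_dir) / (fasta.stem + ".blastp.tsv")
        commands.append([
            "blastp",
            "-db", db,
            "-query", str(fasta),
            "-taxidlist", str(taxid_list),  # limit the search to the listed taxa
            "-outfmt", "6",                 # tab-separated hit table
            "-out", str(out_file),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once BLAST+ and a version 5 database are available. Building the commands separately from running them makes it easy to inspect them first.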

Description of arguments

For blastp.py or blastx.py:
-d : the folder containing only the fasta files you want to validate.
-t : the taxonomy_idlist for the specific organism.
-o : the folder where the outputs are written.

For validationP.py or validationX.py:
-s : the folder containing only the fasta files you want to validate.
-b : the folder containing only the BLAST results for those fasta files.
-o : the folder where the outputs are written.

Output Description

Your output folder will contain two folders:
knownprotein/ : fasta files containing only the sequences that have a BLAST hit.
novelgene/ : fasta files containing only the sequences that have no BLAST hit.
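
The split into the two folders can be sketched as follows. This is not the code in validationP.py or validationX.py. It assumes the BLAST results are in tabular format with the query ID in the first column, and that a fasta record's ID is the first whitespace-delimited token of its header. All function names are hypothetical.

```python
def read_fasta(lines):
    """Parse fasta lines into a list of (header, sequence) tuples."""
    records, header, seq = [], None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def hit_query_ids(blast_lines):
    """Collect query IDs (first column) from tabular BLAST output lines."""
    return {
        line.strip().split("\t")[0]
        for line in blast_lines
        if line.strip() and not line.startswith("#")
    }


def split_by_hits(records, hit_ids):
    """Separate records into (known, novel) by whether their ID has a hit."""
    known, novel = [], []
    for header, seq in records:
        seq_id = header.split()[0] if header.strip() else ""
        (known if seq_id in hit_ids else novel).append((header, seq))
    return known, novel
```

The `known` records correspond to what is written under knownprotein/ and the `novel` records to what is written under novelgene/.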
