Alignment of RNA reads to the genome with HISAT2
Let's load the necessary modules
module load STAR
module load hisat2
module load BRAKER/2.1.6
module load samtools
Actual alignment of RNA reads to the genome
We first build the HISAT2 index for the genome. When aligning, we include the option --dta so that the reported alignments are tailored for transcript assemblers
srun hisat2-build genome_clean_sorted.fasta genome_clean_sorted_hisat2
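The alignment command itself is not listed here. A minimal sketch of what it could look like, assuming paired-end reads named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz` (these read names and the thread count are assumptions, not taken from this workflow). The index prefix and the sorted BAM names follow the files used elsewhere in this document. The helper `hisat2_cmd` is hypothetical and only prints the command, so it can be inspected before running:

```shell
#!/usr/bin/env bash
# Sketch: print a HISAT2 alignment command for one sample.
# Assumptions (not from the original workflow): read file names, thread count.
INDEX=genome_clean_sorted_hisat2   # index prefix produced by hisat2-build
THREADS=8                          # hypothetical thread count

hisat2_cmd() {
    local sample=$1
    # --dta reports alignments tailored for transcript assemblers.
    # hisat2 writes SAM to stdout; samtools sort reads it from stdin ("-")
    # and writes a coordinate-sorted BAM.
    printf 'hisat2 --dta -p %s -x %s -1 %s_1.fastq.gz -2 %s_2.fastq.gz | samtools sort -@ %s -o %s_hisatr_alignment_sorted.bam -\n' \
        "$THREADS" "$INDEX" "$sample" "$sample" "$THREADS" "$sample"
}

# Print the command for two of the samples
for sample in 60541 60542; do
    hisat2_cmd "$sample"
done
```

Piping the printed lines to `bash` would execute them once the read file names have been adjusted.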
We need to soft-mask the repetitive elements in the genome
The first step is to identify repeats de novo in the reference genome using RepeatModeler
module load RepeatModeler
module load RepeatMasker
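The RepeatModeler and RepeatMasker commands are not listed here. A sketch of the usual sequence follows; the database name `artemia_db` and the thread count are hypothetical, and `-pa` is the parallel option of older RepeatModeler releases (newer ones use `-threads`), so check the installed version. The helper `repeat_steps` only prints the steps rather than executing them:

```shell
#!/usr/bin/env bash
# Sketch: print the de novo repeat identification and soft-masking steps.
# Assumptions (not from the original workflow): database name, thread count.
DB=artemia_db                      # hypothetical RepeatModeler database name
GENOME=genome_clean_sorted.fasta
THREADS=8                          # hypothetical thread count

repeat_steps() {
    # 1. Build a sequence database from the genome
    echo "BuildDatabase -name $DB $GENOME"
    # 2. Identify repeat families de novo; writes ${DB}-families.fa
    echo "RepeatModeler -database $DB -pa $THREADS"
    # 3. Mask with the custom library; -xsmall soft-masks (lowercase)
    #    instead of replacing repeats with N, and writes $GENOME.masked
    echo "RepeatMasker -pa $THREADS -xsmall -lib ${DB}-families.fa $GENOME"
}

repeat_steps
```

With these names, the last step should produce `genome_clean_sorted.fasta.masked`, which is the file passed to BRAKER.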
We then run genome annotation using BRAKER
This is detailed in this link (https://github.com/Gaius-Augustus/BRAKER)
module load BRAKER/2.1.6
srun braker.pl --genome=genome_clean_sorted.fasta.masked \
  --bam=60541_hisatr_alignment_sorted.bam,60542_hisatr_alignment_sorted.bam,60543_hisatr_alignment_sorted.bam,60544_hisatr_alignment_sorted.bam,60545_hisatr_alignment_sorted.bam,60546_hisatr_alignment_sorted.bam,60547_hisatr_alignment_sorted.bam,60548_hisatr_alignment_sorted.bam \
  --gff3 --useexisting --species=Artemia_francisc --cores=30 --min_contig=5000 --softmasking \
  --workingdir=/nfs/scistore18/vicosgrp/vbett/Artemia_franEMdata/EMReads_analysis/Expression_dir/braker_output
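Because the `--bam` argument is a long comma-separated list, it can be generated from the sample IDs instead of being typed by hand. A small sketch (the helper name `bam_list` is made up for illustration):

```shell
#!/usr/bin/env bash
# Sketch: join sample IDs into the comma-separated BAM list expected by --bam.
bam_list() {
    local list="" id
    for id in "$@"; do
        # ${list:+$list,} adds a comma only when the list is already non-empty
        list="${list:+$list,}${id}_hisatr_alignment_sorted.bam"
    done
    printf '%s\n' "$list"
}

BAMS=$(bam_list 60541 60542 60543 60544 60545 60546 60547 60548)
echo "$BAMS"
```

The variable can then be passed to braker.pl as `--bam=$BAMS`.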
We can also run genome annotation with both the RNA-seq BAM files and protein sequences, and compare the results
First we download all Arthropoda protein sequences from OrthoDB
wget https://v100.orthodb.org/download/odb10_arthropoda_fasta.tar.gz
tar xvfz odb10_arthropoda_fasta.tar.gz
cat arthropoda/Rawdata/* > proteins.fasta
We then generate the protein hints that will be used in BRAKER
Let's load necessary modules
prothint.py genome_clean_sorted_masked.fasta proteins.fasta --workdir ProhintDir --threads 50
This produces three output files
An output file that is ready to be used in BRAKER and AUGUSTUS, prothint_augustus.gff, is also generated
This is detailed in this link (https://github.com/gatech-genemark/ProtHint#protein-database-preparation)
We then run BRAKER using both the RNA-seq alignments and the protein hints
srun braker.pl --genome=genome_clean_sorted.fasta.masked \
  --bam=60541_hisatr_alignment_sorted.bam,60542_hisatr_alignment_sorted.bam,60543_hisatr_alignment_sorted.bam,60544_hisatr_alignment_sorted.bam,60545_hisatr_alignment_sorted.bam,60546_hisatr_alignment_sorted.bam,60547_hisatr_alignment_sorted.bam,60548_hisatr_alignment_sorted.bam \
  --gff3 --useexisting --species=Artemia_francisca --etpmode --cores=30 --min_contig=5000 --softmasking \
  --hints=prothint_augustus.gff \
  --workingdir=/nfs/scistore18/vicosgrp/vbett/Artemia_franEMdata/EMReads_analysis/Expression_all/braker_outputetp
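To compare the two annotation runs, one simple first check is the number of predicted genes in each output. A rough sketch, assuming the annotation files are tab-separated GFF3 or GTF with the feature type in column 3 (the helper name `count_genes` is hypothetical, and the output file names inside each BRAKER working directory should be checked rather than assumed):

```shell
#!/usr/bin/env bash
# Sketch: count "gene" features (column 3) in a GFF3/GTF file,
# skipping comment lines that start with "#".
count_genes() {
    awk -F'\t' '!/^#/ && $3 == "gene" { n++ } END { print n + 0 }' "$1"
}

# Print one "file<TAB>gene count" line per annotation file given as argument
for f in "$@"; do
    printf '%s\t%s\n' "$f" "$(count_genes "$f")"
done
```

Saved as a script, it would be called with the annotation file from each working directory as arguments. Gene counts alone do not show which annotation is better, but a large difference is a useful signal for closer inspection.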