Skip to content

kavonrtep/repeat_annotation_pipeline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

36 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

RepeatExplorer based Assembly Annotation Pipeline

Tools in repository

Extract Repeat Library from RepeatExplorer Archive

(extract_re_contigs.xml)

This toll will extract library of repeats based on RepeatExplorer2 analysis. Library is available as fasta file. Tool also filter out all the contig parts which has read depth and length below threshold. Parts of contigs with read depth below threshold are hardmasker. Contigs with full hardmasking are removed completelly

Format repeat library

(format_repeat_library.xml)

This tool append classification of repeats to library of repeats. Type of repeat is then part of sequence name in format:

>sequence_id#classification_level1/classification_level2/... this enable to specify classification hierarchy Classification of sequneces in library is provided using CLUSTER_TABLE.csv (part of RE2 output)

This file can then be used for annotation of repeat in your assembly:

Repeat Annotation

(repeat_annotate_custom.xml)

Internally annotation is performed using RepeatMasker search. Output from RepeatMasker is parsed to remove duplicated and overlaping annotations, Conflicts in annotations are resolved using hierarchical classification of repeats provided in custom database.

Summarize Annotation

This tool will create summary table from GFF annotation.

test data

  • test_assembly_1.fasta with test_db_1_satellites.fasta (include CLASS followed by double underscore - syntax 1)
  • test_assembly_2.fasta with test_db_2_RE_repeats.fasta (include full hierarchical classification)
  • TODO RM take only short name of sequences - validate name / adjust

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published