How to merge the TEsorter repeat libraries #52

manoharbisht1998 · 2024-01-15T06:16:52Z

Hey, thanks for the tool. How can I merge the TEsorter output library with the RepeatModeler repeat library to run RepeatMasker? Also, can I use the TEsorter output library directly as input to RepeatMasker?

zhangrengang · 2024-01-15T06:41:49Z

Yes. In the output library *.cls.lib, the sequences are identical to the input, but their IDs have been updated with the new classifications.
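
As a quick sanity check of that statement, the following minimal sketch compares the sequences of an input library with those of a *.cls.lib while ignoring the headers. The helper names (`read_fasta`, `same_sequences`) and the example records are hypothetical, and the `id#Order/Superfamily` header layout is an assumption based on the headers quoted in this thread.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def same_sequences(input_text, cls_lib_text):
    """Return True if both FASTA texts hold the same sequences.

    Headers are ignored, because TEsorter is expected to rewrite only the IDs.
    The comparison is case-insensitive and order-independent.
    """
    original = sorted(seq.upper() for _, seq in read_fasta(input_text))
    classified = sorted(seq.upper() for _, seq in read_fasta(cls_lib_text))
    return original == classified


# Made-up records, for illustration only.
input_lib = ">fam1#Unknown\nACGT\nacgt\n>fam2#LTR/Gypsy\nGGCC\n"
cls_lib = (
    ">fam1#LTR/Copia fam1#Unknown\nACGTACGT\n"
    ">fam2#LTR/Gypsy/Tekay fam2#LTR/Gypsy\nGGCC\n"
)
print(same_sequences(input_lib, cls_lib))
```

For real files, the two texts would be read from the input library and the corresponding *.cls.lib.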

manoharbisht1998 · 2024-01-15T07:17:08Z

Okay, thanks for answering the second part of my question. But I still have a doubt about merging the two libraries. RepeatModeler provides a consensus library with far fewer sequences than the input genome FASTA, whereas TEsorter outputs the same number of sequences as the input genome FASTA. So I am wondering: can I merge the two libraries into one and then cluster the merged library using a tool like CD-HIT?

zhangrengang · 2024-01-15T07:43:13Z

I do not understand. Are you using the -genome option to screen a whole genome with TEsorter? Otherwise, you should not input a genome FASTA, but a TE FASTA identified by, e.g., RepeatModeler.

manoharbisht1998 · 2024-01-15T08:39:13Z

Thank you for the prompt reply. Yes, I used the -genome option to screen for the TEs in my genome. However, I was not aware that we can also input the library obtained from RepeatModeler.
Now, I will run the TEsorter with the repeat library obtained from RepeatModeler and with -db rexdb-plant (as my species is a plant). Then the result that I will get can be fed downstream to RepeatMasker. Please correct me if I miss anything.

zhangrengang · 2024-01-15T09:04:19Z

You are right. Please note that the -genome option does not produce a TE library like RepeatModeler, but outputs annotations (*.dom.gff3) and sequences (*.dom.faa) of TE protein domains across the whole genome.

manoharbisht1998 · 2024-01-15T09:10:11Z

Okay. I am using the TEsorter v1.4.6, and I did get the *.cls.lib by using the -genome option.

zhangrengang · 2024-01-15T09:20:14Z

That is strange. How did you install it? Is it the latest version from GitHub?

manoharbisht1998 · 2024-01-15T09:21:57Z

I installed it in a conda environment.

zhangrengang · 2024-01-15T09:55:37Z

I tested the conda version, but it outputs only four files:

$ TEsorter -genome rice6.9.5.liban -fw
$ ls
rice6.9.5.liban.rexdb.domtbl
rice6.9.5.liban.rexdb.dom.gff3
rice6.9.5.liban.rexdb.dom.faa
rice6.9.5.liban.rexdb.dom.tsv

manoharbisht1998 · 2024-01-15T09:59:04Z

Oh, it must be because I did not pass my genome with the -genome parameter. Instead I ran:
TEsorter my_genome.fa -p 50 -prob 0.9
Which means TEsorter by default took it as a repeat library, I guess.

zhangrengang · 2024-01-15T10:00:13Z

Yes.
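
To make the difference between the two modes explicit, here is a minimal sketch that only assembles and prints the two command lines. It does not run TEsorter. The function name `tesorter_command` and the file names are placeholders; the flags (`-genome`, `-db`, `-p`) are the ones that appear in this thread.

```python
import shlex


def tesorter_command(fasta, genome_mode=False, db=None, threads=None):
    """Build a TEsorter argument list (hypothetical helper, nothing is executed).

    Without -genome, TEsorter treats the FASTA as a TE library and classifies
    each sequence. With -genome, it screens a whole genome for TE protein
    domains instead.
    """
    cmd = ["TEsorter"]
    if genome_mode:
        cmd.append("-genome")
    cmd.append(fasta)
    if db:
        cmd += ["-db", db]
    if threads:
        cmd += ["-p", str(threads)]
    return cmd


# Library mode: classify a RepeatModeler consensus library.
library_cmd = tesorter_command("consensi.fa", db="rexdb-plant", threads=8)
# Genome mode: screen a genome assembly for TE protein domains.
genome_cmd = tesorter_command("my_genome.fa", genome_mode=True)

print(shlex.join(library_cmd))
print(shlex.join(genome_cmd))
```

Passing a genome FASTA without `-genome`, as in the command above, corresponds to the first form, which is why a *.cls.lib was produced.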

manoharbisht1998 · 2024-01-15T10:04:04Z

Further on this: I ran TEsorter with the RepeatModeler output consensi.fa, and it took only one minute to give me the output in *.cls.lib, with the following summary on screen:
Order           Superfamily    # of Sequences  # of Clade Sequences  # of Clades  # of full Domains
LTR             Copia          75              72                    8            3
LTR             Gypsy          108             80                    6            20
pararetrovirus  unknown        7               0                     0            0
LINE            unknown        22              0                     0            0
TIR             EnSpm_CACTA    4               0                     0            0
TIR             MuDR_Mutator   6               0                     0            0
TIR             PIF_Harbinger  5               0                     0            0
TIR             hAT            5               0                     0            0

Now I am wondering whether the pipeline worked or not.

zhangrengang · 2024-01-15T10:27:43Z

It worked. It is fast for a small TE library.
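
One way to double-check a run like this is to recount the classifications directly from the headers of the *.cls.lib and compare them with the on-screen summary. The sketch below is only an illustration: it assumes each ID has the form `id#Order/Superfamily/Clade`, the function name `tally_superfamilies` is hypothetical, and the example headers are made up.

```python
from collections import Counter


def tally_superfamilies(headers):
    """Count sequences per (Order, Superfamily) from FASTA header lines.

    Assumes the classification follows the first '#' in the sequence ID and
    uses '/' to separate Order, Superfamily and Clade. Missing levels are
    reported as 'unknown'; sequences without any label count as 'Unknown'.
    """
    counts = Counter()
    for header in headers:
        seq_id = header.lstrip(">").split()[0]
        _, _, classification = seq_id.partition("#")
        parts = classification.split("/") if classification else ["Unknown"]
        order = parts[0]
        superfamily = parts[1] if len(parts) > 1 else "unknown"
        counts[(order, superfamily)] += 1
    return counts


# Made-up headers, for illustration only.
example_headers = [
    ">a#LTR/Copia/Ale",
    ">b#LTR/Gypsy",
    ">c#LINE",
    ">d#Unknown",
    ">e#LTR/Copia/Ivana some description",
]
for (order, superfamily), n in sorted(tally_superfamilies(example_headers).items()):
    print(order, superfamily, n, sep="\t")
```

Unlike the summary above, this tally also reports the sequences that stayed unclassified, which is useful when the goal is to reduce the "unknown" fraction.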

manoharbisht1998 · 2024-01-15T13:19:11Z

Hi, I have run RepeatMasker, and many repeats are still classified as "unknown", which I want to reduce. I am attaching the RepeatMasker output for my genome from both RepeatModeler ---> RepeatMasker and RepeatModeler ---> TEsorter ---> RepeatMasker. Do you have any suggestions on how I can reduce the number of "unknown" TEs? I am also attaching the headers of the *.cls.lib file, which I obtained after running TEsorter and used as input to RepeatMasker.

1_Unknown#Unknown 1_Unknown ( RepeatScout Family Size = 4356, Final Multiple Alignment Size = 100, Localized to 2506 out of 2617 contigs )
AAATATGAAATAAATAAAAATAATACATGGAAATGGAAAATACNGATTATTTAATTANTA

zhangrengang · 2024-01-15T13:29:35Z

You may use the union set of non-unknown TEs from RepeatModeler and TEsorter.

manoharbisht1998 · 2024-01-15T16:23:57Z

I did not quite follow. Are you suggesting that I keep only those sequences that are annotated by both RepeatModeler and TEsorter (in the output obtained by running TEsorter on the RepeatModeler library)?

zhangrengang · 2024-01-15T23:07:51Z

I mean you may replace the unknown classifications by TEsorter with the known classifications by RepeatModeler, like:

less rice6.9.5.liban.rexdb.cls.lib | awk '{if ($1~"#Unknown"){cls=$1; $1=">"$2; $2=cls}{print}}'

It is just to reduce the number of "unknown" TEs.
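
For readers who prefer Python, the idea behind the awk command above can be sketched as follows. This is an illustration, not TEsorter functionality. It assumes the header layout seen in this thread, where the first field is the TEsorter ID (`id#class`) and the second field is the original ID from the input library. The function name `restore_known_labels` is hypothetical, and two details differ from the awk version on purpose (my assumptions): the swap is skipped when the original ID carries no known classification, and the leading `>` is dropped from the field that moves to second position.

```python
def restore_known_labels(lines):
    """Fall back to the original ID when TEsorter reports '#Unknown'.

    Header lines are expected to look like '>id#Unknown original_id ...'.
    The two ID fields are swapped only if the original ID has a '#' label
    that is not itself 'Unknown'. All other lines pass through unchanged.
    """
    result = []
    for line in lines:
        fields = line.split()
        if line.startswith(">") and len(fields) >= 2 and fields[0].endswith("#Unknown"):
            original_id = fields[1]
            label = original_id.partition("#")[2]
            if label and label.lower() != "unknown":
                tesorter_id = fields[0].lstrip(">")
                fields[0] = ">" + original_id
                fields[1] = tesorter_id
                line = " ".join(fields)
        result.append(line)
    return result


# Made-up library lines, for illustration only.
example_lines = [
    ">fam1#Unknown fam1#LTR/Gypsy desc",
    "ACGT",
    ">fam2#LTR/Copia/Ale fam2#Unknown",
    "GGCC",
    ">fam3#Unknown fam3#Unknown",
    "TTAA",
]
for out_line in restore_known_labels(example_lines):
    print(out_line)
```

In this example only the first record changes: it regains the `LTR/Gypsy` label from the original library, while records that TEsorter classified, or that are unknown in both sources, are left as they are.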

manoharbisht1998 · 2024-01-16T07:08:25Z

Okay, Thanks!

manoharbisht1998 closed this as completed Jan 15, 2024

manoharbisht1998 reopened this Jan 16, 2024
