RepeatModeler or RepeatMasker failed with EarlGrey v4.4.0 and RepeatMasker v4.15, landscape.pdf missing, and extended runtime of RepeatModeler rounds with EarlGrey v4.4.0 and RepeatMasker v4.15 #140

Open
manwensu opened this issue Sep 20, 2024 · 4 comments

Comments

@manwensu

  1. The first time, I installed EarlGrey v4.4.0 and RepeatMasker v4.16 and ran around 50 species with this setup. Every species finished in less than 3 days, and the result files were normal. Bombus.affinis is shown as an example.

[Two attached images]

  2. The second time, I had to change the RepeatMasker version, so the setup was EarlGrey v4.4.0 and RepeatMasker v4.15. I reran all of the above species. Some species ran for more than 15 days and were still unfinished; the error reported was "RepeatModeler or RepeatMasker failed".

[Two attached images]

  3. The third time, I changed the EarlGrey version, so the setup was EarlGrey v4.2.4 and RepeatMasker v4.15. I ran three species that had hit the "RepeatModeler or RepeatMasker failed" error in the second run. The running time was normal (2-3 days), but the landscape.pdf file was missing from the summaryFiles folder for all three species. There was also an error in the Bombus.bicoloratus_RepeatLandscape folder: it contained many temporary files instead of the expected file Bombus.bicoloratus_RepeatLandscape/Bombus.bicoloratus.filteredRepeats.withDivergence.gff.

[Three attached images]
Screenshot from 2024-09-16 14-04-20

  4. The fourth time, I reran two of the three species from the third run with the same versions of EarlGrey and RepeatMasker, and I saw an extended running time in RepeatModeler rounds 2-5.

Screenshot from 2024-09-20 14-39-49

What can I do?
Thank you very much!

@jamesdgalbraith
Collaborator

Hi,

I think there are a variety of issues going on, but the main underlying one is with RepeatModeler and/or something unique within the genomes themselves. A few initial questions: are you running all the jobs with the same number of cores and the same amount of memory? Are these the same genomes that are causing issues in #135? And do you know which version of RepeatModeler is running? (RepeatModeler is the package causing the issue, rather than RepeatMasker.)

Assuming the version of RepeatModeler is consistent between all three runs, and the number of cores and memory available is also consistent, my hypothesis is that, by chance, the seeds RepeatModeler chose in the two later runs for the underlying RepeatScout and RECON packages are sampling regions of the genome containing repeats (likely satellites) for which the underlying algorithm(s) struggle to create consensus sequences. The fact that these issues are occurring in different species of Bombus makes me suspect that there is something odd/interesting going on within these bumblebee genomes that's causing the issue. What I recommend as a trial is running the same version of RepeatModeler by itself on the genome(s) outside of Earl Grey, using the same command, RepeatModeler -engine ncbi -threads ${NUM_THREADS} -database ${DATABASE}, to see if the issue occurs there too.
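
A minimal sketch of that standalone trial in Python, assuming RepeatModeler and its BuildDatabase utility are on the PATH. The database name, FASTA path, log path, and helper function names are hypothetical; only the RepeatModeler flags come from the command above. It defaults to a dry run that just prints the commands, so they can be checked against RepeatModeler -h for the installed version before anything is launched.

```python
import shlex
import subprocess


def build_database_cmd(database, fasta):
    """Command that builds the database RepeatModeler reads from."""
    return ["BuildDatabase", "-name", database, fasta]


def repeatmodeler_cmd(database, threads):
    """The standalone RepeatModeler command quoted in this thread."""
    return ["RepeatModeler", "-engine", "ncbi",
            "-threads", str(threads), "-database", database]


def run_standalone(database, fasta, threads, log_path, dry_run=True):
    """Print the two commands (dry run) or execute them in order."""
    build = build_database_cmd(database, fasta)
    model = repeatmodeler_cmd(database, threads)
    if dry_run:
        print(shlex.join(build))
        print(shlex.join(model))
        return None
    subprocess.run(build, check=True)
    # Keep the full RepeatModeler output so slow rounds can be inspected later.
    with open(log_path, "w") as log:
        result = subprocess.run(model, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


if __name__ == "__main__":
    # Hypothetical names; replace with your own genome and database.
    run_standalone("bombus_db", "genome.fasta", 16, "repeatmodeler.log")
```

Running the same sketch once per RepeatModeler version, with the same thread count, gives a like-for-like comparison of round runtimes from the logs.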

The issue you outline in point three above appears to be a problem with the post-processing script which calculates the Kimura distance between the repeats in the genome and the consensus sequences used to find them. It appears the script crashed without completing the calculations, as it should have deleted those temporary files upon completion. There were some issues with it early on which we've now patched. Seeing as you still have the "filteredRepeats.gff" and the TE library, you can use these scripts to calculate the divergence and create the plots: https://github.com/jamesdgalbraith/EarlGreyDivergenceCalc (for the Rscript, make sure to use the --axis_flip flag so the plots match typical EarlGrey plots)
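
For context on what that divergence step measures, here is a minimal Python sketch of the Kimura two-parameter (K80) distance between one aligned repeat copy and its consensus. It illustrates the general formula only and is not the code in EarlGreyDivergenceCalc; the function name, the choice of the two-parameter model, and the handling of gaps and ambiguous bases are assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    x = 1 - 2 * p - q
    y = 1 - 2 * q
    if x <= 0 or y <= 0:
        raise ValueError("sequences too divergent for the K2P formula")
    return -0.5 * math.log(x * math.sqrt(y))
```

A repeat landscape is then a histogram of these distances over all annotated copies, weighted by copy length, which is why the missing withDivergence.gff leaves nothing to plot.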

@manwensu
Author

Thank you very much, James! I will try that as you suggested.

@TobyBaril
Owner

Hi @manwensu, any updates on the RepeatModeler run? Some of the solutions suggested in #145 might work for future runs, and at least prevent the eternal elongation of strange low complexity repeats!

@manwensu
Author

manwensu commented Oct 2, 2024

Hi Tobias, I ran RepeatModeler v2.0.1 on the previously failed species Bombus.dahlbomii in Singularity. It worked well and did not have a long runtime. I am now trying to run RepeatModeler v2.0.5 separately. Many thanks for your suggestions; I will try them later.
