RepeatMasker complains about invalid species search term (but runs at Gitpod) #136
Comments
Hi,

This is a strange case, as an invalid search term will kill a RepeatMasker job before it starts analysing contigs. In this case, the final line of the log on your server is

In this case, there are also only 9 ancestral TE families, so it might be beneficial to skip the initial RepeatMasker step. Skipping it gives the de novo annotation better information, which in turn can generate better and more representative consensus sequences with better divergence estimates (the consensus sequences then represent the TEs present in the genome analysed, rather than similar families from other species, which might make some TEs look older than they actually are). To start, I would recommend comparing the

Cheers, Toby

P.S. The Gitpod information can be found in the
Hi! In addition to the above, there can sometimes be issues on systems where conda doesn't play nicely with other system installations of RepeatMasker. When the conda environment is active, I would recommend checking which version/installation of RepeatMasker is being called, in case other installations are interfering with the conda one.
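One way to run that check is sketched below. `list_on_path` is a hypothetical helper (it is not part of Earl Grey or RepeatMasker) that walks `PATH` in lookup order and prints every executable with the given name; the first line printed is the copy the shell would actually run. The sketch assumes bash and that the Earl Grey conda environment has already been activated.

```shell
# Hypothetical helper: list every executable called "$1" on PATH, in lookup order.
# The first path printed is the one the shell resolves when you type the name.
list_on_path() {
    local name="$1" dir
    local IFS=':'                      # split PATH on colons inside this function only
    for dir in $PATH; do
        [ -n "$dir" ] || dir="."       # an empty PATH entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
}

# Run with the conda environment active; prints nothing if no copy is found.
list_on_path RepeatMasker
```

If more than one path is printed and the first one is not inside the conda environment's `bin` directory, a system installation is probably shadowing the conda one.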
Hi Toby. Thank you very much for the help. I checked and the file genome.fasta.prep is identical to genome.fasta, both on my server and on Gitpod (except for the chromosome names, as you said, since they were changed to ctg_1 through ctg_99). Since Trypanosoma cruzi is a non-model organism, I was surprised to discover that RepeatMasker recognized the NCBI TaxID 5693. That is why I would like to test Earl Grey with the option -r (besides running with default parameters and comparing the results).

By the way, I very much liked your observation about the number of ancestral TE families. Since I am not used to this kind of pipeline, I had no idea that 9 is a small number. I know that the Earl Grey documentation does not intend to explain the basics of TEs, but a remark about this number in the explanation of option -r might help inexperienced users like me.

Sorry for not seeing the Gitpod information in the GitHub README file. The installation is very well documented! I tried some other things to solve the problem with the option -r:
I am trying to contact the sysadmin again to see if my commands are identical to his. Maybe I should clone his miniconda environment, but then we would still not know what is causing the error. I also checked that our server has RepeatMasker installed, but it is version 4.1.6, different from the version indicated in the log that I copied into the issue: 4.1.5. Anyway, the sysadmin managed to run Earl Grey even with this system installation of RepeatMasker, so I have no idea what the problem is or what else I should try. :-( Tell me if you have any other ideas. I would like to solve this problem and run Earl Grey with the option -r on our server instead of on Gitpod.

One last thing that I noticed: I tried the new 4.4.5 version, but if I run just the command earlGrey, without any option, the output reports 4.4.4 instead of 4.4.5.

I am very grateful for all your help. Thanks again.
Hi David, This is really strange, but does point to the issue being something to do with your specific conda configuration. As the issue is with RepeatMasker, I'm wondering whether this could be linked to conflicting perl installations... One potential way around this is to alter which perl installation is called by RepeatMasker, which does involve reconfiguring RepeatMasker within the conda environment to use the correct perl installation.
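Before reconfiguring anything, it may help to see which interpreter RepeatMasker is currently set up to use. The sketch below is only an illustration: `shebang_of` is a hypothetical helper (not shipped with RepeatMasker or Earl Grey) that prints the first line of a script found on `PATH`, which for a Perl script is normally its `#!` interpreter line. It assumes bash and that the RepeatMasker entry point on `PATH` is a plain script.

```shell
# Hypothetical helper: print the first line (normally the "#!" interpreter line)
# of the script that the name "$1" resolves to on PATH.
shebang_of() {
    local script line
    script="$(command -v "$1")" || { echo "$1 not found on PATH" >&2; return 1; }
    IFS= read -r line < "$script"      # read only the first line of the file
    printf '%s\n' "$line"
}
```

With the conda environment active, comparing the output of `shebang_of RepeatMasker` with the output of `command -v perl` shows whether the script points at the environment's perl or at a different installation.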
Hello Toby, I checked the Perl installation as you asked and it seems that this is not the problem either:
As you can see, the perl versions of base and earlGrey are different, as expected. The path seems OK to me. I also compared the list of installed packages at Gitpod (after creating a new workspace with Earl Grey 4.4.5) and at my server (after updating it with

I am still waiting for an answer from the sysadmin to see what he did differently from me, and I will keep you informed as soon as I make any progress. Thank you for all the help.
Hi Toby. I'm still not able to say what the problem is with Earl Grey when I try to run it with the option -r on our server. I copied configuration files such as .bashrc and .profile from the sysadmin's user, overwriting my own, but the problem persists. I removed my Anaconda installation and I am using miniconda now, but nothing changes. In another attempt, I tried to reinstall RepeatMasker and there seems to be a problem with the Dfam library:
$ mamba install --force-reinstall repeatmasker
Looking for: ['repeatmasker']
pkgs/r/linux-64 No change
pkgs/main/noarch No change
pkgs/r/noarch No change
pkgs/main/linux-64 6.6MB @ 11.2MB/s 0.6s
bioconda/noarch 4.4MB @ 5.9MB/s 0.6s
bioconda/linux-64 4.7MB @ 4.6MB/s 0.9s
conda-forge/noarch 16.9MB @ 11.8MB/s 1.4s
conda-forge/linux-64 39.0MB @ 23.3MB/s 1.7s
Pinned packages:
- python 3.9.*
warning libmamba Invalid package cache, file '/home/pires/local/src/miniconda3/2024-09-17/pkgs/repeatmasker-4.1.5-pl5321hdfd78af_1/share/RepeatMasker/Libraries/Dfam.h5' has incorrect size
Transaction
Prefix: /home/pires/local/src/miniconda3/2024-09-17/envs/earlgrey
Updating specs:
- repeatmasker
Package Version Build Channel Size
──────────────────────────────────────────────────────────────────
Reinstall:
──────────────────────────────────────────────────────────────────
o repeatmasker 4.1.5 pl5321hdfd78af_1 bioconda Cached
Summary:
Reinstall: 1 packages
Total download: 0 B
──────────────────────────────────────────────────────────────────
Confirm changes: [Y/n]
Downloading and Extracting Packages:
Preparing transaction: done
Verifying transaction: /
SafetyError: The package for repeatmasker located at /home/pires/local/src/miniconda3/2024-09-17/pkgs/repeatmasker-4.1.5-pl5321hdfd78af_1
appears to be corrupted. The path 'share/RepeatMasker/Libraries/Dfam.h5'
has an incorrect size.
reported size: 68 bytes
actual size: 5861387976 bytes
done
Executing transaction: done
The interesting thing is that after the forced reinstallation, the same problem persists (the same message is shown if I execute the reinstall command again). As a last attempt, I will ask the sysadmin to remove the RepeatMasker installation from the server, to test whether there is some kind of version conflict, but I have no idea what else I could try if this doesn't work. :-( I'll keep you updated on the ongoing tests.
P.S.: Now I am trying the new version v5.0.0. I would like to note two typos:
Thank you very much for the great software. Best regards. -- |
Hi TobyBaril.
I would like to ask for your help concerning a RepeatMasker error.
We are trying to annotate a new genome assembly of Trypanosoma cruzi, whose NCBI TaxID is 5693. Earl Grey was installed through conda on our Linux server. When I execute the command:
earlGrey -g input/genome.fasta -s tryCru-Dm28c -t 88 -r 5693 -o output
I get the following output:
================================================================================================
...
<<< Running Initial Mask with Known Repeats >>>
RepeatMasker version 4.1.5
Search Engine: NCBI/RMBLAST [ 2.14.1+ ]
Using Master RepeatMasker Database: /home/pires/local/src/anaconda/3-2024.06-1/envs/earlGrey/share/RepeatMasker/Libraries/RepeatMaskerLib.h5
Title : Dfam
Version : 3.7
Date : 2023-01-11
Families : 19,768
Species/Taxa Search:
Trypanosoma cruzi [NCBI Taxonomy ID: 5693]
Lineage: root;cellular organisms;Eukaryota;Discoba;Euglenozoa;
Kinetoplastea;Metakinetoplastina;Trypanosomatida
9 families in ancestor taxa; 0 lineage-specific families
analyzing file /storage/zuleika/volume3/project/jcunha/hiChromatin/project/tryCru-Dm28c2018-lcc2024/genomeAnnotation/0-transposableElements/earlGrey-repeatMaskerSearchTerm/input/genome.fasta.prep
Checking for E. coli insertion elements
Checking for E. coli insertion elements
identifying Simple Repeats in batch 2 of 548
identifying matches to 5693 sequences in batch 2 of 548
identifying Simple Repeats in batch 1 of 548
identifying matches to 5693 sequences in batch 1 of 548
identifying Simple Repeats in batch 2 of 548
identifying Simple Repeats in batch 1 of 548
...
Checking for E. coli insertion elements
identifying Simple Repeats in batch 547 of 548
identifying Simple Repeats in batch 548 of 548
identifying matches to 5693 sequences in batch 548 of 548
identifying Simple Repeats in batch 548 of 548
No repetitive sequences were detected in /storage/zuleika/volume3/project/jcunha/hiChromatin/project/tryCru-Dm28c2018-lcc2024/genomeAnnotation/0-transposableElements/earlGrey-repeatMaskerSearchTerm/input/genome.fasta.prep
ERROR: RepeatMasker failed, please check logs. This is likely because of an invalid species search term, if issue persists please use NCBI Taxids (E.G Drosophila is replaced with 7125)
================================================================================================
The curious thing is that, while I was searching for a solution, I reached the following page:
https://tehub.org/tutorials/docs/earlgrey
which recommends running Earl Grey from Gitpod (by the way, this is a great alternative way to run Earl Grey that is not documented here on GitHub). If I run the same command on Gitpod, the RepeatMasker run completes and this error doesn't occur:
================================================================================================
Checking for E. coli insertion elements
identifying Simple Repeats in batch 548 of 548
identifying matches to 5693 sequences in batch 548 of 548
identifying Simple Repeats in batch 548 of 548
processing output:
cycle 1 .....................................
cycle 2 .....................................
cycle 3 ...................................
cycle 4 ...................................
cycle 5
cycle 6 ...................................
cycle 7 ...................................
cycle 8 .................................
cycle 9 .................................
cycle 10 .................................
Generating output... ................................
masking
done
================================================================================================
Can you help me figure out what the problem is with the RepeatMasker installed in our conda environment?
Thanks in advance.
--
David da Silva Pires