Question about GALBA performance vs. BRAKER3 #56
Comments
BRAKER2 is expected to perform poorly on a mammal; it is not made for large genomes. BRAKER3 should be able to handle a vertebrate genome; however, we did not design it specifically for mammals. For annotating a mammal, we usually do not train AUGUSTUS at all; we use the human parameter set. For a mammal, you could also consider Tiberius (https://www.biorxiv.org/content/10.1101/2024.07.21.604459v1.full.pdf), but it does not use the RNA-Seq data, and it only predicts one isoform per locus. In that category, it may outperform GALBA, though.

What you see for BRAKER3 is the result of overly stringent filtering. Too many gene models without evidence were discarded from the total gene set. The models that you have are very good, but some genes are missing. You may be able to restore them manually. Can you send me the part of braker.log where it ran best_by_compleasm.py? The log file from the compleasm subfolder may also be useful. (However, if you add more genes, the unknown consistency, fragments, etc. may increase to the level of the GALBA run, so I am not sure whether it is even worth trying.)

For mammals, you have excellent reference protein donors, so it is not surprising that GALBA did well. Also, I added the DIAMOND filter to GALBA a while ago to discard false positives. That also helps with denoising, and it only works well in your current data situation.
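For anyone who needs to pull out the requested part of braker.log, here is a minimal Python sketch. It assumes the relevant log lines contain the literal string `best_by_compleasm`; the function name `extract_log_context` and the context-window sizes are illustrative choices, not part of BRAKER. The actual log layout may differ, so adjust the window if the output gets cut off.

```python
from pathlib import Path


def extract_log_context(log_path, needle="best_by_compleasm", before=2, after=20):
    """Return the log lines around every line that mentions `needle`.

    `before` and `after` set how many lines of context to keep on each
    side of a match. Both defaults are guesses, not BRAKER conventions.
    """
    lines = Path(log_path).read_text(errors="replace").splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if needle in line:
            start = max(0, i - before)
            stop = min(len(lines), i + after + 1)
            # A set merges overlapping windows from neighbouring matches.
            keep.update(range(start, stop))
    return [lines[i] for i in sorted(keep)]


# Example usage (the path is hypothetical):
# print("\n".join(extract_log_context("braker/braker.log")))
```

The result is a plain list of lines, so it can be printed or written to a small file and attached to the issue.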
I appreciate the explanation! As for compleasm, I had some issues with it, so I had to run BRAKER3 without the --busco_lineage option. I saw a similar issue mentioned in Compleasm_to_hints error #752, but I haven’t had a chance to build a new Singularity container with the updated TSEBRA yet. I'm still climbing the learning curve. Thanks again!
Hi,
First, apologies if this is not the best place to ask this question. I'm doing gene annotation for a non-model rodent, and I have tried several approaches:
I obtained considerably better results using GALBA than BRAKER3. While this is great news, I am surprised that GALBA outperformed BRAKER3 given that BRAKER3 had the additional RNA-Seq data.
Is this within expectations for these tools, or could this indicate an issue with my BRAKER3 run? (For example, could there have been a problem with the RNA-Seq alignments?) I'm mostly basing my assessment of "better" on the results I obtained using OMArk for each annotation.
*I attached a plot of my OMArk results. The braker3 bar was run using RNA-Seq data from just 1 sample, while braker3+ used all the RNA-Seq data I had available.
Thanks in advance for any insights!