Question about GALBA performance vs. BRAKER3 #56

aleponce4 · 2024-09-06T13:57:28Z

Hi,
First, apologies if this is not the best place to ask this question. I'm doing gene annotation for a non-model rodent, and I have tried several approaches:

BRAKER2: with a large protein database,
GALBA: using combined protein data from a few closely related species,
BRAKER3: incorporating both the above protein data and total RNA-Seq data alignments.

I obtained considerably better results using GALBA than BRAKER3. While this is great news, I am surprised that GALBA outperformed BRAKER3 given that BRAKER3 had the additional RNA-Seq data.

Is this within expectations for these tools, or could this indicate an issue with my BRAKER3 run? (For example, could there have been a problem with the RNA-Seq alignments?) I'm mostly basing my assessment of "better" on the results I obtained using OMArk for each annotation.

*I attached a plot of my OMArk results. The braker3 bar comes from a run using RNA-Seq data from just 1 sample, while braker3+ used all the RNA-Seq data I had available.

Thanks in advance for any insights!

KatharinaHoff · 2024-09-06T14:18:01Z

BRAKER2 is expected to perform poorly on a mammal; it's not made for large genomes.

BRAKER3 should be able to handle a vertebrate genome; however, we did not design it to work with mammals specifically. For annotating a mammal, we usually do not train AUGUSTUS at all; we use the human parameter set. For a mammal, you could also consider Tiberius (https://www.biorxiv.org/content/10.1101/2024.07.21.604459v1.full.pdf), but it does not use the RNA-Seq data, and it only predicts one isoform per locus. In that category, it may outperform Galba, though.

What you see for BRAKER3 is the result of overly stringent filtering: too many gene models without evidence were discarded from the total gene set. The models that you have are very good, but some genes are missing. You may be able to restore them manually.

Can you send me the part of braker.log where it ran best_by_compleasm.py? Also, the log file from the compleasm subfolder may be useful. (However, if you add more genes, the unknown consistency, fragments, etc. may increase just to the level of the Galba run, so I am not sure whether it's even worth trying.)

For mammals, you have excellent reference protein donors, so it is not surprising that Galba did well. Also, a while ago I added a DIAMOND filter to discard false positives. That also helps with denoising, and it only works well in a data situation like your current one.
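
For the log excerpt requested above, here is a minimal Python sketch for pulling the relevant lines out of braker.log. It assumes the log mentions the script name `best_by_compleasm.py` literally; the helper name `lines_mentioning` and the `context` parameter are made up for illustration and are not part of BRAKER.

```python
def lines_mentioning(log_text, needle="best_by_compleasm.py", context=5):
    """Return the lines that mention `needle`, plus `context` lines on each side.

    Hypothetical helper: it only does a plain substring search and makes
    no assumption about the actual layout of braker.log.
    """
    lines = log_text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if needle in line:
            start = max(0, i - context)
            stop = min(len(lines), i + context + 1)
            keep.update(range(start, stop))
    # Emit the selected lines in their original order, without duplicates.
    return [lines[i] for i in sorted(keep)]


# Usage (the path is an assumption; adjust it to your BRAKER working directory):
#   with open("braker.log", errors="replace") as handle:
#       print("\n".join(lines_mentioning(handle.read())))
```

If nothing is printed, the script was probably never run, which would be consistent with a run started without --busco_lineage.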

aleponce4 · 2024-09-06T15:08:30Z

I appreciate the explanation!
I’ll definitely give Tiberius a try when I can (though the GPU partitions at my university are usually pretty busy, so it might take a bit of time). Based on your experience and the OMArk results I've gotten so far, it sounds like GALBA is a good fit for my current project. I just wanted to make sure the behavior I'm seeing in comparison with BRAKER3 wasn't unusual.

As for compleasm, I had some issues with it, so I had to run BRAKER3 without the --busco_lineage option. I saw a similar issue mentioned in Compleasm_to_hints error #752, but I haven’t had a chance to build a new Singularity container with the updated TSEBRA yet. I'm still climbing the learning curve.

Thanks again!
