Unusual abundance results with custom classifier index #252
Comments
This is indeed very strange. Are many reads assigned to the ancestors of 52461?
Yeah, at the genus level numReads is 2301874 and numUniqueReads is 2242384.
I think that when computing the abundance, Centrifuge trickles the abundance down from ancestor taxa to the leaves, which is why this species gets a very high abundance.
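To illustrate the trickle-down idea described above, here is a minimal sketch. It is not Centrifuge's actual code: the function names, the toy taxonomy, the read counts, and the rule of splitting an ancestor's reads in proportion to each child's subtree read count are all assumptions made for illustration.

```python
def subtree_reads(children, reads, node):
    """Total reads assigned to `node` and everything below it."""
    return reads.get(node, 0) + sum(
        subtree_reads(children, reads, c) for c in children.get(node, [])
    )


def trickle_down(children, reads, node, inherited=0.0):
    """Push reads assigned at internal nodes down to the leaves.

    Assumption (hypothetical rule): an internal node's reads are split among
    its children in proportion to each child's subtree read count, or equally
    if no child has any reads.
    """
    total_here = reads.get(node, 0) + inherited
    kids = children.get(node, [])
    if not kids:
        return {node: total_here}
    weights = [subtree_reads(children, reads, k) for k in kids]
    weight_sum = sum(weights)
    leaves = {}
    for kid, w in zip(kids, weights):
        fraction = w / weight_sum if weight_sum else 1.0 / len(kids)
        leaves.update(trickle_down(children, reads, kid, total_here * fraction))
    return leaves


# Hypothetical toy taxonomy: one genus with two species.
children = {"genus": ["species_a", "species_b"]}
# Hypothetical counts: most reads stop at the genus level.
reads = {"genus": 1000, "species_a": 99, "species_b": 1}

result = trickle_down(children, reads, "genus")
print(result)
```

Under these assumptions, a species with few directly assigned reads still ends up with most of the genus-level reads, which is consistent with the behaviour described in this thread.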
Hi, I also had an issue with the abundance calculation, except in my case 710 unique reads out of 710 total reads (numReads) were assigned to a taxon and its relative abundance is 0.0.
In the same run and report file, another taxon had 0 unique reads but was multi-mapped to 5 times, and that taxon did get a relative abundance.
I'd appreciate any advice or guidance. Thanks.
@pjtorres I think this is more like a rounding error in the EM algorithm. I also recently noticed that some strains may get a very high abundance because of their short genome sizes. Is this your case? I'm thinking about adding a parameter to ignore the short ones. This might be related to the issue you just opened.
@mourisl thank you for your quick response! I agree that it is some issue with the EM part. I also think it might have to do with genome length. However, in the specific example above you can see that both strains are about the same size (807 vs 1080), and the strain with no unique reads and 5 multi-mapped reads is the one that got an abundance estimate > 0.0. Still, a parameter to ignore genomes below a particular length would be great. Thanks again.
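To make the genome-length effect discussed above concrete, here is a minimal sketch of a length-normalized EM abundance estimate. It is a simplified illustration, not Centrifuge's implementation: the function name `em_abundance`, the taxon names, the genome lengths, and the read counts are all hypothetical.

```python
def em_abundance(reads, genome_len, n_iter=100):
    """Estimate length-normalized abundances with a simple EM loop.

    reads      : list of candidate-taxon lists, one per read
    genome_len : dict mapping taxon -> genome length
    """
    taxa = sorted(genome_len)
    abundance = {t: 1.0 / len(taxa) for t in taxa}  # uniform start
    for _ in range(n_iter):
        expected = {t: 0.0 for t in taxa}
        # E-step: split each read among its candidates in proportion
        # to abundance * genome length.
        for candidates in reads:
            weights = {t: abundance[t] * genome_len[t] for t in candidates}
            total = sum(weights.values())
            if total == 0:
                continue
            for t, w in weights.items():
                expected[t] += w / total
        # M-step: abundance is proportional to expected reads per base.
        per_base = {t: expected[t] / genome_len[t] for t in taxa}
        norm = sum(per_base.values())
        abundance = {t: per_base[t] / norm for t in taxa}
    return abundance


# Hypothetical example: a long genome with many reads, a short one with few.
genome_len = {"long": 5_000_000, "short": 1_000}
reads = [["long"]] * 1000 + [["short"]] * 10

abundance = em_abundance(reads, genome_len)
print(abundance)
```

In this toy case the short genome receives only 10 of 1010 reads but ends up with roughly 98% of the estimated abundance, because reads are divided by genome length before normalizing. A minimum-length filter like the one proposed above would amount to dropping such taxa from `genome_len` before running the loop.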
Hi,
I recently created a custom classifier index and the abundance results aren't making sense to me. What would cause the abundance calculation to be so high (0.984) when the numReads is only 99 and the numUniqueReads is 0? Total reads for this sample was 314521.