Unusual abundance results with custom classifier index #252

hirosatosd · 2023-02-01T23:45:01Z

Hi,

I recently created a custom classifier index and the abundance results aren't making sense to me. What would cause the abundance to be so high (0.984) when numReads is only 99 and numUniqueReads is 0? The sample had 314521 reads in total.

mourisl · 2023-02-02T00:40:31Z

This is indeed very strange. Are many reads assigned to the ancestors of 52461?

hirosatosd · 2023-02-02T02:03:35Z

Yeah, at the genus level numReads is 2301874 and numUniqueReads is 2242384.

mourisl · 2023-02-02T02:06:39Z

I think that when computing the abundance, Centrifuge trickles the abundance down from ancestor taxa to the leaves, which is why this species gets such a high abundance.
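For readers following along, here is a toy sketch of what trickling reads down a taxonomy could look like. It is not Centrifuge's code: the function name `trickle_down`, the taxa `genus_G`, `species_A`, and `species_B`, the read counts, and the rule of splitting an ancestor's reads in proportion to each child's own reads are all assumptions made for illustration. Genome-length normalization is ignored here.

```python
def trickle_down(direct_reads, children):
    """Push reads assigned to internal taxa down to the leaves.

    direct_reads: dict taxon -> reads assigned directly to that taxon
    children:     dict taxon -> list of child taxa (leaves have no entry)
    Returns a dict leaf -> effective read count after redistribution.
    """
    def subtree_reads(taxon):
        # Reads assigned to this taxon or to anything below it.
        return direct_reads.get(taxon, 0) + sum(
            subtree_reads(c) for c in children.get(taxon, [])
        )

    leaf_totals = {}

    def push(taxon, inherited):
        total = direct_reads.get(taxon, 0) + inherited
        kids = children.get(taxon, [])
        if not kids:
            leaf_totals[taxon] = leaf_totals.get(taxon, 0.0) + total
            return
        weights = [subtree_reads(k) for k in kids]
        weight_sum = sum(weights)
        for kid, weight in zip(kids, weights):
            if weight_sum > 0:
                # Assumed rule: split in proportion to each child's reads.
                share = total * weight / weight_sum
            else:
                # No evidence below this node: split evenly.
                share = total / len(kids)
            push(kid, share)

    # Roots are taxa that never appear as someone's child.
    all_children = {c for kids in children.values() for c in kids}
    for root in (t for t in children if t not in all_children):
        push(root, 0.0)
    return leaf_totals


# Hypothetical counts: most reads stop at the genus, few reach the species.
direct_reads = {"genus_G": 10000, "species_A": 99, "species_B": 1}
children = {"genus_G": ["species_A", "species_B"]}

leaves = trickle_down(direct_reads, children)
total = sum(leaves.values())
abundance = {taxon: n / total for taxon, n in leaves.items()}
print(abundance)
```

In this toy case `species_A` has only 99 direct reads, yet it ends up with an abundance of roughly 0.99 because it inherits nearly all of the genus-level reads. That is the same shape as the result reported at the top of this thread.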

pjtorres · 2023-02-02T21:20:45Z

Hi, I also had an issue with the abundance calculation, except in my case a taxon with 710 unique reads out of 710 total reads (numReads) got a relative abundance of 0.0 (columns: taxID, taxRank, genomeSize, numReads, numUniqueReads, abundance).

2710803 strain 1080 710 710 0.0

In the same run and report file, another taxon had 0 unique reads and only 5 multi-mapped reads, yet it did get a nonzero relative abundance.

2042592 strain 807 5 0 3.67064e-292

Appreciate any advice or guidance. Thanks.

mourisl · 2023-02-02T21:30:56Z

@pjtorres I think this is more like a rounding error in the EM algorithm. I also recently noticed that some strains may get a very high abundance because of their short genome sizes. Is this your case?

I'm thinking about adding a parameter to ignore the short ones. This might be related to the issue you just opened.
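To make the genome-size effect concrete, here is a simplified sketch of a length-normalized EM abundance estimate. It follows the general idea of dividing expected read counts by genome length. It is an illustration under assumptions, not Centrifuge's actual implementation: the function `em_abundance`, the taxa `short` and `long`, and all read counts and genome lengths are hypothetical.

```python
def em_abundance(reads, genome_len, n_iter=100):
    """Toy EM: reads is a list of tuples of candidate taxa per read,
    genome_len maps taxon -> genome length in bases."""
    taxa = sorted(genome_len)
    alpha = {t: 1.0 / len(taxa) for t in taxa}  # start from uniform
    for _ in range(n_iter):
        # E-step: split each read among its candidate taxa by current alpha.
        expected = {t: 0.0 for t in taxa}
        for hits in reads:
            denom = sum(alpha[t] for t in hits)
            if denom == 0.0:
                continue  # every candidate has shrunk to zero abundance
            for t in hits:
                expected[t] += alpha[t] / denom
        # M-step: normalize expected reads by genome length, then rescale.
        density = {t: expected[t] / genome_len[t] for t in taxa}
        total = sum(density.values())
        if total == 0.0:
            break
        alpha = {t: density[t] / total for t in taxa}
    return alpha


# Hypothetical data: 10 reads on a 1 kb genome, 1000 reads on a 1 Mb genome.
reads = [("short",)] * 10 + [("long",)] * 1000
genome_len = {"short": 1_000, "long": 1_000_000}

alpha = em_abundance(reads, genome_len)
print(alpha)
```

Here the short genome receives 1% of the reads of the long one, but its reads-per-base density is ten times higher, so it ends up with about 0.91 of the abundance. Repeated renormalization of very small values in a loop like this is also the kind of place where floating-point underflow or rounding could push an estimate toward zero.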

pjtorres · 2023-02-02T22:20:38Z

@mourisl thank you for your quick response! I agree that it is some issue with the EM part. I also think it might have to do with genome length. However, in the specific example above you can see that both strains are about the same size (807 vs. 1080), and the strain with no unique reads and 5 multi-mapped reads is the one that got an abundance estimate > 0.0. But adding a parameter to ignore genomes below a particular length would be great. Thanks again.
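In the meantime, a small script can scan a report for the two symptoms described in this thread. This is a minimal sketch: it assumes the report is tab-separated with the header `name`, `taxID`, `taxRank`, `genomeSize`, `numReads`, `numUniqueReads`, `abundance`. The function `flag_suspicious_rows`, the 0.5 threshold, and the placeholder names `strain_A` and `strain_B` are hypothetical; the numeric values are the two rows quoted above.

```python
import csv
import io


def flag_suspicious_rows(report_file, high_abundance=0.5):
    """Return (taxID, reason) pairs for rows whose abundance looks
    inconsistent with their unique-read count."""
    flagged = []
    for row in csv.DictReader(report_file, delimiter="\t"):
        unique = int(row["numUniqueReads"])
        abundance = float(row["abundance"])
        if unique > 0 and abundance == 0.0:
            flagged.append((row["taxID"], "unique reads but zero abundance"))
        elif unique == 0 and abundance >= high_abundance:
            flagged.append((row["taxID"], "no unique reads but high abundance"))
    return flagged


# The two rows quoted in this thread, with placeholder names added.
example = (
    "name\ttaxID\ttaxRank\tgenomeSize\tnumReads\tnumUniqueReads\tabundance\n"
    "strain_A\t2710803\tstrain\t1080\t710\t710\t0.0\n"
    "strain_B\t2042592\tstrain\t807\t5\t0\t3.67064e-292\n"
)
print(flag_suspicious_rows(io.StringIO(example)))
```

To run it on a real report, pass an open file handle instead of the `io.StringIO` object.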
