half genome length from tutorial-strawberry #84
The thing with GenomeScope is that everything depends on guessing the 1n coverage correctly. The model you posted got it wrong: instead of 146x, it estimated twice as much. As a consequence, the real diploid peak is the haploid peak in the model, and the real tetraploid peak is considered the diploid one. All that leads to an unrealistically high heterozygosity estimate (for a strawberry) and about half of the expected genome size.

The genome size usually means the haploid genome size (i.e. counting each chromosome only once). This is the value reported by GenomeScope, and also the value you find in genome browsers etc. Flow cytometry and polyploidy folks sometimes also talk about the total genomic content of a cell, which is ploidy * haploid genome size (what you measure using flow cytometry).

So, the answer is that the model you show here is wrong, and I am not sure why. It could be just a freak convergence, but I should check whether I can reproduce the tutorial with the latest version of GenomeScope. Would you mind zipping the k-mer histogram and posting it here? I don't think I currently have the data at hand :-)
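For the "len:" question, the arithmetic behind the comment above can be sketched as follows. This is a back-of-the-envelope sketch, not GenomeScope's actual model. It assumes the usual k-mer relation: total (error-free) k-mers ≈ ploidy × haploid genome size × 1n coverage, so the haploid size is total k-mers / (ploidy × 1n coverage). The function names and the k-mer total are hypothetical and chosen only to show the proportions. They are not the tutorial data.

```python
def haploid_genome_size(total_kmers: float, ploidy: int, kcov_1n: float) -> float:
    """Rough haploid (1n) genome size from k-mer counts.

    Assumes every position of the haploid genome is present `ploidy` times
    and each copy is covered about `kcov_1n` times, so
    total_kmers ~= ploidy * genome_size * kcov_1n.
    """
    return total_kmers / (ploidy * kcov_1n)


def total_genomic_content(haploid_size: float, ploidy: int) -> float:
    """Total DNA content of a cell (what flow cytometry measures)."""
    return ploidy * haploid_size


# Hypothetical numbers, for illustration only.
total = 1.0e11   # total k-mers in the genomic part of the histogram
p = 2            # ploidy given to the model (-p 2)

good = haploid_genome_size(total, p, 146)   # correct 1n coverage
bad = haploid_genome_size(total, p, 292)    # 1n coverage guessed twice too high

print(f"1n = 146x -> {good / 1e6:.0f} Mbp")
print(f"1n = 292x -> {bad / 1e6:.0f} Mbp (half of the above)")
print(f"total content per cell at 1n = 146x: {total_genomic_content(good, p) / 1e6:.0f} Mbp")
```

With the ploidy held fixed, doubling the assumed 1n coverage halves the reported haploid length. That matches the symptom in this issue. "len:" is already a haploid figure, so multiplying it by p gives the total content per cell rather than a corrected genome size.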
Thank you very much for the quick response. I see that the kcov of 287 is consistent with the coverage on the x-axis of the GenomeScope plot. Is the 1n coverage inferred from the highest peak in the GenomeScope plot? I used GenomeScope v2.0.
Indeed, the older versions of GenomeScope certainly converged on the expected coverage (~140x) right away, but the latest version really does not: the default run estimates 1n = 280. If you specify the coverage prior (-l 140), the model converges as expected (pic attached), but that's not completely satisfying. I am out of my depth here, @tbenavi1... I recalled this tweet and tried reducing the number of rounds (https://twitter.com/t_rhyker/status/1288863398374014979?s=20), but that changes nothing (also, the situation there is quite different; here there is soooo much coverage).
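Both the default 280x fit and the 140x fit can look plausible. A minimal sketch shows why by listing where a simple polyploid k-mer model expects its peaks: at integer multiples of the 1n coverage, up to the ploidy. This is a simplification, since GenomeScope fits a full mixture model whose peak heights also depend on heterozygosity and error rates. The function name is hypothetical, and the ~140x tetraploid picture is taken from the discussion in this thread.

```python
def expected_peaks(kcov_1n: float, ploidy: int) -> list[float]:
    """Coverages at which a p-ploid k-mer spectrum is expected to peak.

    A k-mer present in i of the p homologous copies is covered about
    i * kcov_1n times, so peaks sit at kcov_1n, 2*kcov_1n, ..., p*kcov_1n.
    """
    return [i * kcov_1n for i in range(1, ploidy + 1)]


# Picture described in the thread: 1n ~140x, tetraploid.
true_peaks = expected_peaks(140, 4)
# Default -p 2 fit that converged on 1n = 280x.
wrong_fit = expected_peaks(280, 2)

# Every peak of the wrong fit coincides with a real (2n or 4n) peak, so that
# model can still match part of the spectrum, while the real 1n and 3n peaks
# (140x, 420x) are not part of it.
shared = sorted(set(true_peaks) & set(wrong_fit))

print("true peaks:      ", true_peaks)
print("-p 2, 1n=280 fit:", wrong_fit)
print("shared peaks:    ", shared)
```

This is also why the highest peak is not necessarily the 1n coverage. The peaks are spaced by the 1n coverage, and a fit that lands on a multiple of it can still line up with real peaks. Supplying an initial estimate (-l 140), as described above, steers the fit toward the intended spacing.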
This may seem a silly question, but I am really confused about why my estimate is half of the tutorial's when I use the same commands.
I followed https://github.com/KamilSJaron/smudgeplot/wiki/tutorial-strawberry to get started with smudgeplot and GenomeScope. The commands I used:
smudgeplot gave the same results as in the tutorial and estimated the genome as tetraploid. However, GenomeScope reported a length of 100 Mbp with -p 2, and a much higher heterozygosity rate (13.8%). Is the number following "len:" the estimated genome size, or should it be multiplied by p? Could you please give me some insight on this? I appreciate your help.

