Adjusting genomescope parameters for long reads #180

bioannap · 2024-12-05T14:36:58Z

Dear developers,

We are trying to use genomescope and smudgeplot to infer the ploidy of a non-model plant. We expect it to be polyploid, but we don't have any evidence for that yet.

We have long-read data generated from a PacBio Revio.

To create the .histo file for genomescope, we tried the recommended parameters:

$ kmc -k21 -t10 -m64 -ci1 -cs10000 myrawreads.fastq reads tmp/
$ kmc_tools transform reads histogram reads.histo -cx10000

and for visualizing we used the online platform http://genomescope.org/, setting k-mer length = 21, Ploidy = 2 (but only because we don't know the ploidy), Max k-mer coverage = -1, Average k-mer coverage = -1

And this is the result:

When we use ploidy = 4 instead, the result is this:

The non-log scale doesn't seem to have any peak, and we don't understand how to interpret the log scale. Also, the model fit is about 0%. Would you suggest using different parameters for long reads?

Thank you very much in advance!

KamilSJaron · 2025-02-15T20:21:42Z

Hi @bioannap, so sorry for the very slow response, this issue somehow slipped through the cracks (I am usually good at not marking issues as read if I don't respond).

Long reads are totally fine, but what long reads are we talking about? HiFi, duplex, or corrected nanopore reads are quite alright, but older long reads can be a bit messy. Nevertheless, your dataset... looks a bit funny, I don't understand why the non-log version is so... error dominated. I would load the spectrum in R and replot it manually to see how it looks on a non-log scale when the y axis is sanely scaled (you can exclude the first 40x of coverage; your genome has a 1n coverage of 80x anyway, so you won't exclude any of the genomic k-mers). Alternatively, you can make the same change directly in your histogram file and re-upload it to the web server; I am sure it will show something more... reasonable.

Also, did you use http://genomescope.org/genomescope2/? I presume so, given you talk about trying higher ploidy. What does the transformed plot look like? I imagine that one makes more sense, no?
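The trimming suggested above can also be sketched in Python rather than R. This is only a minimal sketch, assuming the histogram is the plain two-column text file (k-mer coverage, then number of distinct k-mers) that `kmc_tools transform ... histogram` writes. The function names are hypothetical, and the 40x cutoff is simply the value mentioned in the comment above.

```python
from pathlib import Path


def read_histogram(path):
    """Read a two-column k-mer histogram as (coverage, count) pairs."""
    rows = []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        rows.append((int(fields[0]), int(fields[1])))
    return rows


def drop_low_coverage(rows, min_coverage=40):
    """Keep only rows whose coverage is at least min_coverage."""
    return [(cov, count) for cov, count in rows if cov >= min_coverage]


def write_histogram(rows, path):
    """Write rows back out, one tab-separated 'coverage count' pair per line."""
    Path(path).write_text("".join(f"{cov}\t{count}\n" for cov, count in rows))
```

For example, `write_histogram(drop_low_coverage(read_histogram("reads.histo")), "reads.trimmed.histo")` would produce a trimmed copy; the output file name is an assumption, and the original histogram is left untouched so the two can be compared.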

bioannap · 2025-02-17T15:39:56Z

Hi @KamilSJaron thank you for your answer!

Don't worry, I actually had time to practice a bit more and to try out FastK for k-mer counting.
We have PacBio HiFi reads, so I guess they are fine for GenomeScope2.0.
The reason for that strange-looking plot, as you suggested, could have been the scaling, which is not automatic when using the web page to visualize the histogram. I generated the plot with the command-line interface of GenomeScope2.0 instead, and the linear plot looks much better!
Another reason could be the tool used for k-mer counting, but that would be unusual.
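To illustrate the scaling point: on a linear axis the low-coverage error k-mers can dwarf the genomic peak unless the y axis is capped. Below is a rough sketch of one way to pick such a cap, assuming the histogram is already loaded as a list of (coverage, count) pairs sorted by coverage. The trough heuristic and the function names are illustrative assumptions, not how GenomeScope2.0 itself chooses its axis limits.

```python
def first_trough(rows):
    """Return the index of the first local minimum in the counts.

    rows is a list of (coverage, count) pairs sorted by coverage.
    Returns 0 when no trough is found.
    """
    for i in range(1, len(rows) - 1):
        if rows[i][1] <= rows[i - 1][1] and rows[i][1] < rows[i + 1][1]:
            return i
    return 0  # no trough: treat the whole spectrum as signal


def linear_y_limit(rows, headroom=1.5):
    """Return (peak_coverage, y_limit) for a linear-scale plot.

    The peak is the highest count at or after the first trough, so the
    error-dominated left edge of the spectrum does not set the axis.
    """
    start = first_trough(rows)
    peak_cov, peak_count = max(rows[start:], key=lambda row: row[1])
    return peak_cov, peak_count * headroom
```

A noisy spectrum can have spurious local minima, so the trough found this way is worth checking by eye. If the final bin is inflated by the `-cs` counter cap used during counting, it should be dropped before calling `linear_y_limit`.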

Here's the newly generated plot, it looks much better.

Based on this I would say it's a diploid, but I will also run smudgeplot to be sure.

Thanks again for your help!
Anna

KamilSJaron added the genomescope included label Feb 15, 2025
