modkit DMR result not as expected #316

baibhav-bioinfo · 2024-12-13T23:08:23Z

Hello,
Thankyou for making such a versatile software.
I am working with Nanopore DRS dataset of plant.

I ran dorado and now working with modkit for downstream analysis.

I was able to run the DMR for my 2 conditions with 3 replicates each.

(1) But as i can see, the table contains no such row (site) which have 0 modification in one condition and something in other. The table have rows (sites) which have at least 1 modification in one of the condition.

I mean there should be some sites which is only modified in one of the two condition and those should be interesting to study.

(2) Following is one row of the result
chromosome_1 13914 13915 . -0.3043135065740614 a:25 104 a:16 72 a:24.04 a:22.22 0.24038461 0.22222222 1 0.01816239316239318 1 0.03205128205128205 66 66 0.33730454254826525,0.22492942694127707 -0.10303030303030303,0.14160839160839161

whats the test used in the comparison analysis, and is the "MAP-based p-value" and "balanced MAP-based p-value" corresponds to pvalue and adjusted pvalue which we generally see and take cutoffs as 0.05 for both?

(3) Also, we can see there is not any strand information column in the DMR results table. But the input for calculating the DMR which are the bed files do have the strand information.

so, is there any way we can extract the strand information too for the DMR sites.

ArtRand · 2024-12-14T01:11:08Z

Hello @baibhav-bioinfo,

(1) But as i can see, the table contains no such row (site) which have 0 modification in one condition and something in other. The table have rows (sites) which have at least 1 modification in one of the condition.

This is a property of your data and the accuracy of the m6A model. All models are imperfect to some degree and will have a false positive rate (FPR). At the coverage you have here (104 and 72, combined) it's unlikely that you'll see zero predicted m6A sites. We've observed FPR values of 0.7% and 2% for the DRACH and m6A all-context models, respectively, in a 2-way m6A/A classification task. Given that, the row you've provided has ~20% m6A in both samples - considerably higher than the expected FPR. You may consider using a very high pass threshold for m6A detection (others have been using values as high as 0.99)

whats the test used in the comparison analysis, and is the "MAP-based p-value" and "balanced MAP-based p-value" corresponds to pvalue and adjusted pvalue which we generally see and take cutoffs as 0.05 for both?

The "MAP-based p-value" is used as well as the "balanced MAP-based p-value" since you have an equivalent number of replicas, the documentation explains how and when you should see these columns. 0.05 is an OK threshold, the default in the segmenter is 0.01.

(3) Also, we can see there is not any strand information column in the DMR results table. But the input for calculating the DMR which are the bed files do have the strand information.

Try the build I've attached to this thread. The strand will be after the "score" column.
modkit_u16_x86_64.tar.gz

baibhav-bioinfo · 2024-12-16T15:20:20Z

hello,
I was able to run the Dorado basecall + methylation call with my DRS reads. I ran the command both in m6A_DRACH context and m6A_all context, using the following commands

dorado basecaller [email protected] pod5/ --modified-bases-models [email protected]_m6A_DRACH@v1/ --device cuda:all > sample.bam

dorado basecaller [email protected] pod5/ --modified-bases-models [email protected]_m6A@v1/ --device cuda:all > sample.bam

for the DRACH motif context i got 60,000 sites per sample (10 million reads with ~1000nt per read)
but for all context run, i got >1.6 million m6A sites predicted.

Did something wrong happened in my ALL context run?
Is that number of sites possible?

Please suggest.

baibhav-bioinfo · 2024-12-16T15:25:09Z

This is a property of your data and the accuracy of the m6A model. All models are imperfect to some degree and will have a false positive rate (FPR). At the coverage you have here (104 and 72, combined) it's unlikely that you'll see zero predicted m6A sites. We've observed FPR values of 0.7% and 2% for the DRACH and m6A all-context models, respectively, in a 2-way m6A/A classification task. Given that, the row you've provided has ~20% m6A in both samples - considerably higher than the expected FPR. You may consider using a very high pass threshold for m6A detection (others have been using values as high as 0.99)

Thankyou so much for the detailed response, but the main problem here i think is Actually due to the way the master table is prepared for the DMR analysis. The table includes the only sites which have been found to be modified in atleast one of the condition. So the sites which are in one and not in other are left behind.
is there any way those can be recovered?

ArtRand added the question Looking for clarification on inputs and/or outputs label Dec 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

modkit DMR result not as expected #316

modkit DMR result not as expected #316

baibhav-bioinfo commented Dec 13, 2024

ArtRand commented Dec 14, 2024

baibhav-bioinfo commented Dec 16, 2024

baibhav-bioinfo commented Dec 16, 2024 •

edited

Loading

modkit DMR result not as expected #316

modkit DMR result not as expected #316

Comments

baibhav-bioinfo commented Dec 13, 2024

ArtRand commented Dec 14, 2024

baibhav-bioinfo commented Dec 16, 2024

baibhav-bioinfo commented Dec 16, 2024 • edited Loading

baibhav-bioinfo commented Dec 16, 2024 •

edited

Loading