-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
modkit DMR result not as expected #316
Comments
Hello @baibhav-bioinfo,
This is a property of your data and the accuracy of the m6A model. All models are imperfect to some degree and will have a false positive rate (FPR). At the coverage you have here (104 and 72, combined) it's unlikely that you'll see zero predicted m6A sites. We've observed FPR values of 0.7% and 2% for the DRACH and m6A all-context models, respectively, in a 2-way m6A/A classification task. Given that, the row you've provided has ~20% m6A in both samples - considerably higher than the expected FPR. You may consider using a very high pass threshold for m6A detection (others have been using values as high as 0.99)
The "MAP-based p-value" is used as well as the "balanced MAP-based p-value" since you have an equivalent number of replicas, the documentation explains how and when you should see these columns. 0.05 is an OK threshold, the default in the segmenter is 0.01.
Try the build I've attached to this thread. The strand will be after the "score" column. |
hello, dorado basecaller [email protected] pod5/ --modified-bases-models [email protected]_m6A_DRACH@v1/ --device cuda:all > sample.bam dorado basecaller [email protected] pod5/ --modified-bases-models [email protected]_m6A@v1/ --device cuda:all > sample.bam for the DRACH motif context i got 60,000 sites per sample (10 million reads with ~1000nt per read) Did something wrong happened in my ALL context run? Please suggest. |
Thankyou so much for the detailed response, but the main problem here i think is Actually due to the way the master table is prepared for the DMR analysis. The table includes the only sites which have been found to be modified in atleast one of the condition. So the sites which are in one and not in other are left behind. |
Hello,
Thankyou for making such a versatile software.
I am working with Nanopore DRS dataset of plant.
I ran dorado and now working with modkit for downstream analysis.
I was able to run the DMR for my 2 conditions with 3 replicates each.
(1) But as i can see, the table contains no such row (site) which have 0 modification in one condition and something in other. The table have rows (sites) which have at least 1 modification in one of the condition.
I mean there should be some sites which is only modified in one of the two condition and those should be interesting to study.
(2) Following is one row of the result
chromosome_1 13914 13915 . -0.3043135065740614 a:25 104 a:16 72 a:24.04 a:22.22 0.24038461 0.22222222 1 0.01816239316239318 1 0.03205128205128205 66 66 0.33730454254826525,0.22492942694127707 -0.10303030303030303,0.14160839160839161
whats the test used in the comparison analysis, and is the "MAP-based p-value" and "balanced MAP-based p-value" corresponds to pvalue and adjusted pvalue which we generally see and take cutoffs as 0.05 for both?
(3) Also, we can see there is not any strand information column in the DMR results table. But the input for calculating the DMR which are the bed files do have the strand information.
so, is there any way we can extract the strand information too for the DMR sites.
The text was updated successfully, but these errors were encountered: