What does it say about our data that the m6a RNA modification rates seem somewhat low? #307
Also exploring the probability distributions with
Hello @aleighbrown,
Could you let me know which m6A model you're using (which version)? If you used the automatic model selection, I need the dorado version.
Thanks for this analysis; looking at the proportion of 6mA calls within DRACH motifs is a nice sanity check (I'm assuming you ran the "all-context" m6A model). As for the motif searching, could you run
Let me take a look at the output probabilities on some data I have and circle back. My suspicion is that the sample has low m6A and the ECDF is essentially dominated by low-confidence false-positive calls.
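To make the ECDF point concrete, here is a minimal sketch of how a few confident calls can be swamped by many low-confidence ones. The probability values are invented for illustration and the helper name `ecdf_at` is hypothetical; nothing here is modkit output or a modkit API.

```python
from bisect import bisect_right

def ecdf_at(sorted_probs, x):
    """Empirical CDF: fraction of values that are <= x (input must be sorted)."""
    return bisect_right(sorted_probs, x) / len(sorted_probs)

# Invented example: eight low-confidence calls and two confident ones.
probs = sorted([0.51, 0.55, 0.58, 0.60, 0.62, 0.66, 0.70, 0.75, 0.97, 0.99])

# Fraction of calls at or below 0.8; the remainder is what a 0.8 cutoff would keep.
below = ecdf_at(probs, 0.8)
print(f"fraction of calls <= 0.8: {below:.2f}")
print(f"fraction of calls  > 0.8: {1 - below:.2f}")
```

If most of the mass sits at low probabilities, as in this toy list, the curve rises steeply early and the high-probability tail is small, which is the shape one would expect from a sample with little true modification.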
Yes, we ran the "all-context" m6A model on these samples. We actually ran this in two sets, one sample (Knockdown 1 in this graph)
You can see from the above graph that even at the higher m6A probabilities there's a slight reduction in high-probability m6A calls for the sample run on the earlier dorado version, so we're planning to re-call this data anyhow with dorado v0.8.3 and the dorado_model [email protected],m6A_DRACH basecalling models to maintain consistency across the samples.
We have some orthogonal reasons to suspect that our knockdown should reduce overall levels of RNA m6A. Would this affect the whole probability estimation in some way that I'm not grokking? I'll report back with the results from the newer model run.
So
What I was hoping to accomplish with
Also, do these rates seem... reasonable for a sample? It's hard for me to find comparable published information on direct RNA-seq.
Just a question about motif searching and data quality.
Our data is basecalled using dorado, and we generated the bedMethyl files with modkit pileup.
We tried using the defaults but consistently got output like the following in the log file
So we've set the A and a thresholds as follows:
Which seemed to give decent results, e.g. when I manually checked how many modified sites were called inside DRACH motifs using the higher thresholds this pattern/fraction seemed to make sense:
Compare that with the same analysis run on a pileup using the default filtering thresholds: many more sites are reported, but a much lower proportion of them fall inside canonical DRACH motifs.
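For anyone wanting to reproduce this kind of check, a minimal sketch of the DRACH test is below. It assumes the calls are on the forward strand of a transcript reference and that positions are 0-based; `in_drach` and `drach_fraction` are hypothetical helper names, not part of modkit, and the example sequence is made up.

```python
import re

# DRACH in IUPAC codes: D = A/G/T, R = A/G, then A, C, and H = A/C/T.
DRACH = re.compile(r"[AGT][AG]AC[ACT]")

def in_drach(seq, pos):
    """True if seq[pos] is the central A of a DRACH 5-mer (0-based position)."""
    if pos < 2 or pos + 3 > len(seq):
        return False  # not enough flanking sequence for a full 5-mer
    kmer = seq[pos - 2:pos + 3].upper().replace("U", "T")
    return DRACH.fullmatch(kmer) is not None

def drach_fraction(seq, positions):
    """Fraction of the given positions that sit at the centre of a DRACH motif."""
    positions = list(positions)
    if not positions:
        return 0.0
    return sum(in_drach(seq, p) for p in positions) / len(positions)

# Made-up reference: position 4 is the A in GGACT, position 11 is not in a motif.
reference = "TTGGACTTTTAAAAA"
print(drach_fraction(reference, [4, 11]))
```

Running the same function over the sites called at each threshold setting gives the two fractions being compared here.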
However, the issues appear when we start trying to use the motif search function.
Running motif search using the following parameters
The motif search runs for hours and produces results like the following:
My questions are thus:
1: Are the stringency settings on modkit pileup perhaps too high, resulting in fewer sites being called?
2: Does the fact that we had to set the settings so high say something about the raw data quality that I might be missing?
I also realize that if I want the motif search to finish I can play around more with the --low-thresh and --high-thresh parameters, but I'm not sure whether that makes sense here, or whether I should instead take the lowish m6A rate as a sign of something amiss in an earlier step of data processing.
We're the first lab in the department to do this kind of direct RNA sequencing, and the first to try it with the sequencing facility, so I'm just a bit confused as to what these results might mean for the quality of our data.
Thank you for the help!