Compare methylation at a position between different genotype groups #318
Comments
The core question here is: what biological question are you trying to answer? Can you compute percent methylation as The standard definition for percent methylation as
Thank you for your answer. What I'm trying to answer is this: I have a set of individuals grouped into a drug response phenotype based on their genetic background, e.g. Normal Metabolisers. However, within the group there is still drug response variability, perhaps not captured by genetic mutations. I'm trying to understand whether they have notable methylation changes within the promoter and/or CpG island of the gene primarily responsible for metabolising that drug. I have noticed that certain people have mutations that are not functionally relevant in determining the drug response phenotype but that disrupt CpG sites (for instance C>T) or create new ones. Some have these mutations in one allele, some have them in both alleles, and some do not have such mutations. I'm trying to understand whether these cause notable methylation changes that might contribute to the variability we see within this group. Here I'm specifically looking at 5mC methylation. If I consider What I'm trying to do is compare the level of methylation at each CpG site, and then across the entire promoter/CpG island, among these individuals to understand whether these mutations cause any notable changes in methylation. I selected two individuals of interest and ran DMR pair for the region. There was a notable difference, with a score of 135, across the promoter. But when comparing individual CpG sites, the differences at the CpG sites of interest have MAP-based p-values > 0.01, due to what I explained earlier. Your inputs are greatly appreciated. Thank you.
Hello @sumudu-rangika, That's an interesting problem, thanks for explaining it. Out of curiosity, what is the effect size you're seeing in the promoter regions? Where does the score of the promoters fall in the distribution of scores? Basically, I'm wondering if you can make a claim about the change in methylation in the region. If I'm understanding you correctly, the single-site MAP-based p-values are low due to one sample (the one carrying the mutation) having half of the observations of the reference-allele sample. Assume for a second that the methylation rates of CpGs in close proximity will likely have a similar distribution. Suppose this formalism: Where Another approach is to use the segmenting HMM over the region. The transition probability between sites being "Same" vs "Different" is exactly intended to model the proximity relationship (granted, only as a first-order Markov process). Maybe try
Thanks a lot for the explanation. I've attached my results from 2 samples of interest as an example. sample_a has 3 heterozygous SNPs in the promoter region that disrupt 3 CpG sites (all 3 SNPs are on the same allele). sample_b has no such mutations. The CpG sites I'm looking at are I ran with --segment and the results I also attached CpG sites without --segment here And with --region here Thank you!
Hi Arthur,
I want to compare 5mC methylation at a position between three genotype groups within a cohort.
For example, for a C>T SNP that disrupts a reference CpG site, I want to see methylation differences at this site between CC (homozygous reference) and CT (heterozygous) individuals within the cohort. In the case of TT individuals, methylation on both haplotypes is zero.
I see that some CC individuals have ~80% methylation at this site. Some CTs also show ~80% at this site, even though only one haplotype is methylated and the other has no methylation (I have attached part of the cohort for one such CpG site). This is because, for CTs, only the filter-passed modified and canonical bases are considered. Ndiff read counts are high in this case because of the C>T mutation, but they are not counted in arriving at this methylation %. Methylation appears to be overestimated here compared to CC, since in CTs only half of the reads can be methylated and the other half carry the SNP.
I am confused about how to approach this. Should I recalculate the methylation % at each position as
Nmod / (Nmod + Ncanonical + Ndiff + Nnocall) for all the individuals, instead of Nmod / (Nmod + Ncanonical), so that all reads are captured in calculating the methylation % for comparison in my case?
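To make the two denominators concrete, here is a minimal sketch of the arithmetic in Python. The function name, argument names, and read counts are hypothetical and chosen only for illustration; they are not taken from the attached data or from any tool's API. It assumes a CT heterozygote in which the reads from the T haplotype are all counted as Ndiff.

```python
def percent_methylation(n_mod, n_canonical, n_diff=0, n_nocall=0,
                        include_all_reads=False):
    """Percent methylation at one site from per-site read counts.

    With include_all_reads=False the denominator is Nmod + Ncanonical.
    With include_all_reads=True, Ndiff and Nnocall are added to the
    denominator, as proposed in the question above.
    """
    denominator = n_mod + n_canonical
    if include_all_reads:
        denominator += n_diff + n_nocall
    if denominator == 0:
        return float("nan")  # no usable reads at this site
    return 100.0 * n_mod / denominator


# Hypothetical CT heterozygote: 20 reads from the C haplotype
# (16 modified, 4 canonical) and 20 reads from the T haplotype (Ndiff).
het_counts = dict(n_mod=16, n_canonical=4, n_diff=20, n_nocall=0)

standard = percent_methylation(**het_counts)
all_reads = percent_methylation(**het_counts, include_all_reads=True)

print(f"Nmod / (Nmod + Ncanonical): {standard:.1f}%")
print(f"Nmod / (Nmod + Ncanonical + Ndiff + Nnocall): {all_reads:.1f}%")
```

With these made-up counts the first definition gives 80% and the second gives 40%. Under the stated assumption, the first number describes methylation on the C haplotype only, while the second describes methylation per read across both haplotypes.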
I came across a comment in modbam2bed by cjw85 which explains a similar case. Can I use this approach in modkit as well?
[https://github.com/nanoporetech/megalodon/issues/247]
I would appreciate your thoughts on this.
Thank you in advance.
Best,
Sumudu