Make estimates of SNP linkage #1

petercombs · 2019-01-16T18:47:31Z

We don't yet have a great sense of what the linkage is in the populations we're looking at. Flowers et al. 2010 implies it should be small (see Figure 8), on the order of 10-25 kb.

In this case, we cannot directly assay the SNP values. However, we can assay the SNP scores. One approach to take is similar to Figure 8B in Flowers:

Find all pairs of adjacent SNPs that are between bin_low and bin_high bases apart.
Measure the correlation of the p-values (or of the log10 p-values) between those adjacent SNPs.
Plot for all bin sizes.
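A minimal sketch of the first two steps, in plain Python for illustration. The function names, the `(position, pvalue)` input layout, and the half-open `[bin_low, bin_high)` interval are assumptions made for this example, not what the ldplot branch does. It also assumes all p-values are strictly positive, so that `log10` is defined.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if undefined)."""
    n = len(xs)
    if n < 2:
        return float("nan")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def adjacent_pair_correlation(snps, bin_low, bin_high, use_log10=True):
    """Correlate scores of neighbouring SNPs on one chromosome.

    snps: list of (position, pvalue) tuples (hypothetical layout).
    Only neighbours whose spacing falls in [bin_low, bin_high) are used.
    Returns (correlation, number_of_pairs).
    """
    snps = sorted(snps)
    left, right = [], []
    for (pos_a, p_a), (pos_b, p_b) in zip(snps, snps[1:]):
        if bin_low <= pos_b - pos_a < bin_high:
            left.append(math.log10(p_a) if use_log10 else p_a)
            right.append(math.log10(p_b) if use_log10 else p_b)
    return pearson(left, right), len(left)
```

Calling this once per distance bin and plotting the returned correlation against the bin midpoint would give the decay curve. Returning the pair count alongside it makes it easy to spot bins with too few pairs to trust.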


petercombs · 2019-01-17T00:40:37Z

Made a first attempt at this in the ldplot branch. The curve is very jagged, and there's fairly high correlation in some bins. See this plot with 10bp bins:

petercombs · 2019-01-26T06:17:50Z

Presumably what's going on here (though I should check) is that there are a lot of SNPs with very low coverage, and thus very low p-values. Maybe taking the correlation of log10 p-values would help?

petercombs · 2019-01-28T18:59:21Z

Okay, using log10 p-values does seem to help, as does taking wider bins:

Now one question is whether I should do all pairs of SNPs that are between [N,N+k) basepairs apart, or only adjacent pairs. All pairs is a little bit harder to set up, but should give more data. Is that double counting in a bad way though? I should ask around.
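For reference, the all-pairs variant is not much more code once positions are sorted. The sketch below uses the standard-library `bisect` module. The function name and the list-of-positions input are assumptions for illustration.

```python
from bisect import bisect_left

def pairs_in_distance_bin(positions, low, high):
    """Yield index pairs (i, j), i < j, with low <= positions[j] - positions[i] < high.

    positions must be sorted ascending and come from a single chromosome.
    """
    for i, pos in enumerate(positions):
        # First index at or beyond pos + low, but never the SNP itself or earlier.
        start = max(bisect_left(positions, pos + low), i + 1)
        # First index at or beyond pos + high (exclusive upper bound).
        stop = bisect_left(positions, pos + high)
        for j in range(start, stop):
            yield i, j
```

On the double-counting question: with all pairs, a single SNP can contribute to many pairs in the same bin, so the pairs are not independent observations. The point estimate of the correlation is still usable, but any standard error computed as if the pairs were independent would likely be too small.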

petercombs · 2019-01-28T20:28:55Z

Hunter agrees that all pairs is probably not necessary.
One way to get around the noisiness is to break SNP pairs up into equal-sized groups sorted by distance (e.g., the first 100 SNPs, the second 100, etc.) rather than using a distance bin.
Can also do a Spearman correlation within each group, rather than worrying about log10 p-value vs p-value.
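The two suggestions above can be sketched together as follows. This is an illustrative plain-Python version. The `(distance, p_a, p_b)` tuple layout, the function names, and the choice to report the median distance per group are assumptions, not the ldplot implementation. Spearman is computed as the Pearson correlation of average ranks.

```python
import math
import statistics

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def pearson(xs, ys):
    """Pearson correlation (nan if fewer than 2 points or zero variance)."""
    n = len(xs)
    if n < 2:
        return float("nan")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

def spearman_by_distance_group(pairs, group_size=100):
    """pairs: list of (distance, p_a, p_b) tuples (hypothetical layout).

    Sorts pairs by distance, cuts them into consecutive groups of
    group_size (the last group may be smaller), and returns a list of
    (median_distance, rho, n_pairs) per group.
    """
    pairs = sorted(pairs)
    results = []
    for start in range(0, len(pairs), group_size):
        chunk = pairs[start:start + group_size]
        dist = statistics.median(d for d, _, _ in chunk)
        rho = spearman([a for _, a, _ in chunk], [b for _, _, b in chunk])
        results.append((dist, rho, len(chunk)))
    return results
```

Because ranks are unchanged by any strictly increasing transform, `spearman` gives the same value for p-values and log10 p-values, which is why the choice between the two stops mattering.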

petercombs · 2019-02-13T23:13:00Z

Okay, making progress here in the ldplot branch. In addition to plotting each subtype separately, I should make one that has all the subtypes together.

petercombs · 2019-02-15T23:54:39Z

So the issue I'm seeing now is that there seems to be a persistent baseline level of correlation: it never really gets below about 0.25, even between 100 kb and 1 Mb.

I talked to Sur, Mark, and Thomas, and some ideas are:

Look at the correlation of the random p-values. I need to double check exactly how I'm doing that randomization to decide whether this will do what I think it will do, but not a bad first step.
Estimate the standard deviation of the correlation with a jackknife procedure (repeated leave-one-out). I'm not optimistic that this will work, since there are hundreds of SNPs at these larger distances.
Look at the correlation between SNPs on different chromosomes. These are not physically linked, so it should go to zero. Except that because these are haploid organisms, there could be some population structure that's keeping the correlations high, even across chromosomes.
Look at the correlation of p-values in real GWAS or already published pooling studies.
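As a rough illustration of the jackknife idea, here is a leave-one-out standard error for a correlation, using the standard jackknife formula SE = sqrt((n - 1) / n * sum((theta_i - mean(theta))^2)). The function names are hypothetical, and it treats the SNP pairs in a bin as independent, which may not hold here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation (nan if fewer than 2 points or zero variance)."""
    n = len(xs)
    if n < 2:
        return float("nan")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def jackknife_se(xs, ys, stat=pearson):
    """Leave-one-out jackknife standard error of stat(xs, ys).

    xs and ys are equal-length lists of paired scores.
    Recomputes the statistic n times, each time dropping one pair.
    """
    n = len(xs)
    loo = [stat(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((t - mean_loo) ** 2 for t in loo))
```

This costs one correlation per left-out pair, so a bin with a few hundred pairs is cheap. If the resulting standard error is small relative to 0.25, the baseline is unlikely to be sampling noise, and the cross-chromosome and randomized-p-value checks become the more informative ones.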

petercombs added the enhancement New feature or request label Jan 16, 2019
