Skip to content

Commit

Permalink
WIP: Allelic R-squared
Browse files Browse the repository at this point in the history
  • Loading branch information
szhan committed Feb 25, 2024
1 parent 9b66c9b commit 1b974b8
Showing 1 changed file with 22 additions and 5 deletions.
27 changes: 22 additions & 5 deletions python/tests/beagle_numba.py
Original file line number Diff line number Diff line change
Expand Up @@ -591,6 +591,7 @@ def compute_estimated_allelic_r_squared(
Compute the estimated allelic R^2 for a given imputed site.
It is not the true allelic R^2, which needs access to true genotypes to compute.
The true allelic R^s is the squared correlation between true and imputed ALT dosages.
It has shown that the true and estimated allelic R-squared are highly correlated.
In BEAGLE 4.1, it is AR2: "Allelic R-Squared: estimated squared correlation
Expand All @@ -599,18 +600,34 @@ def compute_estimated_allelic_r_squared(
See formulation in the Appendix 1 of Browning and Browning. (2009).
Am J Hum Genet. 84(2): 210–223. doi: 10.1016/j.ajhg.2009.01.005.
Assume that all sites are biallelic. Otherwise, the calculation below is incorrect.
Note that 0 refers to the major allele and 1 the minor allele.
:param numpy.ndarray alleles_1: Imputed alleles for haplotype 1.
:param numpy.ndarray allele_probs_1: Imputed allele probabilities for haplotype 1.
:param numpy.ndarray alleles_2: Imputed alleles for haplotype 2.
:param numpy.ndarray allele_probs_2: Imputed allele probabilities for haplotype 2.
:return: Estimated allelic R-squared.
:rtype: float
"""
assert len(alleles_1) == len(alleles_2), "Lengths of imputed alleles differ."
assert len(allele_probs_1) == len(
allele_probs_2
), "Lengths of allele probabilities differ."
est_allelic_rsq = 0
n = len(alleles_1)
assert len(alleles_2) == n, "Lengths of alleles differ."
assert (
len(allele_probs_1) == n
), "Lengths of alleles and allele probabilities differ."
assert (
len(allele_probs_2) == n
), "Lengths of alleles and allele probabilities differ."
# Genotype probabilities for ith sample out of n samples.
# Dossges: P(RR) = 0, P(RA or AR) = 1, P(AA) = 2.
y = np.zeros((n, 3), dtype=np.float64)
u = y[:, 1] + 2 * y[:, 2] # E[X | y_i].
z = np.argmax(y, axis=1) # Most likely imputed genotype of ith sample.
w = y[:, 1] + 4 * y[:, 2] # E[X^2 | y_i].
term_1 = (np.sum(z * u) - (1 / n) * (np.sum(u) * np.sum(z))) ** 2
term_2 = np.sum(w) - (1 / n) * np.sum(u) ** 2
term_3 = np.sum(z) - (1 / n) * np.sum(z) ** 2
est_allelic_rsq = term_1 / (term_2 * term_3)
return est_allelic_rsq


Expand Down

0 comments on commit 1b974b8

Please sign in to comment.