compute covariance matrix and PCA #275

Open
petrelharp opened this issue Jul 30, 2019 · 3 comments
Labels
enhancement New feature or request statistics

Comments

@petrelharp
Contributor

To compute a covariance matrix, or do PCA, we currently have to export the genotype matrix. It would be nice to do this using the statistics framework, and totally do-able. It is also possible to find PCs without computing the covariance matrix first. This would be a nice compact but very useful contribution for someone to take on.
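For reference, the explicit route described above can be sketched roughly as follows. This is only an illustration: the random `G` is a stand-in for an exported genotype matrix (assumed here to be shaped sites by samples), and centering each site and dividing by the number of sites are assumed normalization choices, not something fixed by this issue.

```python
import numpy as np

# Stand-in for an exported genotype matrix, shaped (num_sites, num_samples).
# Random 0/1 data is an assumption made purely for illustration.
rng = np.random.default_rng(42)
G = rng.integers(0, 2, size=(200, 12)).astype(float)

# Center each site (row) so samples are compared as deviations from the site mean.
Gc = G - G.mean(axis=1, keepdims=True)

# Explicit sample-by-sample covariance matrix (normalization is an assumed choice).
cov = Gc.T @ Gc / Gc.shape[0]

# eigh returns eigenvalues in ascending order, so reorder to put the largest first.
evals, evecs = np.linalg.eigh(cov)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

# One row per sample, one column per principal component.
top_pcs = evecs[:, :2]
```

The cost here is forming and storing both `G` and the full covariance matrix, which is what a statistics-based approach would avoid.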

@petrelharp petrelharp added the enhancement New feature or request label Jul 30, 2019
@daniel-trejobanos

I am curious about this, especially how to compute the PCA without the covariance matrix. Do you have a paper explaining this?

@petrelharp
Contributor Author

I have not worked out the details, but the basic idea is that the PCs are the eigenvectors of the genetic covariance matrix. Modern iterative methods (like Krylov methods) can find the eigenvectors of a matrix A without ever computing A explicitly; they only need the result of multiplying A by some (e.g., random) vectors. This paper: https://www.ncbi.nlm.nih.gov/pubmed/26924531 does something like this. In our situation, A = G^T G, where G is the genotype matrix (possibly normalized), so we can use our general statistics to quickly compute u^T A v for vectors u and v.

That's the general idea. I have not worked out the details, so it's possible there's something very tricky in there, but I'm happy to help work it through.
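A minimal sketch of the matrix-free idea, under stated assumptions: it uses plain block power (subspace) iteration rather than a true Krylov method, and a dense simulated `G` (sites by samples, three hypothetical populations) stands in for the genotype matrix. The function name `top_pcs_matrix_free` and the `matvec` callback are made up for illustration; in practice `matvec` would be backed by the statistics framework instead of an explicit `G`.

```python
import numpy as np

def top_pcs_matrix_free(matvec, n, k, n_iter=100, seed=0):
    """Approximate the top-k eigenpairs of a symmetric PSD n x n matrix A,
    given only a function matvec(V) that returns A @ V."""
    rng = np.random.default_rng(seed)
    # Start from a random orthonormal basis of a k-dimensional subspace.
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    for _ in range(n_iter):
        # Multiply by A, then re-orthonormalize to keep the basis well conditioned.
        Q, _ = np.linalg.qr(matvec(Q))
    # Rayleigh-Ritz step: solve the small k x k eigenproblem within the subspace.
    B = Q.T @ matvec(Q)
    evals, W = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1]
    return evals[order], Q @ W[:, order]

# Simulated stand-in data: 3 populations of 10 samples, 500 sites (all assumed).
rng = np.random.default_rng(1)
n_sites, n_pops, per_pop = 500, 3, 10
n_samples = n_pops * per_pop
freqs = rng.random((n_sites, n_pops))            # allele frequency per site and population
pop = np.repeat(np.arange(n_pops), per_pop)      # population label of each sample
G = (rng.random((n_sites, n_samples)) < freqs[:, pop]).astype(float)
G -= G.mean(axis=1, keepdims=True)               # center each site across samples

def matvec(V):
    # Computes (G^T G) V as G^T (G V); the n_samples x n_samples matrix is never formed.
    return G.T @ (G @ V)

evals, pcs = top_pcs_matrix_free(matvec, n_samples, k=2)
```

Only products with `G` and `G.T` are needed, which is the property that would let a tree-based implementation replace `matvec` without exporting the genotype matrix.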

@hyanwong
Copy link
Member

hyanwong commented Jul 5, 2023

Also see performance issues in #1743

3 participants