PCA using the augmented correlation matrix #13

Open
rmflight opened this issue Jan 27, 2017 · 5 comments
@rmflight

It would be really cool to be able to do a PCA decomposition on the augmented or weighted correlation matrix generated by pairwise_correlations, so that the PCA actually reflects the augmented correlation directly.

There may be a way to do this via `eigen` and then generating the scores, keeping in mind that PCA on the correlation matrix corresponds to PCA on data that are already centered and scaled.

Note that I think we would have to set the diagonal to 1 for this to work properly.
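
A rough sketch of the eigen route for complete data, under some notational assumptions that are mine and not from the package: $X$ is an $n \times p$ matrix with features in columns, $Z$ is $X$ with each column centered and scaled to unit sample variance, and $R$ is the $p \times p$ feature-by-feature correlation matrix.

$$
R = \frac{1}{n-1} Z^\top Z = V \Lambda V^\top, \qquad T = Z V, \qquad \operatorname{Var}(T_{\cdot k}) = \lambda_k
$$

Here the columns of $V$ are the loadings, $T$ holds the scores, and $\operatorname{tr}(R) = \sum_k \lambda_k = p$ only when the diagonal of $R$ is 1, which is why the diagonal would need to be set. For an augmented $R$ there is no single $Z$ that produces it, so which standardized data to project is still open, and an $R$ built pair by pair is not guaranteed to be positive semidefinite, so some $\lambda_k$ could come out negative.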

Thoughts @hunter-moseley ??

rmflight self-assigned this Jan 27, 2017
@rmflight

This could be tested by generating a correlation matrix for data with no missing values, and verifying that the PCA results from the centered and scaled data match those obtained from the correlation matrix.
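
A minimal sketch of that check, written in Python with NumPy rather than R purely for illustration. The function names `pca_from_correlation` and `pca_from_scaled_data` are hypothetical, the data are simulated, and features are assumed to be in columns. Scores are compared up to sign, since each eigenvector is only defined up to a sign flip.

```python
import numpy as np

def standardize(X):
    # Center each column and scale it to unit sample variance (ddof=1).
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca_from_correlation(X):
    # Eigendecomposition of the feature-by-feature correlation matrix.
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = standardize(X) @ eigvecs          # project standardized data
    return eigvals, scores

def pca_from_scaled_data(X):
    # Conventional PCA: SVD of the centered and scaled data.
    Z = standardize(X)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return s**2 / (X.shape[0] - 1), U * s

# Simulated complete data (no missing values, no zeros): 50 samples, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))

vals_c, scores_c = pca_from_correlation(X)
vals_s, scores_s = pca_from_scaled_data(X)

values_match = np.allclose(vals_c, vals_s)
scores_match = np.allclose(np.abs(scores_c), np.abs(scores_s))
print(values_match, scores_match)
```

If both comparisons hold on complete data, the same `pca_from_correlation` route could then be fed an augmented correlation matrix in place of `R`.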

@hunter-moseley

I think of this from the standpoint of embedding from a distance matrix. The correlation can be viewed as a normalized distance matrix, and this is used to embed the rows/columns into a Euclidean space. I am starting to understand the link you sent: the covariance or correlation matrix captures the dependency between variables, which can be used to collapse the variables into a smaller number of principal components by keeping the eigenvectors with large eigenvalues.

@hunter-moseley

Just realized that the correlation matrix needs to be between the features and not the samples. If the PCA we are currently using is not dropping zeros, then this approach is going to dramatically change the PCA results, since each correlation will be limited to the samples where both features co-occur and will not be over-weighted by the zeros.
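
A toy illustration of how much shared zeros can move a correlation, sketched in Python with NumPy. The helper `nonzero_pairwise_correlation` is hypothetical and is not the package's `pairwise_correlations`; it simply computes a feature-by-feature Pearson correlation using, for each pair, only the samples where both features are nonzero. The data are made up for the example.

```python
import numpy as np

def nonzero_pairwise_correlation(X, min_shared=3):
    # Feature-by-feature Pearson correlation, where each pair uses only
    # the samples (rows) in which both features are nonzero.
    p = X.shape[1]
    R = np.eye(p)                              # diagonal fixed at 1
    for i in range(p):
        for j in range(i + 1, p):
            keep = (X[:, i] != 0) & (X[:, j] != 0)
            if keep.sum() >= min_shared:
                r = np.corrcoef(X[keep, i], X[keep, j])[0, 1]
            else:
                r = 0.0                        # arbitrary placeholder, an assumption
            R[i, j] = R[j, i] = r
    return R

# Two features that are perfectly anti-correlated where both are present,
# plus four samples in which both are zero.
X = np.array([[1, 4], [2, 3], [3, 2], [4, 1],
              [0, 0], [0, 0], [0, 0], [0, 0]], dtype=float)

r_all = np.corrcoef(X, rowvar=False)[0, 1]          # zeros included
r_nonzero = nonzero_pairwise_correlation(X)[0, 1]   # zeros dropped per pair

print(round(r_all, 3), round(r_nonzero, 3))  # 0.429 -1.0
```

With the zeros included the two features look positively correlated, and with the zeros dropped the correlation is -1, so a PCA built on the second matrix could look very different from the current one.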

@rmflight
rmflight commented Jan 28, 2017 via email
