Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve LD shrinkage efficiency #35

Merged
merged 2 commits into from
Aug 21, 2024
Merged

Improve LD shrinkage efficiency #35

merged 2 commits into from
Aug 21, 2024

Conversation

biona001
Copy link
Owner

@biona001 biona001 commented Aug 2, 2024

Motivation

When a LD matrix is too large (e.g. >5000 variables), the function find_optimal_shrinkage becomes too slow.

Proposal

Rather than computing shrinkage on the entire correlation matrix, we can do so on just the group representative variables, since these variables are primary drivers for the dependencies across groups. This often reduce the LD matrix size by a factor of 10.

Potential issue

Often there are few representatives in a block (e.g. < 10). This can lead to large shrinkage values (e.g. >0.5) even though the LD matrix matches the Z-score exactly.

this PR

Now we compute shrinkage on the original matrix if the size is manageable (<1000 variables), otherwise we change to computing shrinkage value on the representatives.

Before merging

To check this approach is reasonable, we should re-run the Alzheimer's disease study to see if the shrinkage value is comparable to previously (which I think we around 0.01 to 0.02).

@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 62.96296% with 10 lines in your changes missing coverage. Please review.

Project coverage is 64.73%. Comparing base (6d00378) to head (1390806).

Files Patch % Lines
src/ghostbasil_parallel.jl 61.53% 10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #35      +/-   ##
==========================================
- Coverage   65.09%   64.73%   -0.37%     
==========================================
  Files           5        5              
  Lines         679      689      +10     
==========================================
+ Hits          442      446       +4     
- Misses        237      243       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@biona001
Copy link
Owner Author

biona001 commented Aug 21, 2024

I tested this on the Alzheimer's disease study and the mean shrinkage value is 0.01582485377913878 after this PR (before it was 0.015295867015486209). This difference is small because only 5/1703 regions actually trigger the new code. Even so, 4/5 regions returned similar answers, so I'll go ahead and merge anyways

Before this PR:

region 693 / 1703 (f = LD_start29737971_end30798167.h5): chr 6, nz beta = 19, nsnps = 1454, shrinkage = 0.0529
region 694 / 1703 (f = LD_start30798168_end31571217.h5): chr 6, nz beta = 15, nsnps = 1991, shrinkage = 0.0345
region 695 / 1703 (f = LD_start31571218_end32682663.h5): chr 6, nz beta = 20, nsnps = 1732, shrinkage = 0.1506
region 696 / 1703 (f = LD_start32682664_end33236496.h5): chr 6, nz beta = 16, nsnps = 1028, shrinkage = 0.0294
region 1521 / 1703 (f = LD_start69387817_end72672202.h5): chr 17, nz beta = 14, nsnps = 1278, shrinkage = 0.004

After this PR:

region 693 / 1703 (f = LD_start29737971_end30798167.h5): chr 6, nz beta = 26, nsnps = 1454, shrinkage = 0.0849
region 694 / 1703 (f = LD_start30798168_end31571217.h5): chr 6, nz beta = 17, nsnps = 1991, shrinkage = 0.0927
region 695 / 1703 (f = LD_start31571218_end32682663.h5): chr 6, nz beta = 18, nsnps = 1732, shrinkage = 0.1565
region 696 / 1703 (f = LD_start32682664_end33236496.h5): chr 6, nz beta = 13, nsnps = 1028, shrinkage = 0.8383
region 1521 / 1703 (f = LD_start69387817_end72672202.h5): chr 17, nz beta = 8, nsnps = 1278, shrinkage = 0.0

@biona001 biona001 merged commit 634b8a0 into main Aug 21, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants