ts-PCA performance is slow compared to scikit-allel #1743
Comments
This may be related to #647, since as written the code is maintaining a (num nodes) x (num samples) matrix as it iterates over the trees.
I just tried this out but realised we need to get #1246 merged
Ah apologies, I should have said - I was running this off the branch in #1246!
No worries @brieuclehmann! I just did some profiling, and the majority of the time is spent in [...]. What are our options here for doing less work per node @petrelharp? Would it be worth trying to cast this as a function of IBD segments, so we can see if that approach is at least potentially faster?
Building on #898 and using the 'matrix multiplication' in WIP #1246 (i.e. genetic_relatedness_weighted), we're trying to implement PCA for tskit. This appears to be working 🎉 but is rather slow compared to scikit-allel. See the following code for a small reprex, where scikit-allel is approximately 20 times faster than our current tskit implementation.
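For readers without the branch checked out, here is a minimal NumPy-only sketch of the general idea of getting principal components from nothing but a "matrix multiplication" with the relatedness matrix. It is not the reprex from this issue and not the code in #1246: `centred_relatedness_matvec` is a hypothetical dense stand-in for what `genetic_relatedness_weighted` would supply, `pca_from_matvec` is an illustrative name, and the subspace iteration used here is an assumed choice of eigensolver. No scaling (e.g. the kind scikit-allel can apply) is attempted.

```python
import numpy as np


def centred_relatedness_matvec(G, W):
    """Hypothetical stand-in: return K @ W without ever forming K.

    G is a (num_sites, num_samples) genotype matrix, W is (num_samples, k),
    and K = Gc.T @ Gc where Gc is G with each site's mean removed.
    """
    Gc = G - G.mean(axis=1, keepdims=True)
    return Gc.T @ (Gc @ W)


def pca_from_matvec(matvec, num_samples, num_components,
                    num_iterations=30, seed=1):
    """Top eigenpairs of a symmetric PSD operator via subspace iteration.

    Only products matvec(W) are needed, so the (num_samples x num_samples)
    matrix is never stored explicitly.
    """
    rng = np.random.default_rng(seed)
    # Oversample a little so the leading components converge faster.
    k = min(num_samples, num_components + 10)
    Q, _ = np.linalg.qr(rng.normal(size=(num_samples, k)))
    for _ in range(num_iterations):
        # Apply the operator, then re-orthonormalise the basis.
        Q, _ = np.linalg.qr(matvec(Q))
    # Rayleigh-Ritz step: solve the small k x k eigenproblem.
    B = Q.T @ matvec(Q)
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1][:num_components]
    return evals[order], Q @ evecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    G = rng.integers(0, 3, size=(200, 20)).astype(float)
    evals, pcs = pca_from_matvec(
        lambda W: centred_relatedness_matvec(G, W), G.shape[1], 2
    )
    print(evals, pcs.shape)
```

In a sketch like this, the cost is dominated by the number of calls to `matvec` and the width of `W`, so any speed-up in the per-tree work feeds directly into the overall PCA time.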