Much faster versions of PCA + UMAP exist. Can we implement them? #52

Open
Hellisotherpeople opened this issue Nov 24, 2024 · 3 comments

Comments

@Hellisotherpeople

scikit-learn-intelex (Intel's extension for scikit-learn), cuML, and several other libraries provide optimized variants of PCA, and likely of a few other algorithms used here. I can submit a PR implementing a few of these if you'd like. For certain types of vectors, this can give a noticeable speedup.

GPU implementations might harm reproducibility. There might be other issues too that I haven't thought about. Thoughts?

@vgel
Owner

vgel commented Dec 14, 2024

I'd be interested in a PR. If you do one, please implement them as a new method, like the existing umap one, for now. I'd be especially interested in a speed comparison!
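For the speed comparison, a minimal timing harness might look like the sketch below. It is not taken from this repository: `benchmark`, `sklearn_pca`, and the `methods` mapping are hypothetical names, and the PCA wrapper assumes scikit-learn is installed and is only imported when that function is actually called.

```python
import time
from typing import Any, Callable, Dict


def benchmark(methods: Dict[str, Callable[[Any], Any]], data: Any, repeats: int = 3) -> Dict[str, float]:
    """Return the best wall-clock time in seconds for each named method."""
    results: Dict[str, float] = {}
    for name, fit in methods.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fit(data)
            # Keep the fastest run to reduce noise from other processes.
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


def sklearn_pca(data: Any) -> Any:
    """Hypothetical baseline: stock scikit-learn PCA with one component."""
    from sklearn.decomposition import PCA  # lazy import, assumes scikit-learn is installed

    return PCA(n_components=1).fit(data)
```

Each candidate backend (stock scikit-learn, a patched scikit-learn, a GPU library) would be wrapped in a function like `sklearn_pca` and passed in the `methods` mapping, so all of them are timed on the same data.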

@thiswillbeyourgithub

> GPU implementations might harm reproducibility

So does setting n_jobs above 1 (multi-threaded), so I'd say there's definitely a tradeoff between performance and reproducibility.
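One way to make that tradeoff measurable is to run the same fit several times with a fixed seed and compare the outputs. The sketch below uses only the standard library. `is_reproducible` and the two stand-in projections are hypothetical, and a real check would wrap the actual PCA or UMAP call and return its flattened output.

```python
import itertools
import random
from typing import Callable, Sequence


def is_reproducible(fit: Callable[[int], Sequence[float]], seed: int = 0,
                    runs: int = 3, tol: float = 0.0) -> bool:
    """Call fit(seed) several times and check every run matches the first within tol."""
    baseline = list(fit(seed))
    for _ in range(runs - 1):
        result = list(fit(seed))
        if len(result) != len(baseline):
            return False
        if any(abs(a - b) > tol for a, b in zip(baseline, result)):
            return False
    return True


def seeded_projection(seed: int) -> Sequence[float]:
    """Stand-in for a deterministic backend: output depends only on the seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(4)]


_calls = itertools.count()


def drifting_projection(seed: int) -> Sequence[float]:
    """Stand-in for a non-deterministic backend: output changes on every call."""
    return [seed + next(_calls) * 1e-3]
```

A nonzero `tol` lets the check accept small floating-point differences, which may be the realistic bar for parallel or GPU backends.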

@thiswillbeyourgithub

Btw this exists:

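# global_patch=True applies the patch to the scikit-learn installation itself (see the linked docs)
from sklearnex import patch_sklearn
patch_sklearn(global_patch=True)
# Import sklearn after patching so the accelerated estimators are picked up
import sklearn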

Source: https://uxlfoundation.github.io/scikit-learn-intelex/latest/global-patching.html
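If the extension were made an optional dependency, the patch could be guarded so that stock scikit-learn is used when `sklearnex` is not installed. This is a sketch only: `enable_sklearn_acceleration` is a hypothetical helper, and it calls `patch_sklearn()` without `global_patch` so that only the current process is affected.

```python
def enable_sklearn_acceleration() -> bool:
    """Patch scikit-learn for this process if sklearnex is installed.

    Returns True when the patch was applied and False when sklearnex is missing.
    """
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        # Extension not installed: keep using stock scikit-learn.
        return False
    patch_sklearn()
    return True
```

The helper would need to be called before any `from sklearn... import ...` statements, since estimators imported before patching are not replaced.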

3 participants