Much faster versions of PCA + UMAP exist. Can we implement them? #52

Hellisotherpeople · 2024-11-24T17:17:23Z

Intel's scikit-learn extension (scikit-learn-intelex), cuML, and several other libraries provide optimized variants of PCA and likely of a few other algorithms used here. I can submit a PR implementing a few of these if you'd like. For certain types of vectors, this can give a noticeable speedup.

GPU implementations might harm reproducibility. Might be other issues too that I haven't thought about. Thoughts?

vgel · 2024-12-14T06:32:31Z

I'd be interested in a PR--please implement them as a new method like the existing umap for now if you do. I'd be especially interested in a speed comparison!
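For the speed comparison, a small timing harness might be enough to start with. The sketch below is only an illustration: `benchmark` is a hypothetical helper, not something in this repo, and the demo assumes numpy and scikit-learn are installed. Other backends (cuML, a patched sklearn, and so on) could be added as extra entries in the `candidates` dict.

```python
import time
from typing import Any, Callable, Dict


def benchmark(
    candidates: Dict[str, Callable[[Any], Any]],
    data: Any,
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the best wall-clock time in seconds for each candidate.

    `candidates` maps a label to a callable that takes `data` and runs
    one full fit/transform. This is a hypothetical helper for illustration.
    """
    results: Dict[str, float] = {}
    for name, fit in candidates.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fit(data)
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


if __name__ == "__main__":
    try:
        import numpy as np
        from sklearn.decomposition import PCA
    except ImportError:
        print("numpy and scikit-learn are needed for this demo")
    else:
        # Synthetic data; the shape is an arbitrary assumption.
        rng = np.random.default_rng(0)
        X = rng.standard_normal((2000, 256))
        timings = benchmark(
            {"sklearn PCA": lambda X: PCA(n_components=2).fit_transform(X)},
            X,
        )
        for name, seconds in timings.items():
            print(f"{name}: {seconds:.4f}s")
```

Taking the best of several repeats reduces noise from warm-up and caching. For GPU backends, the first call often includes one-time setup cost, so it may be worth reporting it separately.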

thiswillbeyourgithub · 2024-12-14T08:42:18Z

> GPU implementations might harm reproducibility

So does setting n_jobs > 1, so I'd say there's definitely a tradeoff between performance and reproducibility.

thiswillbeyourgithub · 2024-12-14T08:46:18Z

Btw this exists:

```python
from sklearnex import patch_sklearn
patch_sklearn(global_patch=True)
import sklearn  # import scikit-learn only after patching
```

Source: https://uxlfoundation.github.io/scikit-learn-intelex/latest/global-patching.html
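If this were wired in as an optional speedup, one way to keep it from becoming a hard dependency is a guarded import. This is a minimal sketch, assuming the patch is applied per process rather than globally. `enable_sklearnex` is a hypothetical helper name, not part of sklearnex or this repo.

```python
def enable_sklearnex() -> bool:
    """Try to apply the scikit-learn-intelex patch.

    Returns True if the patch was applied, False if sklearnex is not
    installed. Hypothetical helper; call it before importing sklearn
    estimators so they pick up the patched versions.
    """
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        # sklearnex is optional: fall back to stock scikit-learn.
        return False
    patch_sklearn()
    return True


patched = enable_sklearnex()
print("sklearnex patch applied" if patched else "using stock scikit-learn")
```

Users without sklearnex get the unpatched behaviour unchanged, which also keeps the default path as reproducible as before.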
