Feature Request: make selectors amenable to GPU processing #127

Open
rosecers opened this issue Apr 8, 2022 · 6 comments
Labels
enhancement (New feature or request), low-priority (Something is not so important)

Comments

@rosecers
Collaborator

rosecers commented Apr 8, 2022

as requested by @Luthaf

rosecers added the enhancement and low-priority labels on Apr 8, 2022
@Luthaf
Collaborator

Luthaf commented Apr 8, 2022

So for more context on this: it would be nice to be able to pass a torch.Tensor (or JAX array) living on the GPU directly to the selectors, instead of having to move the data back to main (CPU) memory.

A first pass would be to make sure all the function calls are compatible with the PyTorch API, but given the heavy use of Python for loops in the selector code, that alone might not improve performance much. The second step would then be to rewrite the selector code to use more high-level operations and launch larger GPU kernels, which should hopefully improve performance.


This is mostly unrelated to the autograd part of PyTorch, so even if we need to .detach() the tensors before passing them, that would be fine with me. I would mostly like to be able to keep the data in GPU memory.
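To make the difference between the two steps concrete, here is a minimal sketch. It uses NumPy so that it runs anywhere, and the column-norm computation is only a stand-in chosen for illustration; it is not taken from the skmatter selector code, and the function names are hypothetical.

```python
import numpy as np


def column_norms_loop(X):
    """Column norms with a Python-level loop: one small operation per column."""
    norms = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        # On a GPU backend, each iteration would launch its own tiny kernel.
        norms[j] = np.sqrt(np.sum(X[:, j] ** 2))
    return norms


def column_norms_vectorized(X):
    """Same result from a single high-level reduction over the whole array."""
    return np.sqrt(np.sum(X ** 2, axis=0))
```

Both functions return the same values; the second form is the kind of rewrite that would let a GPU backend do the work in a few large kernels instead of many small ones.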

@Luthaf
Collaborator

Luthaf commented Apr 24, 2023

My ideal user-facing interface for this would be to be able to do something like this:

import torch
from skmatter.feature_selection import CUR

X = torch.rand(300, 300, device="cuda")  # or device="mps" on Apple M1/M2

selector = CUR(n_to_select=4)
selector.fit(X)

Xr = selector.transform(X)
# Xr is a torch tensor, with device=X.device

A first step for this would be to add a test trying to use skmatter with a torch tensor, and check where the code starts throwing errors.


Depending on the number of function calls (e.g. np.sum, …) that need to be updated, it might be worth using https://github.com/jcmgray/autoray to dispatch function calls to the right backend.
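The general idea of backend dispatch can be sketched by hand as follows. This is not autoray's API or implementation, only an illustration of the concept: look up a function by name in the library that defines the array's type. The helper names `backend_name` and `dispatch` are hypothetical, and the lookup by module name is an assumption that will not hold for every array library.

```python
import importlib

import numpy as np


def backend_name(array):
    """Return the top-level package that defines the array's class."""
    # e.g. "numpy" for np.ndarray; other libraries may need special cases
    return type(array).__module__.split(".")[0]


def dispatch(name, array, *args, **kwargs):
    """Call the function called `name` from the array's own library."""
    module = importlib.import_module(backend_name(array))
    return getattr(module, name)(array, *args, **kwargs)


X = np.ones((2, 3))
col_sums = dispatch("sum", X, axis=0)  # resolves to np.sum for a NumPy array
```

Selector code written against such a helper would not need to import a specific array library, which is the property that makes a dedicated dispatch library attractive here.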

@Luthaf
Collaborator

Luthaf commented Jun 29, 2023

This is on the back burner for now. If you are interested in getting skmatter to run on GPU, please voice your interest here!

@Luthaf
Collaborator

Luthaf commented Jul 13, 2023

It looks like sklearn now has experimental support for PyTorch/CuPy (and thus GPU data) using the array API: https://scikit-learn.org/stable/modules/array_api.html. We could use the same here!

@agoscinski
Collaborator

We should also experiment with how the array API works with our selection methods. FPS is probably a good candidate because it does not use very complicated mathematical operations, so hopefully there is not much friction in making this work.
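As a starting point for such an experiment, here is a rough sketch of farthest point sampling over samples, written against an array namespace `xp` rather than NumPy directly. It is not skmatter's FPS implementation. The helpers `get_namespace` and `fps_samples` are hypothetical names, and the fallback to NumPy for arrays without `__array_namespace__` is an assumption made so the sketch runs on plain NumPy input.

```python
import numpy as np


def get_namespace(x):
    """Return the array API namespace of `x`, falling back to NumPy."""
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    return np


def fps_samples(X, n_to_select, first=0):
    """Greedy farthest point sampling; returns the selected row indices."""
    xp = get_namespace(X)
    selected = [first]
    # Squared distance from every sample to its closest selected sample.
    min_dist = xp.sum((X - X[first, :]) ** 2, axis=1)
    for _ in range(n_to_select - 1):
        # Pick the sample farthest from everything selected so far.
        new = int(xp.argmax(min_dist))
        selected.append(new)
        dist = xp.sum((X - X[new, :]) ** 2, axis=1)
        min_dist = xp.minimum(min_dist, dist)
    return selected
```

The only Python-level loop left is the one over the number of points to select; each iteration works on the whole array at once, so the per-step cost is a handful of array operations regardless of the number of samples.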

@Luthaf
Collaborator

Luthaf commented Oct 18, 2023

More info on this array API in sklearn: https://labs.quansight.org/blog/array-api-support-scikit-learn.
