
Cuda zero copy #13002

Closed · wants to merge 28 commits

Commits (28)
c6cd611
update mne/cuda to call cupy.asarray when possible
scottrbrtsn Dec 3, 2024
02edb9c
update tests as poc, they pass
scottrbrtsn Dec 3, 2024
5662420
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 3, 2024
f7a04ad
Update mne/cuda.py
larsoner Dec 3, 2024
fd0463e
string formatting
scottrbrtsn Dec 3, 2024
a03014e
add changelog file
scottrbrtsn Dec 3, 2024
44b8b9a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 3, 2024
96f2a03
only get shared mem if cuda capable
scottrbrtsn Dec 4, 2024
caec586
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 4, 2024
de298fe
revert numba_capable and use the bool in fixes
scottrbrtsn Dec 4, 2024
ce2ff37
gate get_shared_mem with fixes/has_numba bool
scottrbrtsn Dec 4, 2024
75c4dab
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 4, 2024
929fcb9
Update mne/cuda.py to soft import numba
scottrbrtsn Dec 4, 2024
214edf1
revert tests, make shared_mem fun private and simplified, call approp…
scottrbrtsn Dec 4, 2024
cfd6fb3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 4, 2024
cb8fec5
soft import numba
scottrbrtsn Dec 4, 2024
6973b54
remove extra line
scottrbrtsn Dec 4, 2024
f2950e0
revert soft import.
scottrbrtsn Dec 4, 2024
ac5b5c8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 4, 2024
e76b5a7
W needs "cuda" passed in since it just set n_jobs to 1
scottrbrtsn Dec 4, 2024
1fc59ba
hfft needs cuda for the same reason
scottrbrtsn Dec 4, 2024
9350498
remove n_jobs param
scottrbrtsn Dec 4, 2024
1a1c8ec
remove n_jobs gate
scottrbrtsn Dec 4, 2024
a3342fa
fix docstring param
scottrbrtsn Dec 5, 2024
88ae8cb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 5, 2024
4bbd2c7
irfft, takes cuda mem, and so x is a different type. i do not think g…
scottrbrtsn Dec 6, 2024
af3cab2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 6, 2024
74a7794
remove troubleshooting woes
scottrbrtsn Dec 6, 2024
1 change: 1 addition & 0 deletions doc/changes/devel/13002.other.rst
@@ -0,0 +1 @@
Short description of the changes, by :newcontrib:`Scott Robertson`.
Member comment:

Just adding a comment so we don't forget to actually update this 🙂

Member comment:

... and adding your name to doc/changes/names.inc will fix the CircleCI error:

[towncrier-fragments]:89: ERROR: Indirect hyperlink target "new contributor Scott Robertson"  refers to target "scott robertson", which does not exist. [docutils]

48 changes: 44 additions & 4 deletions mne/cuda.py
@@ -2,6 +2,7 @@
# License: BSD-3-Clause
# Copyright the MNE-Python contributors.


import numpy as np
from scipy.fft import irfft, rfft

@@ -19,6 +20,40 @@
_cuda_capable = False


def _share_cuda_mem(x):
    """Get a shared memory space to avoid copying from CPU to GPU when possible.

    Allocate a mapped ndarray whose buffer is pinned in host memory and
    mapped onto the device, similar to np.empty(). Requires numba.

    Parameters
    ----------
    x : ndarray, shape (n,)
        The array to share with the device.

    Returns
    -------
    out : ndarray
        A mapped array suitable for passing to cupy.asarray, which does not
        copy when shared memory is already allocated. If CUDA or numba is
        not available, the original array is returned unchanged.
    """
    from mne.fixes import has_numba

    if _cuda_capable and has_numba:
        from numba import cuda

        # preserve dtype; mapped_array defaults to float64 otherwise
        out = cuda.mapped_array(x.shape, dtype=x.dtype)
        out[:] = x
    else:
        out = x
    return out

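The gating logic above can be sketched on the CPU alone. This is a hypothetical, self-contained mock (the name `share_mem` and its boolean flags are not part of the PR — in the real code they come from `mne.cuda._cuda_capable` and `mne.fixes.has_numba`); without a GPU only the cheap fallback branch runs:

```python
import numpy as np


def share_mem(x, cuda_capable=False, has_numba=False):
    """Return a pinned, device-mapped buffer when CUDA and numba are both
    available; otherwise hand back ``x`` untouched (the cheap fallback)."""
    if cuda_capable and has_numba:
        # Real path (needs a GPU): allocate host-pinned memory that the
        # device can read directly, so cupy.asarray() need not copy.
        from numba import cuda

        out = cuda.mapped_array(x.shape, dtype=x.dtype)
        out[:] = x  # one host-side copy into the pinned buffer
    else:
        out = x  # no CUDA/numba: no allocation, no copy
    return out


x = np.arange(4, dtype=np.float64)
print(share_mem(x) is x)  # → True (fallback returns the same object)
```

The design point is that the fallback returns the identical object, so callers can unconditionally wrap arrays in `_share_cuda_mem` without paying anything on CPU-only machines.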

def get_cuda_memory(kind="available"):
"""Get the amount of free memory for CUDA operations.

@@ -176,12 +211,13 @@ def _setup_cuda_fft_multiply_repeated(n_jobs, h, n_fft, kind="FFT FIR filtering"

try:
# do the IFFT normalization now so we don't have to later
- h_fft = cupy.array(cuda_dict["h_fft"])
+ h_fft = cupy.asarray(_share_cuda_mem(cuda_dict["h_fft"]))
logger.info(f"Using CUDA for {kind}")
except Exception as exp:
logger.info(
"CUDA not used, could not instantiate memory (arrays may be too "
f'large: "{exp}"), falling back to n_jobs=None'
+ f", {_explain_exception()}"
)
cuda_dict.update(h_fft=h_fft, rfft=_cuda_upload_rfft, irfft=_cuda_irfft_get)
else:
@@ -275,14 +311,16 @@ def _setup_cuda_fft_resample(n_jobs, W, new_len):
try:
import cupy

+ W = _share_cuda_mem(W)
+
# do the IFFT normalization now so we don't have to later
- W = cupy.array(W)
+ W = cupy.asarray(W)
logger.info("Using CUDA for FFT resampling")
except Exception:
logger.info(
"CUDA not used, could not instantiate memory "
"(arrays may be too large), falling back to "
- "n_jobs=None"
+ f"n_jobs=None, {_explain_exception()}"
)
else:
cuda_dict.update(
@@ -301,7 +339,9 @@ def _cuda_upload_rfft(x, n, axis=-1):
"""Upload and compute rfft."""
import cupy

- return cupy.fft.rfft(cupy.array(x), n=n, axis=axis)
+ x = _share_cuda_mem(x)
+
+ return cupy.fft.rfft(cupy.asarray(x), n=n, axis=axis)
scottrbrtsn marked this conversation as resolved.
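The repeated swap of `cupy.array` for `cupy.asarray` throughout this diff relies on the same contract NumPy documents for its own `array`/`asarray` pair: `array` always copies by default, while `asarray` returns the input buffer unchanged when dtype and layout already match. A CPU-side sketch of that contract using NumPy (not part of the PR, just the analogous behavior):

```python
import numpy as np

x = np.zeros(8)

copied = np.array(x)    # always allocates and copies into a new buffer
shared = np.asarray(x)  # returns x itself: dtype and layout already match

print(copied is x)  # → False
print(shared is x)  # → True
```

On the GPU side, `cupy.asarray` extends this one step further: given a numba mapped array (host-pinned and device-visible), it can expose the buffer to the device without a host-to-device copy, which is the zero-copy path this PR targets.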


def _cuda_irfft_get(x, n, axis=-1):