compiler: Unified Memory Allocator #2023
base: master
Conversation
CuPy version 8.3.0 was used.
ToDo:
devito/data/data.py
Outdated
@@ -82,7 +82,8 @@ def __del__(self):
             # Dask/Distributed context), which may (re)create a Data object
             # without going through `__array_finalize__`
             return
-        self._allocator.free(*self._memfree_args)
+        if self._allocator is not CUPY_ALLOC:
+            self._allocator.free(*self._memfree_args)
No, you should override `free` instead.
In fact, `free` is a no-op, so why the need to special-case here?
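A minimal sketch of what that override might look like (hypothetical; only the no-op behaviour is taken from this discussion):

class CupyAllocator(MemoryAllocator):

    def free(self, *args):
        # Nothing to do here: CuPy's memory pool reclaims the device
        # allocation once the backing cupy.ndarray loses its last
        # reference, so no explicit deallocation call is needed.
        pass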
Thanks for the contribution! I left a few comments.
There are also other issues, in particular the lack of tests, updates to the installation, etc. You may want to take a look at this file: https://github.com/devitocodes/devito/blob/master/CONTRIBUTING.md#contributing-to-devito
devito/data/allocators.py
Outdated
@@ -15,7 +16,7 @@

 __all__ = ['ALLOC_FLAT', 'ALLOC_NUMA_LOCAL', 'ALLOC_NUMA_ANY',
            'ALLOC_KNL_MCDRAM', 'ALLOC_KNL_DRAM', 'ALLOC_GUARD',
-           'default_allocator']
+           'CUPY_ALLOC', 'default_allocator']
ALLOC_CUPY for homogeneity
devito/data/allocators.py
Outdated
class CupyAllocator(MemoryAllocator):

    """
    Memory allocator based on ``posix`` functions. The allocated memory is
copy-paste docstring
devito/data/allocators.py
Outdated
    aligned to page boundaries.
    """

    is_Posix = True
leftover...
devito/data/allocators.py
Outdated
        mem_obj = cp.zeros(size, dtype=cp.float64)
        return mem_obj.data.ptr, mem_obj

    def free(self, c_pointer):
CuPy frees the memory itself, right? Unless we explicitly tell it to at some point?
devito/data/data.py
Outdated
@@ -82,7 +82,8 @@ def __del__(self):
             # Dask/Distributed context), which may (re)create a Data object
             # without going through `__array_finalize__`
             return
-        self._allocator.free(*self._memfree_args)
+        if self._allocator is not CUPY_ALLOC:
+            self._allocator.free(*self._memfree_args)
In fact, `free` is a no-op, so why the need to special-case here?
devito/passes/iet/definitions.py
Outdated
@@ -435,6 +437,9 @@ def _map_function_on_high_bw_mem(self, site, obj, storage, devicerm, read_only=False):
         """
         mmap = self.lang._map_to(obj)

+        if obj._allocator is CUPY_ALLOC:
This breaks the abstraction in all sorts of ways, unfortunately.
You can't access the allocator here to steer compilation.
What really matters at this point is the `_mem_space` of the object: https://github.com/devitocodes/devito/blob/master/devito/types/basic.py#L39
You shouldn't actually end up here, because GPU-allocated functions should have a local mem-space, which in turn naturally prevents them from ever entering this point.
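A hedged sketch of the check being suggested, keyed on `_mem_space` rather than the allocator (the value 'local' follows the comment above, but is not verified against Devito's actual codebase):

# Hypothetical: steer compilation off the object's memory space;
# device-local objects never need a host-to-device mapping here.
if obj._mem_space == 'local':
    return
mmap = self.lang._map_to(obj)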
…te without openacc pragmas
…l of conditional inside 'del' method
@FabioLuporini, we added tests to this allocator, could we proceed with this PR?
tests/test_data.py
Outdated
@@ -206,6 +207,26 @@ def test_indexing_into_sparse(self):
         sf.data[1:-1, 0] = np.arange(8)
         assert np.all(sf.data[1:-1, 0] == np.arange(8))

+    def test_uma_allocation(self):
There's a `test_external_allocator` somewhere in this file. Could you move that test, and this test, into a new class, say `TestAllocators`?
I updated the PR with this change
The issue is that CuPy is an optional dependency (shipped by the NVIDIA SDK, I'm guessing?), so if I try to run CI here, it will undoubtedly break. So we need conditional imports, a mechanism like this to emit suitable error messages in case one attempts to use the allocator when it isn't available, and so on.
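A sketch of the conditional-import mechanism being described (the class body and error message are illustrative only):

try:
    import cupy as cp
except ImportError:
    cp = None


class CupyAllocator(MemoryAllocator):

    @classmethod
    def available(cls):
        # The allocator is usable only if the optional dependency imported
        return cp is not None

    def _alloc_C_libcall(self, size, ctype):
        if not self.available():
            raise RuntimeError("CuPy is not installed; CUPY_ALLOC is unavailable")
        ...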
Based on what I said above, a few tweaks are still necessary here
Should this be added to the GPU CI and actually tested on the NVIDIA run?
We've updated the allocators.py file to use a conditional import for CuPy. I think this solves the problem.
…th size = 0 (MPI)
devito/data/allocators.py
Outdated
# try:
#     from mpi4py import MPI  # noqa
# except ImportError:
#     MPI = None
Leftover
devito/data/allocators.py
Outdated
        cls.lib = cp
        cls._initialize_shared_memory()
        try:
            from mpi4py import MPI
Why reimport, and not import it from `devito.mpi`?
devito/data/allocators.py
Outdated
            from mpi4py import MPI
            cls.MPI = MPI
            cls._set_device_for_mpi()
        except:
`except ImportError`: other errors should not be silently caught.
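For instance (a sketch; only the import failure is swallowed, everything else propagates):

try:
    from mpi4py import MPI
    cls.MPI = MPI
    cls._set_device_for_mpi()
except ImportError:
    # Only a missing mpi4py is tolerated; real runtime errors still raise
    cls.MPI = None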
devito/data/allocators.py
Outdated
    def _alloc_C_libcall(self, size, ctype):
        if not self.available():
            raise ImportError("Couldn't initialize cupy or MPI elements of allocation")
        mem_obj = self.lib.zeros(size, dtype=self.lib.float64)
Always `float64`? I thought `size` was in units of `UInt8`.
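If `size` is indeed a byte count, the allocation would presumably be (a sketch, not the final code):

# size is in bytes, so allocate raw bytes; zeroing float64 elements here
# would request 8x the intended amount of device memory
mem_obj = self.lib.zeros(size, dtype=self.lib.uint8)
return mem_obj.data.ptr, mem_obj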
            ctype_1d = ctype * size
            buf = ctypes.cast(c_pointer, ctypes.POINTER(ctype_1d)).contents
            pointer = np.frombuffer(buf, dtype=dtype)
        else:
How can we end up here? The `c_pointer is None` case is already handled above.
During execution under MPI, domain splitting can generate situations where the allocated data size is zero, as we have observed with SparseFunctions. When this occurs, CuPy returns a pointer with a value of zero. This conditional statement was added for that case.
Could you add a comment noting this, until some better solution is around?
We will push it, George.
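A sketch of what that inline comment could look like (the empty-array fallback is an assumption based on the discussion above):

else:
    # Under MPI, domain decomposition may leave a rank with zero-sized
    # local data (observed with SparseFunctions); CuPy then returns a
    # null (zero) pointer, so there is no buffer to wrap with ctypes.
    pointer = np.empty(0, dtype=dtype)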
tests/test_data.py
Outdated
@@ -1,8 +1,10 @@
 import pytest
 import numpy as np
+import cupy as cp
This needs to be added somehow to the test requirements, and this step should be decorated with a `skipif(device)`.
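One way the guard might look (a sketch; a project-specific skip helper may be preferable to plain pytest):

import pytest

try:
    import cupy as cp
except ImportError:
    cp = None


# Skip cleanly on CI runners without CuPy rather than failing at import
@pytest.mark.skipif(cp is None, reason="CuPy not installed")
def test_uma_allocation():
    ...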
Can you check if #2171 fixes the install issue?
@mloubout, definitely, we will test it. BTW, this docker image worked too, thanks to you:

FROM nvcr.io/nvidia/nvhpc:23.5-devel-cuda12.1-ubuntu22.04 AS devel
RUN apt-get update -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    git \
    make \
    wget && \
    rm -rf /var/lib/apt/lists/*
RUN apt-get update -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    python3 \
    python3-dev \
    python3-pip \
    python3-setuptools \
    python3-wheel \
    python3.10-venv && \
    rm -rf /var/lib/apt/lists/*
RUN pip3 --no-cache-dir install cupy-cuda12x
OK, the NVIDIA setup has been updated, so please:
    def initialize(cls):

        try:
            import cupy as cp
Move the `try` import for cupy to the top:

try:
    import cupy as cp
except ImportError:
    cp = None

and just check for None here.
MPI should just be imported at the top from `devito.mpi`.
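Putting both suggestions together, `initialize` might reduce to something like this (a sketch; the helper names come from the diff above):

try:
    import cupy as cp
except ImportError:
    cp = None

from devito.mpi import MPI


class CupyAllocator(MemoryAllocator):

    @classmethod
    def initialize(cls):
        # Nothing to set up if the optional dependency is missing
        if cp is None:
            return
        cls.lib = cp
        cls._initialize_shared_memory()
        cls.MPI = MPI
        cls._set_device_for_mpi()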
@@ -1473,6 +1475,55 @@ def test_gather_time_function(self):
         assert ans == np.array(None)


+class TestAllocators(object):
Move to `test_gpu_common` as `TestCupyAllocator`, with an NVIDIA device skip, so it's added to GPU CI.
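Roughly along these lines, presumably (a sketch; the exact skip-marker spelling in conftest is assumed):

# tests/test_gpu_common.py
from conftest import skipif


# Skipped entirely on runners without an NVIDIA device
@skipif('nodevice')
class TestCupyAllocator(object):

    def test_uma_allocation(self):
        ...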
Hello everyone,
We at SENAI CIMATEC are working on a Unified Memory Allocator for Devito through the CuPy library.
The first results using this new allocator are impressive when using checkpointing, compared to the default allocator on GPU.
The performance impact in our experiment using Overthrust (894x884x299) is close to three times* that of the default allocator in Devito.
With this approach we expect, in the future, to be able to allocate memory beyond the GPU's capacity.
We will open a Draft PR to standardize this allocator with Devito patterns, fix possible bugs, and open it up for community use :)
A version enabling this through an External Allocator is in development too, and we expect to share it soon.
All feedback is welcome :)
Thank you to everyone at CIMATEC and on the Devito team for making this possible 🙂
*All experiments used an NVIDIA V100 with 32 GB of memory.
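For anyone wanting to try this once merged, usage would presumably look like the following (a sketch; the allocator name follows the CUPY_ALLOC/ALLOC_CUPY discussion above):

from devito import Grid, TimeFunction
from devito.data.allocators import CUPY_ALLOC

# Overthrust-sized grid from the experiment above; data lives in
# CUDA unified memory, so it can exceed the device's capacity
grid = Grid(shape=(894, 884, 299))
u = TimeFunction(name='u', grid=grid, allocator=CUPY_ALLOC)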