OpenMP #273

Open: wants to merge 38 commits into base: master
38 commits
337b2db
Format change.
isazi Aug 21, 2024
852f198
First very draft attempt at feature parity OpenACC/OpenMP.
isazi Aug 21, 2024
a2cee37
Added OpenMP tests.
isazi Aug 21, 2024
0067fd5
Fixed two failing tests.
isazi Aug 21, 2024
debcb64
A fix was necessary in the Compiler backend to have OpenMP working.
isazi Aug 21, 2024
bb95d7b
Fixed test.
isazi Aug 21, 2024
ba02b31
Fixing the examples.
isazi Aug 21, 2024
828d4c6
Matrix multiply example for OpenMP.
isazi Aug 21, 2024
24857cc
Adding correctness check.
isazi Aug 22, 2024
3d32c47
Refactor function.
isazi Aug 22, 2024
21e56af
Bug fixed.
isazi Aug 22, 2024
1f815e2
Adding missing parameter.
isazi Aug 22, 2024
ddf501f
Another bug fixed.
isazi Aug 22, 2024
1698a9b
Updated the OpenMP matrix multiply.
isazi Aug 22, 2024
2fffcf0
Update vector add OpenMP code.
isazi Sep 5, 2024
c0d240a
Using "restrict" is too compiler specific.
isazi Sep 5, 2024
62809fa
Reorder parameters.
isazi Sep 17, 2024
fbdadd7
Draft example of a histogram.
isazi Sep 17, 2024
d844e97
Bound the values inside the array.
isazi Sep 17, 2024
ab9d6b5
Typo.
isazi Sep 17, 2024
e7fd411
Fixing a bug in the correctness check.
isazi Sep 17, 2024
939f0c3
Use the cleaning observer.
isazi Sep 17, 2024
06c6074
Fixing what is (probably) a long standing bug, observers were ignored…
isazi Sep 17, 2024
9dc47a4
Fixing allocation bugs in the Compiler Backend.
isazi Sep 18, 2024
c7bc0c8
Move this trailing comment on the previous empty line.
isazi Sep 19, 2024
ee7432a
Fixing, for good, a bug that prevented cleaning up the output memory …
isazi Sep 19, 2024
e2c6a09
The example should now work.
isazi Sep 19, 2024
fb9033a
Fix the test to use the new method.
isazi Sep 19, 2024
f9808e1
Some refactoring necessary for everything to work. The Compiler backe…
isazi Sep 19, 2024
2333916
Remove old tests.
isazi Sep 19, 2024
aa3cadb
Added test for the compiler memory refresh.
isazi Sep 19, 2024
4587539
Update the test.
isazi Sep 19, 2024
4c77414
Although semantically there is no dtoh copy in the compiler backend, …
isazi Sep 19, 2024
da2fe05
Test fixed.
isazi Sep 19, 2024
47898d9
Added tunable parameters to the example.
isazi Sep 19, 2024
760462c
OpenMP version of the histogram example.
isazi Sep 19, 2024
b207e06
Merge branch 'master' into directives
isazi Nov 28, 2024
69ec5ac
Merge branch 'master' into directives
isazi Dec 17, 2024
Fixing, for good, a bug that prevented cleaning up the output memory when using the compiler backend.
isazi committed Sep 19, 2024
commit ee7432a8d7e20e1034eced45ab46553f9311f493
11 changes: 11 additions & 0 deletions kernel_tuner/backends/backend.py
@@ -57,6 +57,12 @@ def memcpy_htod(self, dest, src):
         """This method must implement a host to device copy."""
         pass

+    def reset(self, arguments, should_sync):
+        """Copy the original content of the output memory to device memory."""
+        for i, arg in enumerate(arguments):
+            if should_sync[i]:
+                self.memcpy_htod(self.allocations[i], arg)
+

 class GPUBackend(Backend):
     """Base class for GPU backends"""
@@ -87,3 +93,8 @@ class CompilerBackend(Backend):
     @abstractmethod
     def __init__(self, iterations, compiler_options, compiler):
         pass
+
+    @abstractmethod
+    def cleanup_lib(self):
+        """Unload the previously loaded shared library"""
+        pass
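
Note on the base-class `reset` above: it is a concrete default that delegates to whatever `memcpy_htod` the subclass provides, copying each host original into the matching entry of `self.allocations`. The following is a minimal sketch of that pattern, not Kernel Tuner code. The `Backend` here is a simplified stand-in, and `FakeDeviceBackend` and the `bytearray` "device memory" are hypothetical names introduced only for illustration.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Simplified stand-in for the base class in the diff (assumption)."""

    @abstractmethod
    def memcpy_htod(self, dest, src):
        """Copy host data into device memory."""

    def reset(self, arguments, should_sync):
        # Same loop as the diff: restore only the arguments flagged in should_sync.
        for i, arg in enumerate(arguments):
            if should_sync[i]:
                self.memcpy_htod(self.allocations[i], arg)


class FakeDeviceBackend(Backend):
    """Hypothetical backend that models device memory with bytearrays."""

    def __init__(self, arguments):
        self.allocations = [bytearray(arg) for arg in arguments]

    def memcpy_htod(self, dest, src):
        dest[:] = src


host_args = [b"\x00\x00", b"\x07\x07"]
dev = FakeDeviceBackend(host_args)

# Pretend a kernel overwrote both device buffers.
dev.allocations[0][:] = b"\xff\xff"
dev.allocations[1][:] = b"\xff\xff"

# Only the first argument is flagged, so only it is restored.
dev.reset(host_args, [True, False])
```

Because `reset` lives in the base class, a backend with real device memory inherits it unchanged, while a backend with different memory semantics can override it.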
16 changes: 9 additions & 7 deletions kernel_tuner/backends/compiler.py
@@ -81,6 +81,7 @@ def __init__(self, iterations=7, compiler_options=None, compiler=None, observers
         :param iterations: Number of iterations used while benchmarking a kernel, 7 by default.
         :type iterations: int
         """
+        # allocations contains a clean copy of the memory
         self.allocations = []
         self.observers = observers or []
         self.observers.append(CompilerRuntimeObserver(self))
@@ -151,11 +152,6 @@ def ready_argument_list(self, arguments):
             dtype_str = str(arg.dtype)
             if isinstance(arg, np.ndarray):
                 if dtype_str in dtype_map.keys():
-                    # In numpy <= 1.15, ndarray.ctypes.data_as does not itself keep a reference
-                    # to its underlying array, so we need to store a reference to arg.copy()
-                    # in the Argument object manually to avoid it being deleted.
-                    # (This changed in numpy > 1.15.)
-                    # data_ctypes = data.ctypes.data_as(C.POINTER(dtype_map[dtype_str]))
                     data_ctypes = arg.ctypes.data_as(C.POINTER(dtype_map[dtype_str]))
                 else:
                     raise TypeError("unknown dtype for ndarray")
@@ -164,7 +160,7 @@ def ready_argument_list(self, arguments):
             elif is_cupy_array(arg):
                 data_ctypes = C.c_void_p(arg.data.ptr)
             ctype_args[i] = Argument(numpy=arg, ctypes=data_ctypes)
-            self.allocations.append(ctype_args[i])
+            self.allocations.append(Argument(numpy=arg.copy(), ctypes=data_ctypes))
         return ctype_args

     def compile(self, kernel_instance):
@@ -393,8 +389,14 @@ def memcpy_htod(self, dest, src):
             xp = get_array_module(dest.numpy)
             dest.numpy[:] = xp.asarray(value)

+    def reset(self, arguments, should_sync):
+        """Copy the preserved content of the output memory to device pointers."""
+        for i, arg in enumerate(arguments):
+            if should_sync[i]:
+                self.memcpy_htod(arg, self.allocations[i])
+
     def cleanup_lib(self):
-        """unload the previously loaded shared library"""
+        """Unload the previously loaded shared library"""
         if not self.using_openmp and not self.using_openacc:
             # this if statement is necessary because shared libraries that use
             # OpenMP will core dump when unloaded, this is a well-known issue with OpenMP
7 changes: 3 additions & 4 deletions kernel_tuner/core.py
@@ -474,14 +474,13 @@ def check_kernel_output(self, func, gpu_args, instance, answer, atol, verify, ve

         # re-copy original contents of output arguments to GPU memory, to overwrite any changes
         # by earlier kernel runs
-        for i, arg in enumerate(instance.arguments):
-            if should_sync[i]:
-                self.dev.memcpy_htod(gpu_args[i], arg)
+        self.dev.reset(instance.arguments, should_sync)

         # run the kernel
         check = self.run_kernel(func, gpu_args, instance)
         if not check:
-            return # runtime failure occured that should be ignored, skip correctness check
+            # runtime failure occured that should be ignored, skip correctness check
+            return

         # retrieve gpu results to host memory
         result_host = []
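
Note on the fix in this commit: with the compiler backend the kernel writes directly into host memory, so a "clean" entry in `allocations` that refers to the same buffer as the live argument is overwritten along with it, and restoring from it changes nothing. Storing a copy instead (as the `arg.copy()` line does) gives `reset` something untouched to restore from. The sketch below illustrates that idea using only the standard library. `ToyCompilerBackend`, `fake_kernel`, and the use of `ctypes` float arrays are assumptions made for illustration and are not part of Kernel Tuner.

```python
import ctypes


class ToyCompilerBackend:
    """Hypothetical host-only backend; not Kernel Tuner's API."""

    def __init__(self):
        # Clean snapshots of every argument, taken before any kernel runs.
        self.allocations = []

    def ready_argument_list(self, arguments):
        live = []
        for arg in arguments:
            array_type = ctypes.c_float * len(arg)
            live.append(array_type(*arg))
            # Store a separate buffer so kernel writes cannot touch the snapshot.
            self.allocations.append(array_type(*arg))
        return live

    def reset(self, live_args, should_sync):
        # Copy the preserved content back into the buffers the kernel writes to.
        for i, buf in enumerate(live_args):
            if should_sync[i]:
                ctypes.memmove(buf, self.allocations[i], ctypes.sizeof(buf))


def fake_kernel(out):
    """Stand-in for a compiled kernel that modifies its output in place."""
    for i in range(len(out)):
        out[i] += 1.0


backend = ToyCompilerBackend()
args = backend.ready_argument_list([[0.0, 0.0, 0.0]])

fake_kernel(args[0])
dirty = list(args[0])      # contents after the kernel ran

backend.reset(args, [True])
restored = list(args[0])   # contents after restoring from the snapshot
```

If `ready_argument_list` appended the live buffer itself instead of a second array, `reset` would copy the dirty data onto itself, which mirrors the behaviour this commit describes as the bug.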