This repository has been archived by the owner on Mar 20, 2023. It is now read-only.

Improve OpenMP offload implementation #729

Open
olupton opened this issue Dec 23, 2021 · 0 comments
Labels
gpu improvement Improvement over existing implementation


Overview
In #713 we added support for GPU offload using OpenMP. This is a good first step, but there are several areas where we hope to improve the implementation. This issue tracks the planned improvements.

Asynchronous execution
In #713 we did not include any asynchronous execution clauses or constructs for OpenMP-based accelerator offload (nowait, depend, taskwait). This was partly for simplicity, and partly because support for them in the compiler we were using at the time (NVHPC 21.9) was rather limited.

Work has already started on this, see:

Initially we should aim to recover the performance attained with (asynchronous) OpenACC.
After that, we could look at launching more mechanism kernels in parallel within a single NrnThread.
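
To make the clauses named above concrete, here is a minimal sketch of two dependent kernels launched asynchronously on data that is already resident on the device. It is not the CoreNEURON implementation: the function name `run_async_kernels` and the kernel bodies are hypothetical stand-ins for mechanism kernels. Without offload support the pragmas are ignored or fall back to the host, and the result is the same.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for two mechanism kernels that must run in order.
std::vector<double> run_async_kernels(std::size_t n) {
    assert(n > 0);
    std::vector<double> v(n, 1.0);
    double* x = v.data();

    // Copy the data to the device once, up front.
    #pragma omp target enter data map(to: x[0:n])

    // First kernel: nowait returns control to the host immediately;
    // depend(inout) records that this task reads and writes x.
    #pragma omp target teams distribute parallel for nowait depend(inout: x[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        x[i] *= 2.0;
    }

    // Second kernel: also asynchronous, but ordered after the first
    // because both declare a dependence on the same storage.
    #pragma omp target teams distribute parallel for nowait depend(inout: x[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        x[i] += 1.0;
    }

    // Block until both deferred target tasks have completed.
    #pragma omp taskwait

    // Copy the results back and release the device copy.
    #pragma omp target exit data map(from: x[0:n])
    return v;
}
```

Kernels that touch disjoint data would simply omit the shared `depend` item, which is what would let several of them overlap within one thread.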

Present clauses
With OpenACC we had present(...) clauses that allowed us to assert that data were already present on the device and should not be copied. The current OpenMP implementation has no equivalent, but in practice it preserves the same data transfer pattern as OpenACC because we ensure that the data are already present before the kernels run.

In principle a bug in the model transfer code (leading to some relevant data not being transferred to the device during initialisation) would cause a runtime error with OpenACC (✅) but only silent implicit data transfers with OpenMP (⛔). Given that we already know how to generate present() clauses, it seems desirable to add the OpenMP equivalent (map(present, alloc: ...)) once it is widely supported.
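
As an illustration of the map(present, alloc: ...) form mentioned above, the following sketch guards it behind an OpenMP 5.1 version check, since that is the version that introduced the `present` map-type modifier. The function name `scale_on_device` and the `_OPENMP >= 202011` guard are assumptions made for this example, not something taken from #713.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: scale every element by `factor` on the device.
std::vector<double> scale_on_device(std::vector<double> v, double factor) {
    assert(!v.empty());
    double* data = v.data();
    const std::size_t n = v.size();

    // Stand-in for the model transfer done during initialisation.
    #pragma omp target enter data map(to: data[0:n])

#if defined(_OPENMP) && _OPENMP >= 202011
    // OpenMP 5.1 and later: `present` makes it a runtime error if
    // data[0:n] has not already been mapped to the device.
    #pragma omp target teams distribute parallel for map(present, alloc: data[0:n])
#else
    // Older compilers: plain alloc reuses an existing mapping, but would
    // silently allocate uninitialised device memory if the data were missing.
    #pragma omp target teams distribute parallel for map(alloc: data[0:n])
#endif
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }

    // Copy the results back and release the device copy.
    #pragma omp target exit data map(from: data[0:n])
    return v;
}
```

Removing the `target enter data` line is a quick way to see the difference: with `present` the runtime should abort on a real device, whereas the fallback branch would compute on uninitialised device memory.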

(original issue: neuronsimulator/gpuhackathon#5)
