PearCoding
Contributor

Add asynchronous copy operation anydsl_copy_async.

The "async" is only a hint and currently only has an effect on CUDA and OpenCL. I did not find a suitable method for HSA.
The CPU could support async copies as well, but the host is usually treated as a single unit without async capabilities, so this was intentionally left out.
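
To make "async is only a hint" concrete, here is a minimal sketch of one way a backend could fall back to a blocking copy when it has no asynchronous path. The `Platform` and `HostPlatform` types and their method names are assumptions made for illustration; they are not taken from the AnyDSL runtime.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a runtime backend; not the actual AnyDSL class.
struct Platform {
    virtual ~Platform() = default;

    // Blocking copy: returns once the data has arrived at dst.
    virtual void copy(const void* src, void* dst, std::size_t size) = 0;

    // "Async" is only a hint: unless a backend overrides this,
    // the request is served by the blocking copy.
    virtual void copy_async(const void* src, void* dst, std::size_t size) {
        copy(src, dst, size);
    }

    // Device-wide barrier. Nothing to wait for when every copy is blocking.
    virtual void synchronize() {}
};

// Host backend: keeps the blocking fallback on purpose.
struct HostPlatform : Platform {
    void copy(const void* src, void* dst, std::size_t size) override {
        std::memcpy(dst, src, size);
    }
};
```

With this shape, a caller can always issue `copy_async` followed by `synchronize`, and backends without an asynchronous path simply behave as if the copy were blocking.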

Tested with Rodent (Artic).

@Hugobros3
Contributor

If the copy is asynchronous, how do you know it's finished? Device-wide barrier?

@PearCoding
Contributor Author

Yes. Unfortunately, there is no access to streams or other finer-grained barriers in the API. Finding a common feature set across all the device types we support is quite difficult, especially because of OpenCL. :/

If you have an idea for finer-grained barriers, feel free to mention it. I am very interested in that :D
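
The calling pattern this implies can be sketched with a mock device. `MockDevice` below is purely illustrative and assumes the worst case: queued copies are only executed when the device-wide barrier runs. A real device may finish a copy at any point before the barrier, but the caller cannot rely on that.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Mock device (hypothetical): copies are queued and only executed by synchronize().
class MockDevice {
public:
    // Enqueue the copy; src and dst must stay valid until synchronize() returns.
    void copy_async(const void* src, void* dst, std::size_t size) {
        pending_.push_back([=] { std::memcpy(dst, src, size); });
    }

    // Device-wide barrier: completes every copy issued so far.
    void synchronize() {
        for (auto& op : pending_) op();
        pending_.clear();
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::vector<std::function<void()>> pending_;
};

// Returns true if the destination was untouched before the barrier
// and holds the copied data afterwards.
inline bool barrier_example() {
    MockDevice dev;
    int src[4] = {1, 2, 3, 4};
    int dst[4] = {0, 0, 0, 0};

    dev.copy_async(src, dst, sizeof(src));
    bool untouched_before = (dst[0] == 0);  // the mock has not run the copy yet

    dev.synchronize();                      // only now is dst safe to read
    return untouched_before && dst[3] == 4 && dev.pending() == 0;
}
```

The barrier is coarse: it waits for every outstanding copy on the device, not only the one the caller cares about, which is the limitation described above.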

@richardmembarth
Member

richardmembarth commented Sep 13, 2023

For HSA, you can use hsa_amd_memory_async_copy on AMD GPUs.

@PearCoding
Contributor Author

The HSA function requires signals (which might be useful for events [other PR]). What would be the best practice for providing them on each call without exposing them to the AnyDSL user? A platform- or device-specific list of current signals?
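
One possible shape for such a per-device list, sketched with stand-ins so that it compiles without the HSA headers. `MockSignal`, `mock_async_copy`, `mock_wait`, and `SignalTracker` are all hypothetical names. In a real backend the stand-ins would presumably map to `hsa_signal_create`, `hsa_amd_memory_async_copy`, `hsa_signal_wait_scacquire`, and `hsa_signal_destroy`; that mapping is an assumption and has not been tested here.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <thread>
#include <unordered_map>
#include <vector>

// Stand-in for hsa_signal_t: 1 means "copy pending", 0 means "copy done".
struct MockSignal {
    std::atomic<std::int64_t> value{1};
};

// Stand-in for hsa_amd_memory_async_copy: the mock copies immediately,
// then decrements the completion signal as the real call would on completion.
inline void mock_async_copy(void* dst, const void* src, std::size_t size,
                            MockSignal& completion) {
    std::memcpy(dst, src, size);
    completion.value.fetch_sub(1);
}

// Stand-in for a blocking signal wait: returns once the value drops below 1.
inline void mock_wait(const MockSignal& signal) {
    while (signal.value.load() >= 1) std::this_thread::yield();
}

// Keeps the signals of in-flight copies per device, hidden from the user.
class SignalTracker {
public:
    using DeviceId = int;

    void copy_async(DeviceId dev, const void* src, void* dst, std::size_t size) {
        auto signal = std::make_unique<MockSignal>();
        mock_async_copy(dst, src, size, *signal);
        in_flight_[dev].push_back(std::move(signal));
    }

    // Device-wide barrier: wait on every tracked signal of this device, then drop them.
    void synchronize(DeviceId dev) {
        auto it = in_flight_.find(dev);
        if (it == in_flight_.end()) return;
        for (auto& signal : it->second) mock_wait(*signal);
        it->second.clear();
    }

    std::size_t in_flight(DeviceId dev) const {
        auto it = in_flight_.find(dev);
        return it == in_flight_.end() ? 0 : it->second.size();
    }

private:
    std::unordered_map<DeviceId, std::vector<std::unique_ptr<MockSignal>>> in_flight_;
};
```

The user-facing API stays at `copy_async` plus a device-wide synchronize, while the signals live entirely inside the backend. The same list could later back an event API by handing out an opaque handle per tracked signal.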
