# [Offload] Make olLaunchKernel test thread safe #149497
```diff
@@ -522,16 +522,11 @@ struct CUDADeviceTy : public GenericDeviceTy {
 
   /// Get the stream of the asynchronous info structure or get a new one.
   Error getStream(AsyncInfoWrapperTy &AsyncInfoWrapper, CUstream &Stream) {
-    // Get the stream (if any) from the async info.
-    Stream = AsyncInfoWrapper.getQueueAs<CUstream>();
-    if (!Stream) {
-      // There was no stream; get an idle one.
-      if (auto Err = CUDAStreamManager.getResource(Stream))
-        return Err;
-
-      // Modify the async info's stream.
-      AsyncInfoWrapper.setQueueAs<CUstream>(Stream);
-    }
+    auto WrapperStream =
+        AsyncInfoWrapper.getOrInitQueue<CUstream>(CUDAStreamManager);
+    if (!WrapperStream)
+      return WrapperStream.takeError();
+    Stream = *WrapperStream;
     return Plugin::success();
   }
```
```diff
@@ -642,17 +637,20 @@ struct CUDADeviceTy : public GenericDeviceTy {
   }
 
   /// Synchronize current thread with the pending operations on the async info.
-  Error synchronizeImpl(__tgt_async_info &AsyncInfo) override {
+  Error synchronizeImpl(__tgt_async_info &AsyncInfo,
+                        bool ReleaseQueue) override {
     CUstream Stream = reinterpret_cast<CUstream>(AsyncInfo.Queue);
     CUresult Res;
     Res = cuStreamSynchronize(Stream);
 
-    // Once the stream is synchronized, return it to stream pool and reset
-    // AsyncInfo. This is to make sure the synchronization only works for its
-    // own tasks.
-    AsyncInfo.Queue = nullptr;
-    if (auto Err = CUDAStreamManager.returnResource(Stream))
-      return Err;
+    // Once the stream is synchronized and we want to release the queue, return
+    // it to stream pool and reset AsyncInfo. This is to make sure the
+    // synchronization only works for its own tasks.
+    if (ReleaseQueue) {
+      AsyncInfo.Queue = nullptr;
+      if (auto Err = CUDAStreamManager.returnResource(Stream))
+        return Err;
+    }
 
     return Plugin::check(Res, "error in cuStreamSynchronize: %s");
   }
```

> **Reviewer:** When does the queue get unset/released for liboffload queues?

> **Author:** When the device is de-inited, all streams in the stream manager are deinited and dropped. For liboffload specifically, since devices are not cleared, this happens during the final liboffload …

> **Reviewer:** But shouldn't the queue be released when …

> **Author:** I think it makes sense to do that, but most of the "destroy" functions are not implemented fully yet and just leak memory.
```diff
@@ -104,6 +104,29 @@ TEST_P(olLaunchKernelFooTest, Success) {
   ASSERT_SUCCESS(olMemFree(Mem));
 }
 
+TEST_P(olLaunchKernelFooTest, SuccessThreaded) {
+  threadify([&](size_t) {
+    void *Mem;
+    ASSERT_SUCCESS(olMemAlloc(Device, OL_ALLOC_TYPE_MANAGED,
+                              LaunchArgs.GroupSize.x * sizeof(uint32_t), &Mem));
+    struct {
+      void *Mem;
+    } Args{Mem};
+
+    ASSERT_SUCCESS(olLaunchKernel(Queue, Device, Kernel, &Args, sizeof(Args),
+                                  &LaunchArgs, nullptr));
+
+    ASSERT_SUCCESS(olWaitQueue(Queue));
+
+    uint32_t *Data = (uint32_t *)Mem;
+    for (uint32_t i = 0; i < 64; i++) {
+      ASSERT_EQ(Data[i], i);
+    }
+
+    ASSERT_SUCCESS(olMemFree(Mem));
+  });
+}
+
 TEST_P(olLaunchKernelNoArgsTest, Success) {
   ASSERT_SUCCESS(
       olLaunchKernel(Queue, Device, Kernel, nullptr, 0, &LaunchArgs));
```

> **Reviewer:** I'd love to be able to add an …
> **Reviewer:** Please indicate with a comment what the `false` is doing.

> **Reviewer:** This code assumes other threads will not release the queue from that async info, right?

> **Author:** Correct, although as far as I know, liboffload doesn't do that, and that feels reasonable as a thing to mark as undefined.