Recursive + GPU seems broken #548
Comments
Exploring this issue, I see the following behavior:
For POTRF(0) that works, more or less, because the data was on the CPU and the CUDA task is the one moving it to the GPU. But then SYRK(0, 1) executes on the CUDA device and acquires A(1, 1) on the GPU.
When POTRF(1) is tried, we do the same as for POTRF(0). In particular, while testing the RECURSIVE device, we transfer the ownership to device 0 without pulling the data up to the CPU. Then we return NEXT. When the CUDA POTRF(1) task is scheduled, it finds that it needs to do a CPU -> GPU transfer, because the CPU is supposed to be the owner of the data (because of the transfer_ownership).
I believe that the last time we tried the RECURSIVE device, we were running only the GEMM tasks on the GPU.
I'm not sure how to solve it though. I don't think the hook task of RECURSIVE should acquire the data before it decides whether it is going to split the task into a DAG. And even when it does so, it's sub-optimal to acquire the data on the CPU. Maybe we want to schedule the operation at a finer grain on the GPU. In any case, if it acquires the data on the CPU, it should move the data from the GPU to the CPU.
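To make the sequence above easier to follow, here is a minimal toy model of it. This is not PaRSEC code: the `Tile` class, its methods, and the integer version counters are all hypothetical stand-ins, and the only assumption carried over from the description is that transferring ownership changes the owner without moving any data.

```python
from dataclasses import dataclass, field

CPU, GPU = 0, 1  # hypothetical device ids: 0 is the host, 1 is the CUDA device


@dataclass
class Tile:
    """Toy stand-in for one data item with one copy per device."""
    owner: int = CPU
    # Version held by each device; -1 means "no valid copy yet".
    versions: dict = field(default_factory=lambda: {CPU: 0, GPU: -1})

    def stage_in(self, device: int) -> None:
        # Copy from the current owner when the device is not the owner.
        if device != self.owner:
            self.versions[device] = self.versions[self.owner]

    def write_on(self, device: int) -> None:
        # A task updates the tile on `device` and becomes the owner.
        self.stage_in(device)
        self.versions[device] += 1
        self.owner = device

    def transfer_ownership(self, device: int) -> None:
        # Assumption from the report: the owner changes, the data does not move.
        self.owner = device


def replay() -> Tile:
    a11 = Tile()
    a11.write_on(GPU)            # SYRK(0, 1) updates A(1, 1) on the GPU
    a11.transfer_ownership(CPU)  # RECURSIVE hook is tested, then returns NEXT
    a11.stage_in(GPU)            # CUDA POTRF(1) sees a CPU owner and copies CPU -> GPU
    return a11


tile = replay()
print(tile.versions)  # the GPU copy has been overwritten by the older CPU copy
```

In this model the update made by SYRK(0, 1) is lost, because the stale CPU copy is staged in on top of the newer GPU copy.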
We looked at the code during the code review meeting, and we found a series of issues that need addressing:
Actionable item: meanwhile, the recursive device will be disabled for POTRF, which is the one operation that is susceptible to these issues because all of its kernels now execute on the GPU (which also makes the recursive device less attractive for that operation).
…RECURSIVE device in test mode.
Line 55 in 5306015
Doesn't this line force recursive taskpools to execute on the CPU? I'm not entirely surprised that there are issues with recursive + GPU, since I don't think it was ever a supported configuration.
The title might not reflect it clearly, but this is not what the issue is. The issue here is that recursive will indeed run only on the CPU, but it inherits data that might be located on the GPU, and it might fail to transfer that data before unfolding the recursive algorithm.
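As a rough illustration of the missing step, the sketch below shows a CPU acquisition that first pulls the data from whichever device currently owns it. It is a toy model under stated assumptions, not PaRSEC code: `acquire_on_cpu`, the `versions` dictionary, and the device ids are hypothetical names chosen for this example.

```python
CPU, GPU = 0, 1  # hypothetical device ids: 0 is the host, 1 is the CUDA device


def acquire_on_cpu(versions, owner):
    """Return (versions, owner) after the CPU acquires a tile for a recursive body.

    `versions` maps a device id to the version of the copy it holds.
    When another device owns the tile, its copy is pulled to the CPU
    before the CPU becomes the owner.
    """
    versions = dict(versions)  # do not mutate the caller's dictionary
    if owner != CPU:
        versions[CPU] = versions[owner]  # models the GPU -> CPU transfer
    return versions, CPU


# The GPU holds the newest copy (version 1); the CPU copy is stale (version 0).
versions, owner = acquire_on_cpu({CPU: 0, GPU: 1}, GPU)
print(versions, owner)
```

With the pull in place, the recursive body would unfold on up-to-date data, and a later CPU -> GPU transfer would no longer overwrite a newer GPU copy with a stale one.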
See issue ICLDisco#548 Signed-off-by: Aurelien Bouteiller <[email protected]>
See issue #548 Signed-off-by: Aurelien Bouteiller <[email protected]>
Describe the bug
An assert triggers when running a PTG test that combines recursive and CUDA bodies.
To Reproduce
Steps to reproduce the behavior:
Additional context
I tracked the issue, suspecting a versioning problem based on Aurelien's report, but it appears that we first run the CUDA kernel of POTRF(1) and then proceed to run the RECURSIVE kernel of POTRF(1), which messes up the status of the CUDA copies and leads to the unexpected state caught by the assert.
Followup tasks