Error triggered by synchronize() #603
It means there's an exception that's triggered by one of the kernels you run. But you can try uncommenting them and running again to trigger the exception and see in detail what's causing it.
VectorAddLambda: Error During Test at /home/wfg/github-runners/cousteau-JACC/ci/_work/JACC.jl/JACC.jl/test/tests_amdgpu.jl:10
Got exception outside of a @test
GPU Kernel Exception
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] throw_if_exception(dev::AMDGPU.HIP.HIPDevice)
@ AMDGPU ~/.julia/packages/AMDGPU/rrvsy/src/exception_handler.jl:122
[3] synchronize(stm::AMDGPU.HIP.HIPStream; blocking::Bool, stop_hostcalls::Bool)
@ AMDGPU ~/.julia/packages/AMDGPU/rrvsy/src/highlevel.jl:53
[4] synchronize (repeats 2 times)
@pxl-th thanks for the guidance, I will give it a try and report back.
To make it easier, I've pushed a branch
I think I'm missing something basic with synchronization.
When using a simple @roc kernel launch inside a function, we get an error in this AMDGPU.synchronize() line. The stacktrace can be seen in our CI, which uses a recent AMDGPU.jl v0.8.6 on an MI100 with ROCm 6. I don't know whether the first AMDGPU.jl entry in the stacktrace:
[4] synchronize (repeats 2 times) @ ~/.julia/packages/AMDGPU/rrvsy/src/highlevel.jl:49 [inlined]
provides any hints.
Works:
Fails:
For reference the CUDA code works fine:
Any help would be appreciated!