[Issue]: Assertion Error: "Can't claim the queue is finished with the active batch!" #148
Hi @Line-fr. An internal ticket has been created to investigate this issue. Thanks!
As an update, I now have ROCm version
If any additional system info is desired, let me know.
rahulc1984 pushed a commit that referenced this issue on Apr 12, 2025
* SWDEV-517078 - Maintain the trap handler ABI version in CLR
The trap handler ABI version is communicated to the debugger using the r_version field in the r_debug structure. This structure is an external dependency, which makes it complicated to keep the trap handler source (in CLR) and the ABI version number (external dependency) in sync.
This patch proposes to patch the trap handler ABI version number in _amdgpu_r_debug before communicating it to the debugger.
We can't directly include sc's executable.hpp file in CLR, as it relies on conflicting definitions of ELF-related types, so instead we need to rely on a priori knowledge of the r_debug structure. Fortunately, this structure is part of a stable ABI, so its layout is guaranteed to be kept stable.
Update the 2nd level trap handler to follow updates from the ROCr-runtime. The trap handlers are stripped of the parts dedicated to architectures unsupported by CLR.
Bump r_debug.r_version to track the ABI changes in the trap handler.
Problem Description
I am reporting an error that @mikesulsenti has been hitting consistently.
I already saw this error randomly with another user. Here is the error:
Here is some context. The error happens while using
https://codeberg.org/Kosaka/ssimulacrapy
with the Vship backend
https://github.com/Line-fr/Vship
which itself uses VapourSynth. The issue clearly arises in Vship, in the part that computes SSIMU2 (see src/ssimu2/main.hpp).
As the developer of Vship, I can provide a bit more detail about its inner workings to help solve the issue:
VapourSynth launches multiple threads, and each thread gets a hipStream_t associated with it.
Vship launches every command asynchronously, except for hipMalloc, hipFree, and an event synchronization at the end to retrieve the score for a given frame.
I believe this issue is related to stream management, but I am not very knowledgeable about the internals of ROCm.
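For readers unfamiliar with this pattern, the threading structure described above can be modeled roughly as follows. This is a CPU-only sketch in standard C++, not Vship's actual code, and it does not call HIP or reproduce the assertion. `MockStream`, `score_frame`, and `run_workers` are hypothetical names introduced here, and the mean-squared-difference "score" is a placeholder for the real SSIMU2 computation. The only purpose is to show the shape: one private in-order stream per worker thread, asynchronous submissions, and a single blocking synchronization per frame.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical CPU-only stand-in for a hipStream_t: work submitted to one
// stream runs in submission order, and enqueue() returns without waiting.
class MockStream {
public:
    void enqueue(std::function<void()> work) {
        // Chain the new task behind the previous one to keep in-order execution.
        tail_ = std::async(std::launch::async,
                           [prev = std::move(tail_), work = std::move(work)]() {
                               if (prev.valid()) prev.wait();
                               work();
                           });
    }

    // Blocks until all queued work is done (models the end-of-frame event sync).
    void synchronize() {
        if (tail_.valid()) tail_.wait();
    }

private:
    std::future<void> tail_;
};

// Placeholder metric (mean squared difference), standing in for SSIMU2.
double score_frame(MockStream& stream,
                   const std::vector<float>& ref,
                   const std::vector<float>& dist) {
    assert(ref.size() == dist.size() && !ref.empty());
    std::vector<float> diff(ref.size());  // models the synchronous allocation
    double score = 0.0;

    // Two asynchronous steps queued on the same stream.
    stream.enqueue([&] {
        for (std::size_t i = 0; i < ref.size(); ++i) diff[i] = ref[i] - dist[i];
    });
    stream.enqueue([&] {
        double sum = 0.0;
        for (float d : diff) sum += static_cast<double>(d) * d;
        score = sum / static_cast<double>(diff.size());
    });

    stream.synchronize();  // the only blocking point for this frame
    return score;          // diff is released here (models the synchronous free)
}

// Each worker thread owns its own stream and scores one synthetic frame.
std::vector<double> run_workers(std::size_t num_threads) {
    std::vector<double> scores(num_threads);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < num_threads; ++i) {
        workers.emplace_back([i, &scores] {
            MockStream stream;
            std::vector<float> ref(1024, 1.0f);
            std::vector<float> dist(1024, 1.0f + static_cast<float>(i));
            scores[i] = score_frame(stream, ref, dist);
        });
    }
    for (auto& w : workers) w.join();
    return scores;
}
```

Calling `run_workers(4)` runs four workers concurrently, each with its own stream. In the real code the stream, allocation, and synchronization steps would be HIP runtime calls, which is where the reported assertion presumably originates.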
I am afraid I cannot do much in my code to resolve this issue, especially since it has never happened to me personally, and this is the first time anyone has been able to trigger the error consistently.
I hope that this report will be useful.
Thank you for your time and the nice job!
Operating System
CachyOS Linux
CPU
AMD Ryzen 5 7600X 6-Core Processor
GPU
AMD Radeon RX 7900 XTX
ROCm Version
6.3.2-2
ROCm Component
clr
Steps to Reproduce
Unfortunately, I do not know how to reproduce it.
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response