[Bug] No enough blocks. Assertion fail: /root/lmdeploy/src/turbomind/models/llama/LlamaBatch.cc:358 #720
Comments
Hope this guide can help you.
Thank you for your reply. I've read the guide, but I'm still confused.
cache_block_seq_len: 128
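For context, a rough sketch of how settings like these translate into KV-cache block demand in a paged design. Only `cache_block_seq_len = 128` comes from this thread; `session_len` and `max_batch_size` are assumed values, and TurboMind's exact accounting may differ.

```python
import math

# Rough paged-KV-cache arithmetic; only cache_block_seq_len = 128 comes
# from this thread, the other values are assumptions for illustration.
session_len = 2048          # assumed max tokens per sequence
cache_block_seq_len = 128   # tokens covered by one KV cache block
max_batch_size = 64         # assumed number of concurrent sequences

blocks_per_seq = math.ceil(session_len / cache_block_seq_len)   # 16
total_blocks_needed = max_batch_size * blocks_per_seq           # 1024
print(f"{blocks_per_seq} blocks per sequence, "
      f"{total_blocks_needed} blocks for a full batch")
```

If the block pool is sized smaller than what a full batch can demand, an allocation can come up short, which is consistent with the "No enough blocks" assertion in the title.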
I will try to reproduce this issue tomorrow.
This seems to be an issue introduced in #590; you may try the latest main branch to see if it helps.
Problem resolved after I updated the code to the latest main branch. |
Checklist
Describe the bug
This coredump happens when `max_batch_size`/`cache_max_entry_count` in config.ini is large and `concurrency` in the benchmark scripts is small.

When concurrency = 1, this config failed:

(failing config not preserved in this copy)

but this config ran successfully:

(working config not preserved in this copy)
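Since the two config dumps were not preserved, here is a hypothetical excerpt illustrating the config.ini keys under discussion; none of these values are the reporter's actual settings.

```ini
[llama]
; illustrative values only, not from the original report
session_len = 2048
max_batch_size = 64
cache_max_entry_count = 48
cache_block_seq_len = 128
```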
Reproduction
Environment
lmdeploy version: 0.0.14
CUDA version: 11.8
OS: CentOS 7
git log: commit `73386e217cc092e02a5c256e8cf6ce1475b8c6c8`
Error traceback