[Bug] topk is larger #441
Comments
Could you share the reproduction method?
python3 -m lmdeploy.serve.openai.api_server ./workspace 0.0.0.0 10000 --instance_num 32 --tp 8
Which version are you using?
Even with v0.0.9, I still experience crashes with the same error.
Can you share your client code to help us reproduce it?
With Llama 70B (HF format), LMDeploy only uses 25% of VRAM. Is it possible to make it use more, as vLLM does, to support more concurrent requests? I only have this problem when there are too many requests.
I will try this: #460
The problem is actually still present. It occurs when too many requests are made within a short period of time, even after updating to LMDeploy 0.1.0.
For example, if you make 5 concurrent requests over a period of 1s, you will get the error.
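For reference, a minimal client sketch along these lines, assuming the api_server launched with the command above is listening on 0.0.0.0:10000 and exposes an OpenAI-compatible /v1/chat/completions endpoint; the model name and prompt are placeholders:

```python
# Minimal reproduction sketch. Assumptions: the api_server launched with the command
# above is listening on 0.0.0.0:10000 and serves an OpenAI-compatible
# /v1/chat/completions endpoint; the model name and prompt are placeholders.
import concurrent.futures

import requests

URL = "http://0.0.0.0:10000/v1/chat/completions"


def send_request(i: int) -> int:
    """Send one chat completion request and return the HTTP status code."""
    payload = {
        "model": "llama-2-70b-chat",  # placeholder model name
        "messages": [{"role": "user", "content": f"Hello, this is request {i}"}],
        "max_tokens": 64,
    }
    resp = requests.post(URL, json=payload, timeout=120)
    return resp.status_code


# Fire 5 requests concurrently within roughly one second.
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as pool:
    print(list(pool.map(send_request, range(5))))
```

Under a load like this, the server logs the topk warnings and eventually the CUDA out-of-memory error shown in the Reproduction section below.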
@AllentDan Could you help investigate it?
@hatrexltd Which API did you use? And how did you send the requests?
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.
This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or any new updates.
Checklist
Describe the bug
When there are too many simultaneous requests, errors occur and the server crashes. How can this problem be fixed?
Reproduction
1|triton | [WARNING] topk (471476112) is larger than max supported number (1024) for token 5 clip to max supported number 1024.
1|triton | [WARNING] topk (471476112) is larger than max supported number (1024) for token 5 clip to max supported number 1024.
1|triton | [WARNING] topk (471476112) is larger than max supported number (1024) for token 5 clip to max supported number 1024.
1|triton | [WARNING] topk (471476112) is larger than max supported number (1024) for token 5 clip to max supported number 1024.
1|triton | [WARNING] topk (471476112) is larger than max supported number (1024) for token 5 clip to max supported number 1024.
1|triton | [WARNING] topk (471476112) is larger than max supported number (1024) for token 5 clip to max supported number 1024.
1|triton | [WARNING] topk (471476112) is larger than max supported number (1024) for token 5 clip to max supported number 1024.
1|triton | [WARNING] topk (471476112) is larger than max supported number (1024) for token 5 clip to max supported number 1024.
1|triton | [WARNING] topk (471476112) is larger than max supported number (1024) for token 5 clip to max supported number 1024.
1|triton | [WARNING] topk (471476112) is larger than max supported number (1024) for token 5 clip to max supported number 1024.
1|triton | [WARNING] topk (471476112) is larger than max supported number (1024) for token 5 clip to max supported number 1024.
1|triton | [WARNING] topk (471476112) is larger than max supported number (1024) for token 5 clip to max supported number 1024.
1|triton | [WARNING] topk (471476112) is larger than max supported number (1024) for token 5 clip to max supported number 1024.
1|triton | [WARNING] topk (471476112) is larger than max supported number (1024) for token 5 clip to max supported number 1024.
1|triton | what(): [TM][ERROR] CUDA runtime error: out of memory /lmdeploy/src/turbomind/utils/allocator.h:223
(24GB/80GB per GPU used)
Error traceback
No response