[Bug] topk is larger #441

Closed

hatrexltd opened this issue Sep 20, 2023 · 13 comments
Comments

@hatrexltd

hatrexltd commented Sep 20, 2023

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.

Describe the bug

When there are too many simultaneous requests, errors occur and the server crashes. How can this problem be fixed?

Reproduction

1|triton | [WARNING] topk (471476112) is larger than max supported number (1024) for token 5 clip to max supported number 1024.
(the warning above repeats many times)
1|triton | what(): [TM][ERROR] CUDA runtime error: out of memory /lmdeploy/src/turbomind/utils/allocator.h:223
(24GB/80GB per GPU used)
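Not part of the original report, but the implausible topk value (471476112) in the warning suggests a corrupted or uninitialized sampling parameter reaching the engine. As a defensive measure, a client could clamp `top_k` before sending a request. This is a minimal sketch; the helper name and the default of 40 are hypothetical, and the 1024 limit is taken from the warning text above:

```python
# Hypothetical helper: clamp sampling parameters before they reach the server.
# MAX_TOPK (1024) is the limit reported in the TurboMind warning above.
MAX_TOPK = 1024

def sanitize_sampling(params: dict) -> dict:
    """Return a copy of `params` with top_k forced into [1, MAX_TOPK]."""
    out = dict(params)
    top_k = out.get("top_k", 40)  # 40 is an arbitrary illustrative default
    if not isinstance(top_k, int) or top_k < 1:
        top_k = 1
    out["top_k"] = min(top_k, MAX_TOPK)
    return out

print(sanitize_sampling({"top_k": 471476112}))  # top_k clipped to 1024
```

Clamping client-side does not fix the server-side crash, but it avoids tripping the warning path with garbage values.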

Error traceback

No response

@hatrexltd hatrexltd changed the title [Bug] [Bug] topk is larger Sep 20, 2023
@lvhan028
Collaborator

Could you share the reproduction method?

@hatrexltd
Author

> Could you share the reproduction method?

python3 -m lmdeploy.serve.openai.api_server ./workspace 0.0.0.0 10000 --instance_num 32 --tp 8
Model : Llama 2 (70B)
To reproduce the bug, you have to send lots of requests (5-20 per second)
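To mimic the burst of requests described above, a simple load generator can fire concurrent requests at the server. This is only a sketch: the URL path, model name, and payload shape are assumptions based on typical OpenAI-style APIs, not details given in this issue:

```python
# Sketch of a load generator: sends n concurrent requests to an
# OpenAI-compatible endpoint, mimicking the 5-20 req/s burst described above.
import concurrent.futures
import json
import urllib.request

API_URL = "http://0.0.0.0:10000/v1/chat/completions"  # port from the command above

def build_payload(prompt: str) -> dict:
    return {
        "model": "llama2-70b",  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }

def send_one(prompt: str) -> int:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.status

def send_burst(n: int = 20) -> list:
    # Fire n requests at once via a thread pool.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(send_one, [f"request {i}" for i in range(n)]))
```

Running `send_burst(20)` against a live server would approximate the reported traffic pattern.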

@lvhan028
Collaborator

Which version are you using?
If you are using v0.0.7 or v0.0.8, I suggest upgrading to v0.0.9

@hatrexltd
Author

Even with v0.0.9, I still experience crashes with the same error

@lvhan028
Collaborator

Can you share your client code to help us reproduce it?

@hatrexltd
Author

> Can you share your client code to help us reproduce it?

Llama 70B HF
python3 -m lmdeploy.serve.openai.api_server ./workspace 0.0.0.0 43000 --instance_num 32 --tp 8
LMDeploy 0.0.9
No modifications

LMDeploy only uses 25% of the VRAM. Is it possible to make it use more, as vLLM does, to support more concurrent requests? I only have this problem when there are too many requests.
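This thread does not answer the VRAM question, but in TurboMind-based LMDeploy workspaces the KV-cache budget is typically governed by a `cache_max_entry_count` setting in the generated config. The file path, section name, and default value below are assumptions that may vary by version; verify against the LMDeploy documentation before changing them:

```ini
; Hypothetical excerpt of workspace/triton_models/weights/config.ini.
; Raising cache_max_entry_count lets TurboMind allocate more KV-cache entries,
; increasing VRAM usage and the number of concurrent sequences it can serve.
[llama]
cache_max_entry_count = 48   ; illustrative value; tune upward with care
```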

@hatrexltd
Author

I will try this #460

@hatrexltd hatrexltd reopened this Sep 27, 2023
@hatrexltd
Author

The problem is actually still present. It occurs when too many requests are made within a short period of time, even after updating to LMDeploy 0.1.0.

@hatrexltd
Author

For example, if you make 5 concurrent requests over a period of 1s, you will get the error.

@lvhan028
Collaborator

@AllentDan Could you help investigate it?

@AllentDan
Collaborator

@hatrexltd Which API did you use? And how did you send the requests?

@github-actions

github-actions bot commented Oct 6, 2023

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

@github-actions github-actions bot added the Stale label Oct 6, 2023
@github-actions

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.

@github-actions github-actions bot closed this as not planned Oct 12, 2023