GptManager occupies the entire CPU #917
Comments
Does the program occupy all threads or only a single thread? GptManager creates a thread to wait for and collect incoming requests.
I use
Then I think it is expected.
If the thread only waits for and collects requests, why does it occupy an entire CPU core? Theoretically, if no requests come in, waiting should not consume CPU resources.
It uses a while loop to wait for requests to arrive.
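The behavior described above can be illustrated with a minimal sketch (this is hypothetical standalone code, not GptManager's actual implementation): a busy-wait loop stays runnable and burns a full core even when idle, whereas a blocking wait deschedules the thread until work arrives.

```python
import queue
import threading
import time

def busy_wait_worker(requests, results, stop):
    # Busy-wait: polls the queue in a tight loop, so the thread never
    # sleeps and consumes ~100% of a core even with no requests.
    while not stop.is_set():
        try:
            req = requests.get_nowait()
        except queue.Empty:
            continue  # loop again immediately -> CPU spins while idle
        results.append(req * 2)

def blocking_worker(requests, results, stop):
    # Blocking wait: Queue.get() parks the thread in the kernel until a
    # request arrives (or the timeout fires), so idle CPU use is ~0%.
    while not stop.is_set():
        try:
            req = requests.get(timeout=0.1)
        except queue.Empty:
            continue
        results.append(req * 2)

if __name__ == "__main__":
    requests, results, stop = queue.Queue(), [], threading.Event()
    t = threading.Thread(target=blocking_worker, args=(requests, results, stop))
    t.start()
    for i in range(3):
        requests.put(i)
    time.sleep(0.3)
    stop.set()
    t.join()
    print(results)
```

Both workers produce the same results; the difference is only in idle CPU usage, which is why a polling loop inside GptManager would show up as a fully occupied core even before any request is submitted.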
```python
import argparse

async def main(model_dir, tokenizer_dir):
    ...

if __name__ == "__main__":
    ...
```

Hello, do you have the complete code? I don't see in-flight batching implemented with TensorRT-LLM here. May I ask where your code is?
In-flight batching is enabled when the TRT engine is built, and it is used through AsyncLLMEngine.
Hi @Masterlk, do you still have any further issues or questions? If not, we'll close this soon.
I use the Python bindings of TensorRT-LLM and tried GptManager for the in-flight batching feature. However, the initialization of GptManager seems to occupy an entire CPU core, even though it hasn't accepted a single request yet.
My expectation is that when there are no requests, CPU utilization should be near 0. What is the possible reason?
I use the main branch (the newest version), and this code reproduces the problem: