
GptManager occupies the entire CPU #917

Closed
Masterlk opened this issue Jan 19, 2024 · 10 comments
Assignees
Labels
stale triaged Issue has been triaged by maintainers

Comments

@Masterlk

Masterlk commented Jan 19, 2024

I use the Python bindings of tensorrt-llm, and tried GptManager for the in-flight batching (InflightBatching) feature, but the initialization of GptManager seems to occupy an entire CPU core, even though it has not accepted a single request yet.

My expectation is that when there are no requests, CPU utilization should be 0, so what could the reason be?

I use the main branch (the newest version), and this code reproduces the problem:

import argparse
from asyncio import run
from pathlib import Path
import asyncio
from tensorrt_llm.engine import AsyncLLMEngine

async def main(model_dir, tokenizer_dir):
    engine = AsyncLLMEngine(model_dir, tokenizer_dir)
    await asyncio.sleep(100)


if __name__ == "__main__":
    model_dir = "./1-gpu"
    tokenizer_dir = "../module/mistral/data/tokenizer"
    run(main(model_dir, tokenizer_dir))

[screenshot: CPU utilization]

@byshiue
Collaborator

byshiue commented Jan 19, 2024

Does the program occupy all threads or only a single thread? GptManager will create a thread to wait for and collect the requests.

@byshiue byshiue self-assigned this Jan 19, 2024
@byshiue byshiue added the triaged Issue has been triaged by maintainers label Jan 19, 2024
@Masterlk
Author

Does the program occupy all threads or only a single thread? GptManager will create a thread to wait for and collect the requests.

I used top -Hp <pid> to see all threads of the process, and found a single thread occupying 100% of one CPU core.
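As an aside (not from the thread): on Linux, the per-thread view that `top -Hp <pid>` gives can also be sampled programmatically by reading `/proc/<pid>/task/<tid>/stat`. A minimal sketch, assuming a Linux system with clock-tick accounting; the function names are illustrative, not from TensorRT-LLM:

```python
import os
import time


def thread_cpu_ticks(pid):
    """Return {tid: utime + stime in clock ticks} for every thread of pid,
    read from /proc/<pid>/task/<tid>/stat (Linux only)."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        with open(f"{task_dir}/{tid}/stat") as f:
            # Split after the ')' that closes the command name, so spaces
            # in the name cannot shift the field positions.
            fields = f.read().rsplit(")", 1)[1].split()
        ticks[int(tid)] = int(fields[11]) + int(fields[12])  # utime + stime
    return ticks


def busiest_thread(pid, interval=1.0):
    """Sample twice and return (tid, ticks_spent) for the hottest thread."""
    before = thread_cpu_ticks(pid)
    time.sleep(interval)
    after = thread_cpu_ticks(pid)
    return max(((tid, t - before.get(tid, 0)) for tid, t in after.items()),
               key=lambda item: item[1])
```

Calling `busiest_thread(os.getpid())` while one thread spins will report that thread consuming roughly a full core's worth of ticks over the interval, which matches what `top -Hp` shows here.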

@byshiue
Collaborator

byshiue commented Jan 19, 2024

Then I think it is expected.

@Masterlk
Author

Then I think it is expected.

If the thread only waits for and collects requests, why does it occupy an entire CPU core? Theoretically, if no requests come in, waiting should not consume CPU resources.

@byshiue
Collaborator

byshiue commented Jan 22, 2024

It uses a while loop to wait for requests to come in.
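For context, this is the difference between polling and blocking. A hypothetical sketch (in Python, not the actual C++ GptManager code): a collector that polls with `get_nowait()` loops continuously and pins a core even when idle, while one that blocks in `get(timeout=...)` sleeps until work arrives:

```python
import queue
import threading


def blocking_collector(requests, handled, stop):
    """Collect requests without burning CPU: get() blocks until a request
    arrives or the timeout expires, so an idle thread stays asleep."""
    while not stop.is_set():
        try:
            handled.append(requests.get(timeout=0.1))
        except queue.Empty:
            continue  # woke on timeout; re-check the stop flag

# The busy-wait pattern described above instead looks like this:
#
#     while not stop.is_set():
#         try:
#             handled.append(requests.get_nowait())
#         except queue.Empty:
#             continue  # loops again immediately -> 100% CPU while idle
```

Trading the tight loop for a short-timeout blocking wait keeps idle CPU near zero, at the cost of up to one timeout interval of extra latency when shutting down. Busy-waiting is sometimes chosen deliberately to minimize request pickup latency.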

@lyc728

lyc728 commented Jan 23, 2024

Python bindings

import argparse
from asyncio import run
from pathlib import Path
import asyncio
from tensorrt_llm.engine import AsyncLLMEngine

async def main(model_dir, tokenizer_dir):
    engine = AsyncLLMEngine(model_dir, tokenizer_dir)
    await asyncio.sleep(100)


if __name__ == "__main__":
    model_dir = "./1-gpu"
    tokenizer_dir = "../module/mistral/data/tokenizer"
    run(main(model_dir, tokenizer_dir))

Hello, do you have the complete code? I don't see InflightBatching implemented in tensorrt-llm. May I ask where your code is?

@HUSTHY

HUSTHY commented Jun 25, 2024

[quoting the reproduction script and lyc728's question above]

InflightBatching is implemented when the TRT engine is built, and it is used by AsyncLLMEngine.

@nv-guomingz
Collaborator

Hi @Masterlk, do you still have any further issue or question now? If not, we'll close this soon.

@HUSTHY

HUSTHY commented Nov 15, 2024 via email

@HUSTHY

HUSTHY commented Dec 4, 2024 via email

5 participants