Use LifoQueue for turbomind async_stream_infer #1179
Conversation
* put queue as an argument to the function
* use asyncio.Queue
* lifoque for async stream infer
* lifoque for turbomind
* remove que pop
* recover stream_infer
* fix repeated yield
Hi @AllentDan, if we use LifoQueue rather than asyncio.LifoQueue, does the issue of Python's Queue blocking coroutines still exist? #1138 (comment)
The previous PR is to fix a thread-safety bug. The bug can be reproduced with the following test script:

```python
import os
import sys
from tqdm import tqdm
from concurrent.futures import ThreadPoolExecutor

from lmdeploy.serve.openai.api_client import APIClient

questions = ['你是谁'] * 1000  # '你是谁' means 'Who are you?'
num_parallel = 512


def process_one(question, url='0.0.0.0', port='23333'):
    client = APIClient('http://{}:{}'.format(url, port))
    model_name = client.available_models[0]
    msg = [dict(role='user', content=question)]
    data = client.chat_completions_v1(model=model_name, messages=msg)
    for item in data:
        response = item
    return response


with ThreadPoolExecutor(max_workers=num_parallel) as executor:
    for response in tqdm(executor.map(process_one, questions)):
        print(response)
```

The last response is not pushed to the queue as the last item. Somehow, another response (finish=False), pushed in lmdeploy/lmdeploy/turbomind/turbomind.py line 732 (at 24ea5dc), ends up after it. Consequently, the loop never ends, because the consumer thread never receives the expected last response (finish=True). This is why, with the test script above, one request never finishes, and there is always a thread looping in lmdeploy/lmdeploy/turbomind/turbomind.py line 728 (at 24ea5dc).
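To make the failure mode concrete, here is a minimal, hypothetical sketch (not lmdeploy's actual code) of the consumer pattern being described: the consumer keeps pulling responses until it sees one whose finish flag is set, so if the final finish=True response is not the item it ultimately receives, the loop never returns.

```python
import queue


class Response:
    """Hypothetical stand-in for a turbomind streaming output."""

    def __init__(self, text, finish):
        self.text = text
        self.finish = finish


def consume(que: queue.LifoQueue):
    # Keep yielding responses until one carries finish=True.
    # If a finish=False response is pushed after the final one,
    # the consumer may never observe finish=True and never return.
    while True:
        res = que.get()
        yield res
        if res.finish:
            return
```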
As for blocking, I tested the performance; it is the same either way.
See https://stackoverflow.com/questions/32889527/is-there-a-way-to-use-asyncio-queue-in-multiple-threads. It seems asyncio.LifoQueue works once queue._loop._write_to_self() is added.
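For reference, here is a minimal sketch of the wake-up trick described above, with the documented loop.call_soon_threadsafe alternative shown as a comment. The names and flow are illustrative assumptions, not lmdeploy's code, and _write_to_self is a private event-loop method.

```python
import asyncio
import threading


async def main():
    loop = asyncio.get_running_loop()
    que = asyncio.LifoQueue()

    def worker():
        # A plain put_nowait from another thread may not wake a loop that is
        # blocked in `await que.get()`; nudging the loop's self-pipe does.
        que.put_nowait('result')
        loop._write_to_self()  # private API, as in the comment above
        # Documented alternative:
        # loop.call_soon_threadsafe(que.put_nowait, 'result')

    threading.Thread(target=worker).start()
    print(await que.get())


asyncio.run(main())
```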
Yes, I tested the method and it worked. However, the benchmark result is comparatively worse than without it.
On my side, testing the restful_api, the results are about the same.
Then we need to compare again; if the asyncio queue can be used, prefer it.
I tried it with:

```python
from tqdm import tqdm
from concurrent.futures import ThreadPoolExecutor

from lmdeploy.serve.openai.api_client import APIClient

questions = ['你是谁'] * 1000  # '你是谁' means 'Who are you?'
num_parallel = 1000


def process_one(question, url='0.0.0.0', port='23333'):
    client = APIClient('http://{}:{}'.format(url, port))
    model_name = client.available_models[0]
    msg = [dict(role='user', content=question)]
    data = client.chat_completions_v1(model=model_name, messages=msg)
    for item in data:
        response = item
    return response


with ThreadPoolExecutor(max_workers=num_parallel) as executor:
    for response in tqdm(executor.map(process_one, questions)):
        print(response)
```
@zhulinJulia24 Please add the high-concurrency test to the test cases.
Using root inside the docker image, or reducing num_parallel to 256 for a non-root user, the problem does not show up.
I am curious why this problem is only observed in the high-concurrency api server mode. In principle, large-batch inference through the pipeline should also have a chance of being affected by this bug.
#1138 introduced asyncio.LifoQueue, but it is not thread-safe for multi-threaded functions like forward_callback or forward_thread. Use a normal LifoQueue instead, to avoid frequent pops that may pop out the final result.
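As a rough illustration of the direction the description points at (a sketch under assumptions, not the PR's actual implementation): producer threads such as forward_callback/forward_thread can safely put into a standard, thread-safe queue.LifoQueue, and the async consumer can hand the blocking get() to an executor so the event loop is not blocked.

```python
import asyncio
import queue
import threading


async def stream_results(que: queue.LifoQueue):
    loop = asyncio.get_running_loop()
    while True:
        # Run the blocking get() in a worker thread so the event loop stays free.
        res = await loop.run_in_executor(None, que.get)
        yield res
        if res['finish']:
            break


async def main():
    que = queue.LifoQueue()

    def producer():
        # Stands in for the callback thread pushing intermediate and final results.
        que.put({'text': 'partial', 'finish': False})
        que.put({'text': 'done', 'finish': True})

    threading.Thread(target=producer).start()
    async for res in stream_results(que):
        print(res)


asyncio.run(main())
```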