
Batch inference API with high-concurrency support #1244

Open · Lukangkang123 wants to merge 1 commit into main

Conversation

@Lukangkang123 commented Jun 15, 2023

Adapted from evaluate.py. Requests are accepted from multiple threads into a request pool; once enough accumulate (MAX_BATCH_SIZE) or a waiting-time limit elapses (MAX_WAIT_TIME), batched inference runs, which greatly speeds up inference. The MAX_BATCH_SIZE and MAX_WAIT_TIME hyperparameters can be tuned as needed.
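A minimal sketch of the request-pool idea described above, assuming an asyncio server such as FastAPI/uvicorn. BatchPool, submit, worker, and batch_infer are illustrative names, not the PR's actual code:

```python
import asyncio
import time

MAX_BATCH_SIZE = 32    # tune to your GPU memory
MAX_WAIT_TIME = 0.1    # seconds to wait before flushing a partial batch

class BatchPool:
    def __init__(self):
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, prompt: str) -> str:
        # Each request carries its own Future; the worker resolves it.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((prompt, fut))
        return await fut

    async def worker(self, batch_infer):
        # batch_infer: callable taking a list of prompts, returning a list of replies.
        while True:
            prompt, fut = await self.queue.get()   # block until the first request arrives
            prompts, futures = [prompt], [fut]
            deadline = time.monotonic() + MAX_WAIT_TIME
            while len(prompts) < MAX_BATCH_SIZE:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    prompt, fut = await asyncio.wait_for(self.queue.get(), remaining)
                except asyncio.TimeoutError:
                    break
                prompts.append(prompt)
                futures.append(fut)
            # Run the blocking model call off the event loop.
            results = await asyncio.to_thread(batch_infer, prompts)
            for f, r in zip(futures, results):
                f.set_result(r)
```

The worker would be started once on the server's own event loop (for example in a FastAPI startup handler), which also avoids the cross-loop errors reported later in this thread.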

@HL0718 commented Jun 15, 2023

> Adapted from evaluate.py. Requests are accepted from multiple threads into a request pool; once enough accumulate (MAX_BATCH_SIZE) or a waiting-time limit elapses (MAX_WAIT_TIME), batched inference runs, which greatly speeds up inference. The MAX_BATCH_SIZE and MAX_WAIT_TIME hyperparameters can be tuned as needed.

May I ask which GPU model and how much VRAM you are using? Doesn't this run out of GPU memory?

@Lukangkang123 (Author)

@HL0718

It shouldn't. I'm using an 80 GB A800 and can push the batch size to around 300. You can adjust MAX_BATCH_SIZE to your GPU; with less VRAM, set it lower.

PS: This approach does increase VRAM usage, but inference speed improves enormously, so the payoff is clear. I tested with the same 1,000 samples: the batched approach (batch=100) finished inference in about 30 seconds, while fully serial took around 500 seconds. Batched VRAM usage was only a few times higher than serial.
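As a sanity check on those numbers: 500 s for 1,000 serial requests is about 0.5 s per request, so 30 s batched is roughly a 16x speedup. A hypothetical client-side harness for reproducing the comparison; the URL and JSON payload shape are assumptions:

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000"                      # assumed endpoint of the batching API
prompts = [f"test question {i}" for i in range(1000)]

def ask(prompt: str):
    return requests.post(URL, json={"prompt": prompt}, timeout=600).json()

# Concurrent clients let the server accumulate full batches.
t0 = time.time()
with ThreadPoolExecutor(max_workers=100) as pool:
    list(pool.map(ask, prompts))
print(f"batched: {time.time() - t0:.1f}s")

# Fully serial baseline: one request at a time.
t0 = time.time()
for p in prompts:
    ask(p)
print(f"serial: {time.time() - t0:.1f}s")
```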

@HL0718 commented Jun 15, 2023

> It shouldn't. I'm using an 80 GB A800 and can push the batch size to around 300. You can adjust MAX_BATCH_SIZE to your GPU; with less VRAM, set it lower.
>
> PS: This approach does increase VRAM usage, but inference speed improves enormously, so the payoff is clear. I tested with the same 1,000 samples: the batched approach (batch=100) finished inference in about 30 seconds, while fully serial took around 500 seconds. Batched VRAM usage was only a few times higher than serial.

The sequences in each of your batches must be fairly short. I tried it: at a sequence length of 2048, an 80 GB A100 supports a batch size of at most 32.

@Lukangkang123 (Author)

> The sequences in each of your batches must be fairly short. I tried it: at a sequence length of 2048, an 80 GB A100 supports a batch size of at most 32.

My data is indeed short. But with sequences that long, fully serial inference would cost even more time; try comparing whether this is faster than serial.

@HL0718 commented Jun 15, 2023

> My data is indeed short. But with sequences that long, fully serial inference would cost even more time; try comparing whether this is faster than serial.

There is indeed an improvement: serial takes about 80 s, while the batched form takes about 18 s.

@Lukangkang123 (Author)

> There is indeed an improvement: serial takes about 80 s, while the batched form takes about 18 s.

Right, that shows the batching approach works. Did you run this with the code I provided?

@HL0718 commented Jun 15, 2023

> Right, that shows the batching approach works. Did you run this with the code I provided?

Not yet; for now we don't have a scenario that demands this kind of high concurrency.

@liwei0826

Multiple requests raise an error: RuntimeError: Task <Task pending name='Task-7' coro=<RequestResponseCycle.run_asgi() running at /home/nrp/anaconda3/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py:436> cb=[set.discard()]> got Future attached to a different loop

@Lukangkang123 (Author)

> Multiple requests raise an error: RuntimeError: Task <Task pending name='Task-7' coro=<RequestResponseCycle.run_asgi() running at /home/nrp/anaconda3/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py:436> cb=[set.discard()]> got Future attached to a different loop

May I ask how you deployed the server side? With this code, serving it with FastAPI is all that's needed.
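A hedged sketch of that FastAPI wiring, reusing the illustrative BatchPool from the sketch near the top of this thread; the endpoint path, payload shape, and batch_infer callable are all assumptions:

```python
# Hypothetical FastAPI wiring for the BatchPool sketch above.
import asyncio

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
pool = BatchPool()   # illustrative class from the earlier sketch

class Query(BaseModel):
    prompt: str

@app.on_event("startup")
async def start_worker():
    # Start the batching worker on uvicorn's own event loop.
    asyncio.create_task(pool.worker(batch_infer))   # batch_infer: your batched generate function

@app.post("/")
async def handle(query: Query):
    return {"response": await pool.submit(query.prompt)}
```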

@976311200

I only have 16 GB of VRAM, barely enough to deploy the fp16 model. If I switch to the int4 weights and run your code on top, could I then test this high-concurrency capability?

@Lukangkang123 (Author)

> I only have 16 GB of VRAM, barely enough to deploy the fp16 model. If I switch to the int4 weights and run your code on top, could I then test this high-concurrency capability?

You can give it a try, but the batch size probably can't go very high.
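For reference, a hedged sketch of the int4 loading step, based on the quantization API shown on the ChatGLM-6B model card; verify against the checkpoint version you actually use:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

# Quantize the weights to int4 on load (per the ChatGLM-6B model card)...
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).quantize(4).half().cuda()
# ...or load the pre-quantized checkpoint directly:
# model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).half().cuda()
model = model.eval()
```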

@liwei0826

I am using FastAPI, with your code. Sending requests from two pages at the same time produced the error above.

@Lukangkang123 (Author)

> I am using FastAPI, with your code. Sending requests from two pages at the same time produced the error above.

Multithreaded testing works fine on my side. Could you post the full error message so I can take a look?

@runningBolin

I'm using the chatglm2-6b model, and the responses get worse with batched inference. Has anyone run into this problem?

@HongyuJiang

A small suggestion: the batch size could be allocated according to the average or total text length in the batch and the available VRAM.
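A hypothetical sketch of this suggestion: cap each batch by a total token budget rather than a fixed request count, so batches of long prompts shrink automatically. TOKEN_BUDGET and take_batch are illustrative names; the budget here is calibrated from the 80 GB / length-2048 / batch-32 data point mentioned earlier:

```python
# Illustrative token-budget batching; calibrate TOKEN_BUDGET to your GPU.
TOKEN_BUDGET = 32 * 2048   # e.g. from "batch 32 at length 2048 on an 80 GB A100"

def take_batch(pending, tokenizer, max_batch=300):
    """Pop requests from `pending` until the token budget or max_batch is reached."""
    batch, used = [], 0
    while pending and len(batch) < max_batch:
        n = len(tokenizer(pending[0])["input_ids"])
        if batch and used + n > TOKEN_BUDGET:
            break                      # next prompt would overflow the budget
        batch.append(pending.pop(0))
        used += n
    return batch
```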

@Justin18Chan commented Jul 17, 2023

```
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/fastapi/applications.py", line 282, in __call__
    await super().__call__(scope, receive, send)
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/fastapi/routing.py", line 241, in app
    raw_response = await run_endpoint_function(
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/fastapi/routing.py", line 167, in run_endpoint_function
    return await dependant.call(**values)
  File "/root/chatglm/ChatGLM-6B/ptuning/api_batch.py", line 97, in handle_data
    return await data_processor.process_data(prompt)
  File "/root/chatglm/ChatGLM-6B/ptuning/api_batch.py", line 80, in process_data
    await self.wait_for_result(data)
  File "/root/chatglm/ChatGLM-6B/ptuning/api_batch.py", line 66, in wait_for_result
    await self.event.wait()
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/asyncio/locks.py", line 226, in wait
    await fut
RuntimeError: Task <Task pending name='Task-8' coro=<RequestResponseCycle.run_asgi() running at /root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py:428> cb=[set.discard()]> got Future attached to a different loop
```

Making multiple requests returns this error.

@Lukangkang123 (Author)

> ERROR: Exception in ASGI application ... RuntimeError: Task <Task pending name='Task-8' ...> got Future attached to a different loop
>
> Making multiple requests returns this error.

How are you issuing the multiple requests? With multithreading?

@xinyinan9527

> > ERROR: Exception in ASGI application ... got Future attached to a different loop
> >
> > Making multiple requests returns this error.
>
> How are you issuing the multiple requests? With multithreading?

Hi, I hit the same problem: the API starts up fine, but sending two requests at the same time with curl triggers this error.
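For context, this "got Future attached to a different loop" error usually means an asyncio primitive (here the Event awaited in wait_for_result) was created on, or signaled from, a different event loop than the one serving the request. A hedged sketch of the usual remedy, not the PR's actual fix: create the primitive inside the running handler and signal it thread-safely. PendingRequest and finish are illustrative names:

```python
import asyncio

class PendingRequest:
    def __init__(self, prompt: str):
        self.prompt = prompt
        self.result = None
        # Created inside a running request handler, so both the loop
        # reference and the Event belong to the server's event loop.
        self.loop = asyncio.get_running_loop()
        self.event = asyncio.Event()

    def finish(self, result):
        # Safe to call from the batching worker thread: the Event is
        # set on its own loop instead of from a foreign one.
        self.result = result
        self.loop.call_soon_threadsafe(self.event.set)

async def process(req: PendingRequest):
    await req.event.wait()    # now awaits a Future on the same loop
    return req.result
```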
