Add request distributor server #903
@@ -77,3 +77,4 @@ work_dir*/
*.pkl

!CMakeLists.txt
proxy_config.yml
@@ -0,0 +1,39 @@
## Proxy

The proxy service runs multiple api_server services in parallel. Users only need to access the proxy URL to reach the different api_server services indirectly; the proxy distributes requests internally to achieve load balancing.

### Startup

Start the proxy service:

```shell
python lmdeploy/serve/proxy/proxy.py --server_name {server_name} --server_port {server_port} --strategy "min_expected_latency"
```

Review comment: could this be invoked as `python3 -m lmdeploy.serve.proxy --server-name {server_name} --server-port` instead?

After startup succeeds, the script also prints the URL of the proxy service. Open this URL in your browser to access the Swagger UI.

### API

Through the Swagger UI, we can see multiple APIs. Those related to api_server node management are:

- /nodes/status
- /nodes/add
- /nodes/remove

They are used to view all api_server service nodes, add a node, and remove a node, respectively.
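For illustration, a node could be registered programmatically as sketched below. This is a hedged sketch only: the proxy address, the api_server address, and the request body schema (a JSON object with a `url` field) are assumptions, so consult the Swagger UI for the authoritative schema.

```python
# Hypothetical sketch: register an api_server node with the proxy via /nodes/add.
# The addresses and the request body schema used here are assumptions.
import requests

proxy_url = 'http://0.0.0.0:8000'  # assumed proxy address
node_url = 'http://0.0.0.0:23333'  # assumed api_server address

resp = requests.post(f'{proxy_url}/nodes/add', json={'url': node_url})
print(resp.status_code, resp.text)
```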

APIs related to usage include:

- /v1/models
- /v1/chat/completions
- /v1/completions

These APIs are used in the same way as on api_server.
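As a minimal sketch, a chat request sent through the proxy might look like the following, assuming the OpenAI-style schema that api_server exposes; the proxy address and model name are placeholders rather than values from this PR.

```python
# Minimal sketch: send a chat completion request through the proxy. The proxy
# address and the model name are placeholders; /v1/models lists the names
# actually being served.
import requests

proxy_url = 'http://0.0.0.0:8000'  # assumed proxy address

payload = {
    'model': 'internlm-chat-7b',   # placeholder model name
    'messages': [{'role': 'user', 'content': 'Hello!'}],
}
resp = requests.post(f'{proxy_url}/v1/chat/completions', json=payload)
print(resp.json())
```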

### Dispatch Strategy

The proxy service currently supports the following dispatch strategies:

- random: weighted random dispatch based on the request-processing capacity that the user provides for each api_server node. The higher a node's request throughput, the more likely it is to be chosen. Nodes that do not declare a throughput are treated as having the average throughput of the other nodes.
- min_expected_latency: based on the requests currently waiting on each node and the node's throughput capacity, computes the expected time each node needs to finish responding and dispatches to the node with the shortest expected time. Nodes without a declared throughput are handled as above.
- min_observed_latency: dispatches to the node with the shortest average time taken to complete a number of recent requests.
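For intuition only, the sketch below illustrates the min_expected_latency idea; it is not the proxy's actual implementation in lmdeploy/serve/proxy/proxy.py, and the node fields are illustrative.

```python
# Conceptual sketch (not the actual proxy code): choose the node whose pending
# work divided by its throughput yields the smallest expected completion time.
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    url: str
    pending_requests: int  # requests currently waiting on this node
    throughput: float      # requests the node can process per second


def min_expected_latency(nodes: List[Node]) -> Node:
    # Expected time to drain the node's queue = pending / throughput.
    return min(nodes, key=lambda n: n.pending_requests / n.throughput)


nodes = [Node('http://0.0.0.0:23333', pending_requests=4, throughput=2.0),
         Node('http://0.0.0.0:23334', pending_requests=1, throughput=1.0)]
print(min_expected_latency(nodes).url)  # second node: 1.0 s vs 2.0 s expected
```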
@@ -160,3 +160,7 @@ lmdeploy serve gradio api_server_url --server_name ${gradio_ui_ip} --server_port
4. The `/v1/chat/interactive` api disables multi-round conversation by default. The input argument `prompt` can be either a single string or an entire chat history.

5. If you need to adjust other default parameters of the session, such as the content of fields like `system`, you can directly pass in the initialization parameters of the [dialogue template](https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/model.py). For example, for the internlm-chat-7b model, you can set the `--meta_instruction` parameter when starting the `api_server`.

### multiple services

Review comment: suggested heading "request distribution service".

Please refer to our [proxy service](./proxy_server.md).
@@ -47,6 +47,7 @@
   :maxdepth: 1
   :caption: 服务

   serving/proxy_server.md
   serving/restful_api.md
@@ -0,0 +1,39 @@
## Proxy

The proxy service connects multiple api_server services in parallel. Users only need to access the proxy URL to reach the different api_server services indirectly; the proxy distributes requests internally to achieve load balancing.

### Startup

Start the proxy service:

```shell
python lmdeploy/serve/proxy/proxy.py --server_name {server_name} --server_port {server_port} --strategy "min_expected_latency"
```

After startup succeeds, the script also prints the URL of the proxy service. Open this URL in your browser to access the Swagger UI.

### API

Through the Swagger UI, we can see multiple APIs. Those related to api_server node management are:

- /nodes/status
- /nodes/add
- /nodes/remove

They are used to view all api_server service nodes, add a node, and remove a node, respectively.
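As a rough, hypothetical sketch, the node-management endpoints might be exercised as follows; the HTTP methods, parameter names, and addresses are assumptions, and the Swagger UI documents the real signatures.

```python
# Hypothetical sketch: inspect and remove registered nodes over HTTP. The
# methods and parameter names used here are assumptions.
import requests

proxy_url = 'http://0.0.0.0:8000'  # assumed proxy address

print(requests.get(f'{proxy_url}/nodes/status').json())
requests.post(f'{proxy_url}/nodes/remove',
              params={'node_url': 'http://0.0.0.0:23333'})
```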

APIs related to usage include:

- /v1/models
- /v1/chat/completions
- /v1/completions

These APIs are used in the same way as on api_server.
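A minimal sketch of querying the model list through the proxy, with the proxy address as a placeholder:

```python
# Minimal sketch: list the models available behind the proxy.
import requests

proxy_url = 'http://0.0.0.0:8000'  # assumed proxy address
print(requests.get(f'{proxy_url}/v1/models').json())
```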

### Dispatch Strategy

The proxy service currently supports the following dispatch strategies:

- random: weighted random dispatch based on the request-processing capacity that the user provides for each api_server node. The higher a node's request throughput, the more likely it is to be chosen. Nodes that do not declare a throughput are treated as having the average throughput of the other nodes.
- min_expected_latency: based on the requests currently pending on each node and the node's throughput capacity, computes the expected time each node needs to finish responding and dispatches to the node with the shortest expected time. Nodes without a declared throughput are handled as above.
- min_observed_latency: dispatches to the node with the shortest average time taken to complete a number of recent requests.
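For illustration only, one possible reading of the random strategy's weighting (not the actual proxy code) is sketched below: nodes without a declared throughput fall back to the average of the known ones.

```python
# Conceptual sketch (not the actual proxy code) of the "random" strategy:
# weighted random choice by declared throughput, with unknown throughputs
# treated as the average of the known ones.
import random
from typing import Dict, Optional


def pick_random(nodes: Dict[str, Optional[float]]) -> str:
    known = [v for v in nodes.values() if v is not None]
    average = sum(known) / len(known) if known else 1.0
    urls = list(nodes)
    weights = [nodes[u] if nodes[u] is not None else average for u in urls]
    return random.choices(urls, weights=weights, k=1)[0]


nodes = {'http://0.0.0.0:23333': 3.0,   # declared throughput
         'http://0.0.0.0:23334': None}  # unknown -> treated as 3.0 here
print(pick_random(nodes))
```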
@@ -154,3 +154,7 @@ lmdeploy serve gradio api_server_url --server_name ${gradio_ui_ip} --server_port
4. The `/v1/chat/interactive` api supports multi-round conversation, but it is disabled by default. The `messages` or `prompt` argument can be either a simple string representing a single user question or an entire chat history.

5. If you need to adjust other default session parameters, such as the content of fields like `system`, you can directly pass in the initialization parameters of the [dialogue template](https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/model.py). For example, for the internlm-chat-7b model, you can set the `--meta_instruction` parameter when starting the `api_server`.

### Running multiple services in parallel

Review comment: suggested heading "multi-machine parallel service" (多机并行服务).

Please refer to our [proxy service](./proxy_server.md).
@@ -0,0 +1,39 @@
# Copyright (c) OpenMMLab. All rights reserved.

Review comment: Is it better to put

import enum

# Deque length for recorded request latencies per node.
LATENCY_DEEQUE_LEN = 15
# Timeout length for API requests.
API_TIMEOUT_LEN = 100


class Strategy(enum.Enum):
    """Dispatch strategies supported by the proxy server."""
    RANDOM = enum.auto()
    MIN_EXPECTED_LATENCY = enum.auto()
    MIN_OBSERVED_LATENCY = enum.auto()

    @classmethod
    def from_str(cls, name):
        """Map a strategy name string (e.g. the --strategy CLI value) to an
        enum member."""
        if name == 'random':
            return cls.RANDOM
        elif name == 'min_expected_latency':
            return cls.MIN_EXPECTED_LATENCY
        elif name == 'min_observed_latency':
            return cls.MIN_OBSERVED_LATENCY
        else:
            raise ValueError(f'Invalid strategy: {name}. Supported: random, '
                             f'min_expected_latency, min_observed_latency.')


class ErrorCodes(enum.Enum):
    """Error codes returned by the proxy server."""
    MODEL_NOT_FOUND = 10400
    SERVICE_UNAVAILABLE = 10401
    API_TIMEOUT = 10402


err_msg = {
    ErrorCodes.MODEL_NOT_FOUND:
    'The requested model name does not exist in the model list.',
    ErrorCodes.SERVICE_UNAVAILABLE:
    'The service is unavailable now. May retry later.',
    ErrorCodes.API_TIMEOUT: 'Failed to get response after a period of time'
}
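As a usage sketch, the `--strategy` value from the CLI could be mapped to the enum like this; the import path is an assumption based on the proxy script location shown in the docs, not stated explicitly in this diff.

```python
# Usage sketch; the import path lmdeploy.serve.proxy.constants is an assumption
# based on the lmdeploy/serve/proxy/proxy.py location shown in the docs.
from lmdeploy.serve.proxy.constants import ErrorCodes, Strategy, err_msg

strategy = Strategy.from_str('min_expected_latency')
print(strategy)                             # Strategy.MIN_EXPECTED_LATENCY
print(err_msg[ErrorCodes.MODEL_NOT_FOUND])  # human-readable error message
```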
Review comment: H1 title: "Request Distributor Server"