Add request distributor server #903
Conversation
Running the command `python3 lmdeploy/serve/proxy/proxy.py --server-port 33338` produced an error message:
lmdeploy/constants.py (outdated)
@@ -0,0 +1,39 @@
# Copyright (c) OpenMMLab. All rights reserved.
Would it be better to put `constants.py` under `lmdeploy/serve/proxy`?
Conflicts: docs/en/serving/restful_api.md, docs/zh_cn/serving/restful_api.md
docs/en/serving/proxy_server.md (outdated)
@@ -0,0 +1,39 @@
## Proxy
H1 title: "Request Distributor Server"
docs/en/serving/restful_api.md (outdated)
@@ -162,3 +162,7 @@ lmdeploy serve gradio api_server_url --server_name ${gradio_ui_ip} --server_port
5. If you need to adjust other default parameters of the session, such as the content of fields like `system`, you can directly pass in the initialization parameters of the [dialogue template](https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/model.py). For example, for the internlm-chat-7b model, you can set the `--meta_instruction` parameter when starting the `api_server`.
6. Regarding stop words, we only support characters that encode to a single index. Furthermore, there may be multiple indexes that decode to results containing the stop word; in such cases, if the number of these indexes is too large, we only use the index produced by the tokenizer. If you want to use a stop symbol that encodes to multiple indexes, consider performing string matching on the streaming client side and breaking out of the streaming loop once a match is found, as in the sketch below.
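A minimal sketch of that client-side matching, not part of this PR; `stream_chat` stands in for any generator that yields text deltas from the `api_server`:

```python
def stream_until_stop(stream_chat, stop_word: str) -> str:
    """Accumulate streamed text deltas and stop at a multi-token stop word."""
    generated = ''
    for delta in stream_chat:
        generated += delta
        if stop_word in generated:
            # Trim the stop word (and anything after it), then leave the loop.
            generated = generated[:generated.index(stop_word)]
            break
    return generated


# Usage with a dummy stream whose stop word spans several chunks:
chunks = iter(['Hello', ' wor', 'ld<|en', 'd|> extra'])
print(stream_until_stop(chunks, '<|end|>'))  # -> 'Hello world'
```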
### multiple services
request distribution service
docs/zh_cn/serving/restful_api.md (outdated)
@@ -156,3 +156,7 @@ lmdeploy serve gradio api_server_url --server_name ${gradio_ui_ip} --server_port
5. To adjust other default session parameters, such as the content of fields like `system`, you can directly pass in the initialization parameters of the [dialogue template](https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/model.py). For example, for the internlm-chat-7b model, you can set the `--meta_instruction` parameter when starting the `api_server`.
6. Regarding stop words, we only support characters that encode to a single index, and multiple indexes may decode to results containing the stop word; in that case, if there are too many such indexes, we only use the index produced by the tokenizer. If you want a stop word that encodes to multiple indexes, consider string matching on the streaming client side and breaking out of the streaming loop once a match succeeds.
### Multiple services in parallel
multi-machine parallel serving
docs/en/serving/proxy_server.md (outdated)

Start the proxy service:

```shell
python lmdeploy/serve/proxy/proxy.py --server_name {server_name} --server_port {server_port} --strategy "min_expected_latency"
```
Could we use the following form instead?
python3 -m lmdeploy.serve.proxy --server-name {server_name} --server-port
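For `python3 -m lmdeploy.serve.proxy` to be invokable, the `proxy` package would need a module entry point. A minimal sketch, assuming (not confirmed by this PR) that `proxy.py` wraps its CLI parsing in a `main()` function:

```python
# lmdeploy/serve/proxy/__main__.py
# Hypothetical entry point so that `python3 -m lmdeploy.serve.proxy` runs the
# proxy; assumes proxy.py exposes a main() that parses the CLI arguments.
from lmdeploy.serve.proxy.proxy import main

if __name__ == '__main__':
    main()
```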