Add request distributor server #903
Conversation
Running the command `python3 lmdeploy/serve/proxy/proxy.py --server-port 33338` produced an error message:
lmdeploy/constants.py (outdated)
@@ -0,0 +1,39 @@
# Copyright (c) OpenMMLab. All rights reserved.
Would it be better to put `constants.py` under `lmdeploy/serve/proxy`?
Conflicts: docs/en/serving/restful_api.md, docs/zh_cn/serving/restful_api.md
docs/en/serving/proxy_server.md (outdated)
@@ -0,0 +1,39 @@
## Proxy
H1 title: "Request Distributor Server"
docs/en/serving/restful_api.md (outdated)
@@ -162,3 +162,7 @@ lmdeploy serve gradio api_server_url --server_name ${gradio_ui_ip} --server_port
5. If you need to adjust other default parameters of the session, such as the content of fields like `system`, you can directly pass in the initialization parameters of the [dialogue template](https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/model.py). For example, for the internlm-chat-7b model, you can set the `--meta_instruction` parameter when starting the `api_server`.
6. Regarding stop words, we only support characters that encode to a single index. Furthermore, there may be multiple indexes that decode to results containing the stop word; in such cases, if the number of these indexes is too large, we only use the index produced by the tokenizer. If you want to use a stop symbol that encodes to multiple indexes, consider performing string matching on the streaming client side and breaking out of the streaming loop once a match is found, as in the sketch below.
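A minimal sketch of that client-side matching, not part of this PR; `stream_chat` stands in for any generator that yields text deltas from the `api_server`:

```python
def stream_until_stop(stream_chat, stop_word: str) -> str:
    """Accumulate streamed text deltas and stop at a multi-token stop word."""
    generated = ''
    for delta in stream_chat:
        generated += delta
        if stop_word in generated:
            # Trim the stop word (and anything after it), then leave the loop.
            generated = generated[:generated.index(stop_word)]
            break
    return generated


# Usage with a dummy stream whose stop word spans several chunks:
chunks = iter(['Hello', ' wor', 'ld<|en', 'd|> extra'])
print(stream_until_stop(chunks, '<|end|>'))  # -> 'Hello world'
```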
### multiple services
request distribution service
docs/zh_cn/serving/restful_api.md (outdated)
@@ -156,3 +156,7 @@ lmdeploy serve gradio api_server_url --server_name ${gradio_ui_ip} --server_port
5. To adjust other default session parameters, such as the content of fields like `system`, you can directly pass in the initialization parameters of the [dialogue template](https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/model.py). For example, for the internlm-chat-7b model, you can set the `--meta_instruction` parameter when starting the `api_server`.
6. Regarding stop words, we only support characters that encode to a single index, and multiple indexes may decode to results containing the stop word; in that case, if there are too many such indexes, we only use the index produced by the tokenizer. If you want a stop word that encodes to multiple indexes, consider string matching on the streaming client side and breaking out of the streaming loop once a match succeeds.
### Multiple services in parallel
multi-machine parallel serving
docs/en/serving/proxy_server.md (outdated)

Start the proxy service:

```shell
python lmdeploy/serve/proxy/proxy.py --server_name {server_name} --server_port {server_port} --strategy "min_expected_latency"
```
Could we use the following form instead?
python3 -m lmdeploy.serve.proxy --server-name {server_name} --server-port
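For `python3 -m lmdeploy.serve.proxy` to be invokable, the `proxy` package would need a module entry point. A minimal sketch, assuming (not confirmed by this PR) that `proxy.py` wraps its CLI parsing in a `main()` function:

```python
# lmdeploy/serve/proxy/__main__.py
# Hypothetical entry point so that `python3 -m lmdeploy.serve.proxy` runs the
# proxy; assumes proxy.py exposes a main() that parses the CLI arguments.
from lmdeploy.serve.proxy.proxy import main

if __name__ == '__main__':
    main()
```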