Enable the Gradio server to call inference services through the RESTful API (#287)

* app use async engine

* add stop logic

* app update cancel

* app support restful-api

* update doc and use the right model name

* set doc url root

* add comments

* add an example

* renew_session

* update readme.md

* resolve comments

* Update restful_api.md

* Update restful_api.md

* Update restful_api.md

---------

Co-authored-by: tpoisonooo <[email protected]>
AllentDan and tpoisonooo authored Aug 24, 2023
1 parent 81f2983 commit 4279d8c
Showing 10 changed files with 515 additions and 170 deletions.
README.md: 26 additions & 0 deletions
@@ -133,6 +133,32 @@ python3 -m lmdeploy.serve.gradio.app ./workspace

![](https://github.com/InternLM/lmdeploy/assets/67539920/08d1e6f2-3767-44d5-8654-c85767cec2ab)

#### Serving with RESTful API

Launch the inference server by:

```shell
python3 -m lmdeploy.serve.openai.api_server ./workspace server_ip server_port --instance_num 32 --tp 1
```

Then, you can communicate with it from the command line,

```shell
# restful_api_url is the URL printed by api_server.py, e.g. http://localhost:23333
python -m lmdeploy.serve.openai.api_client restful_api_url
```

or through the WebUI,

```shell
# restful_api_url is the URL printed by api_server.py, e.g. http://localhost:23333
# server_ip and server_port here are for the gradio UI
# example: python -m lmdeploy.serve.gradio.app http://localhost:23333 localhost 6006 --restful_api True
python -m lmdeploy.serve.gradio.app restful_api_url server_ip server_port --restful_api True
```

Refer to [restful_api.md](docs/en/restful_api.md) for more details.
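
For quick programmatic access without the bundled clients, a minimal Python sketch like the following may serve as a starting point; the base URL and the model name `internlm-chat-7b` are assumptions taken from the examples in [restful_api.md](docs/en/restful_api.md).

```python
# Minimal sketch (not part of the original docs): call the OpenAI-style
# chat completions route with plain requests. The base URL and model name
# are assumptions taken from the restful_api.md examples.
import requests

api_url = 'http://localhost:23333/v1/chat/completions'
payload = {
    'model': 'internlm-chat-7b',
    'messages': [{'role': 'user', 'content': 'Hi, how are you?'}]
}
response = requests.post(api_url, json=payload)
print(response.json())
```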

#### Serving with Triton Inference Server

Launch the inference server by:
README_zh-CN.md: 26 additions & 0 deletions
@@ -133,6 +133,32 @@ python3 -m lmdeploy.serve.gradio.app ./workspace

![](https://github.com/InternLM/lmdeploy/assets/67539920/08d1e6f2-3767-44d5-8654-c85767cec2ab)

#### Serving with RESTful API

Launch the inference server with the following command:

```shell
python3 -m lmdeploy.serve.openai.api_server ./workspace server_ip server_port --instance_num 32 --tp 1
```

You can chat with the inference service from the command line:

```shell
# restful_api_url is the URL printed by api_server.py, e.g. http://localhost:23333
python -m lmdeploy.serve.openai.api_client restful_api_url
```

You can also chat through the WebUI:

```shell
# restful_api_url is the URL printed by api_server.py, e.g. http://localhost:23333
# server_ip and server_port here are for the gradio UI
# example: python -m lmdeploy.serve.gradio.app http://localhost:23333 localhost 6006 --restful_api True
python -m lmdeploy.serve.gradio.app restful_api_url server_ip server_port --restful_api True
```

Refer to [restful_api.md](docs/zh_cn/restful_api.md) for more details.

#### Serving with a container

Launch the inference server with the following command:
docs/en/restful_api.md: 44 additions & 11 deletions
@@ -3,10 +3,10 @@
### Launch Service

```shell
-python3 -m lmdeploy.serve.openai.api_server ./workspace server_name server_port --instance_num 32 --tp 1
+python3 -m lmdeploy.serve.openai.api_server ./workspace 0.0.0.0 server_port --instance_num 32 --tp 1
```

-Then, the user can open the swagger UI: http://{server_name}:{server_port}/docs for the detailed api usage.
+Then, the user can open the Swagger UI at `http://{server_ip}:{server_port}` for detailed API usage.
We provide four RESTful APIs in total. Three of them follow the OpenAI format. However, we recommend that users try
our own API, which provides more arguments for users to modify and delivers comparatively better performance.

@@ -50,16 +50,29 @@ def get_streaming_response(prompt: str,


for output, tokens in get_streaming_response(
"Hi, how are you?", "http://{server_name}:{server_port}/generate", 0,
"Hi, how are you?", "http://{server_ip}:{server_port}/generate", 0,
512):
print(output, end='')
```
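
The diff collapses the body of `get_streaming_response`. A plausible sketch is given below; it assumes the `/generate` endpoint streams newline-delimited JSON chunks carrying `text` and `tokens` fields (the field names and framing are assumptions, not confirmed by this diff).

```python
# A plausible sketch of the helper whose body the diff elides. It assumes
# /generate streams newline-delimited JSON chunks with "text" and "tokens"
# fields; those names and the framing are assumptions.
import json
from typing import Iterable, Tuple

import requests


def get_streaming_response(prompt: str,
                           api_url: str,
                           session_id: int,
                           request_output_len: int,
                           stream: bool = True) -> Iterable[Tuple[str, int]]:
    payload = {
        'prompt': prompt,
        'stream': stream,
        'session_id': session_id,
        'request_output_len': request_output_len,
    }
    response = requests.post(api_url, json=payload, stream=stream)
    for chunk in response.iter_lines(chunk_size=8192, decode_unicode=False):
        if chunk:
            data = json.loads(chunk.decode('utf-8'))
            yield data['text'], data['tokens']
```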

-### Golang/Rust
+### Java/Golang/Rust

-Golang can also build a http request to use the service. You may refer
-to [the blog](https://pkg.go.dev/net/http) for details to build own client.
-Besides, Rust supports building a client in [many ways](https://blog.logrocket.com/best-rust-http-client/).
+You may use [openapi-generator-cli](https://github.com/OpenAPITools/openapi-generator-cli) to convert `http://{server_ip}:{server_port}/openapi.json` into a Java/Rust/Golang client.
Here is an example:

```shell
$ docker run -it --rm -v ${PWD}:/local openapitools/openapi-generator-cli generate -i /local/openapi.json -g rust -o /local/rust

$ ls rust/*
rust/Cargo.toml rust/git_push.sh rust/README.md

rust/docs:
ChatCompletionRequest.md EmbeddingsRequest.md HttpValidationError.md LocationInner.md Prompt.md
DefaultApi.md GenerateRequest.md Input.md Messages.md ValidationError.md

rust/src:
apis lib.rs models
```
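
Before generating a client, you can sanity-check the schema the generator consumes. A small Python sketch, assuming the server was launched locally on port 23333:

```python
# Fetch the server's OpenAPI schema and list the exposed routes; the base
# URL is an assumption for a locally launched api_server.
import requests

schema = requests.get('http://localhost:23333/openapi.json').json()
for path, operations in schema['paths'].items():
    print(path, sorted(operations))
```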

### cURL

@@ -68,13 +81,13 @@ cURL is a tool for observing the output of the api.
List Models:

```bash
-curl http://{server_name}:{server_port}/v1/models
+curl http://{server_ip}:{server_port}/v1/models
```

Generate:

```bash
-curl http://{server_name}:{server_port}/generate \
+curl http://{server_ip}:{server_port}/generate \
-H "Content-Type: application/json" \
-d '{
"model": "internlm-chat-7b",
@@ -87,7 +100,7 @@ curl http://{server_name}:{server_port}/generate \
Chat Completions:

```bash
-curl http://{server_name}:{server_port}/v1/chat/completions \
+curl http://{server_ip}:{server_port}/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "internlm-chat-7b",
@@ -98,14 +111,34 @@ curl http://{server_name}:{server_port}/v1/chat/completions \
Embeddings:

```bash
-curl http://{server_name}:{server_port}/v1/embeddings \
+curl http://{server_ip}:{server_port}/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "internlm-chat-7b",
"input": "Hello world!"
}'
```
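
The same routes are easy to call from Python as well. A minimal sketch mirroring the embeddings example above, assuming a locally launched server:

```python
# Mirror of the cURL embeddings example above; the base URL is an
# assumption for a locally launched api_server.
import requests

resp = requests.post(
    'http://localhost:23333/v1/embeddings',
    json={'model': 'internlm-chat-7b', 'input': 'Hello world!'}
)
print(resp.json())
```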

### CLI client

There is a client script for the RESTful API server.

```shell
# restful_api_url is the URL printed by api_server.py, e.g. http://localhost:23333
python -m lmdeploy.serve.openai.api_client restful_api_url
```

### WebUI

You can also test the RESTful API through the WebUI.

```shell
# restful_api_url is the URL printed by api_server.py, e.g. http://localhost:23333
# server_ip and server_port here are for the gradio UI
# example: python -m lmdeploy.serve.gradio.app http://localhost:23333 localhost 6006 --restful_api True
python -m lmdeploy.serve.gradio.app restful_api_url server_ip server_port --restful_api True
```

### FAQ

1. When the user gets `"finish_reason":"length"`, it means the session is too long to be continued.
docs/zh_cn/restful_api.md: 44 additions & 10 deletions
@@ -5,10 +5,10 @@
Run the script:

```shell
-python3 -m lmdeploy.serve.openai.api_server ./workspace server_name server_port --instance_num 32 --tp 1
+python3 -m lmdeploy.serve.openai.api_server ./workspace 0.0.0.0 server_port --instance_num 32 --tp 1
```

-Then users can open the Swagger UI at http://{server_name}:{server_port}/docs to view all APIs and their usage in detail.
+Then users can open the Swagger UI at `http://{server_ip}:{server_port}` to view all APIs and their usage in detail.
We provide four RESTful APIs in total, three of which follow the OpenAI format. However, we recommend the other API we
provide, `generate`, which performs better and offers more arguments for users to customize.

@@ -52,15 +52,29 @@ def get_streaming_response(prompt: str,


for output, tokens in get_streaming_response(
"Hi, how are you?", "http://{server_name}:{server_port}/generate", 0,
"Hi, how are you?", "http://{server_ip}:{server_port}/generate", 0,
512):
print(output, end='')
```

-### Golang/Rust
+### Java/Golang/Rust

-Golang can also issue HTTP requests to use the service; users may refer to [this blog](https://pkg.go.dev/net/http) to build their own client.
-Rust also has [many ways](https://blog.logrocket.com/best-rust-http-client/) to build a client and use the service.
+You can use the code-generation tool [openapi-generator-cli](https://github.com/OpenAPITools/openapi-generator-cli) to convert `http://{server_ip}:{server_port}/openapi.json` into a Java/Rust/Golang client.
Here is an example:

```shell
$ docker run -it --rm -v ${PWD}:/local openapitools/openapi-generator-cli generate -i /local/openapi.json -g rust -o /local/rust

$ ls rust/*
rust/Cargo.toml rust/git_push.sh rust/README.md

rust/docs:
ChatCompletionRequest.md EmbeddingsRequest.md HttpValidationError.md LocationInner.md Prompt.md
DefaultApi.md GenerateRequest.md Input.md Messages.md ValidationError.md

rust/src:
apis lib.rs models
```

### cURL

@@ -69,13 +83,13 @@ cURL can also be used to view the API output
List models:

```bash
-curl http://{server_name}:{server_port}/v1/models
+curl http://{server_ip}:{server_port}/v1/models
```

Use generate:

```bash
-curl http://{server_name}:{server_port}/generate \
+curl http://{server_ip}:{server_port}/generate \
-H "Content-Type: application/json" \
-d '{
"model": "internlm-chat-7b",
@@ -88,7 +102,7 @@ curl http://{server_name}:{server_port}/generate \
Chat Completions:

```bash
-curl http://{server_name}:{server_port}/v1/chat/completions \
+curl http://{server_ip}:{server_port}/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "internlm-chat-7b",
@@ -99,14 +113,34 @@ curl http://{server_name}:{server_port}/v1/chat/completions \
Embeddings:

```bash
-curl http://{server_name}:{server_port}/v1/embeddings \
+curl http://{server_ip}:{server_port}/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "internlm-chat-7b",
"input": "Hello world!"
}'
```

### CLI client

The RESTful API server can be tested with a client script, for example:

```shell
# restful_api_url is the URL produced by api_server, e.g. http://localhost:23333
python -m lmdeploy.serve.openai.api_client restful_api_url
```

### WebUI

You can also test the RESTful API directly through the WebUI.

```shell
# restful_api_url is the URL produced by api_server, e.g. http://localhost:23333
# server_ip and server_port are used to serve the gradio UI
# example: python -m lmdeploy.serve.gradio.app http://localhost:23333 localhost 6006 --restful_api True
python -m lmdeploy.serve.gradio.app restful_api_url server_ip server_port --restful_api True
```

### FAQ

1. When the returned finish reason is `"finish_reason":"length"`, it means the session length has exceeded the maximum.
