# Load huggingface model directly

Before v0.0.14, serving or running inference with TurboMind required first converting the model to the TurboMind format. Offline conversion makes the model load faster, but the extra step isn't user-friendly. LMDeploy therefore adds online conversion and supports loading huggingface models directly.
## Supported model types

Currently, TurboMind supports loading three types of models:

1. A model converted by `lmdeploy convert` (the old format)
2. A quantized model managed by [lmdeploy](https://huggingface.co/lmdeploy) or [internlm](https://huggingface.co/internlm) on huggingface.co
3. Other popular LM models on huggingface.co, such as Qwen/Qwen-7B-Chat
### Usage

#### 1) A model converted by `lmdeploy convert`

The usage is the same as before:
```
# Inference by TurboMind
lmdeploy chat turbomind ./workspace
# Serving with gradio
lmdeploy serve gradio ./workspace
# Serving with Restful API
lmdeploy serve api_server ./workspace --instance_num 32 --tp 1
```
#### 2) A quantized model managed by lmdeploy / internlm

For quantized models managed by lmdeploy or internlm, the parameters required for online conversion already exist in config.json, so you only need to pass the repo_id or a local path:
```
repo_id=lmdeploy/qwen-chat-7b-4bit
# or
# repo_id=/path/to/managed_model
# Inference by TurboMind
lmdeploy chat turbomind $repo_id
# Serving with gradio
lmdeploy serve gradio $repo_id
# Serving with Restful API
lmdeploy serve api_server $repo_id --instance_num 32 --tp 1
```
#### 3) Other hot LM models

For other popular models such as Qwen/Qwen-7B-Chat or baichuan-inc/Baichuan2-7B-Chat, the model name needs to be passed in as well. The models supported by LMDeploy can be listed with `lmdeploy list`; a sketch of the usage follows.
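
Below is a minimal sketch of this case, mirroring the command pattern from the sections above. The `--model-name` flag and the `qwen-7b` value are illustrative assumptions, not taken from this commit; confirm the exact names your version accepts with `lmdeploy list`.

```
repo_id=Qwen/Qwen-7B-Chat
# model_name is assumed to match an entry printed by `lmdeploy list`
model_name=qwen-7b
# Inference by TurboMind
lmdeploy chat turbomind $repo_id --model-name $model_name
# Serving with gradio
lmdeploy serve gradio $repo_id --model-name $model_name
# Serving with Restful API
lmdeploy serve api_server $repo_id --model-name $model_name --instance_num 32 --tp 1
```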
# Load huggingface model directly

Before v0.0.14, to run inference with or deploy LMDeploy, you first had to use the `lmdeploy convert` command to convert the model offline into the format supported by the TurboMind inference engine. The converted model loads faster, but this is not user-friendly, so LMDeploy adds an online-conversion feature that supports reading huggingface models directly.
## Supported types

Currently, TurboMind supports loading three types of models:

1. A model converted by the `lmdeploy convert` command (compatible with the old format)
2. Models managed by [lmdeploy](https://huggingface.co/lmdeploy) / [internlm](https://huggingface.co/internlm) on huggingface.co
3. Other LM models on huggingface.co, such as Qwen/Qwen-7B-Chat
### Usage

#### 1) A model converted by the `lmdeploy convert` command

The usage is the same as before:
```
# Inference by TurboMind
lmdeploy chat turbomind ./workspace
# Serving with gradio
lmdeploy serve gradio ./workspace
# Serving with Restful API
lmdeploy serve api_server ./workspace --instance_num 32 --tp 1
```
#### 2) A quantized model managed by lmdeploy / internlm

For models managed by lmdeploy / internlm, config.json already contains the parameters needed for online conversion, so you only need to pass the repo_id or a local path:
```
repo_id=lmdeploy/qwen-chat-7b-4bit
# or
# repo_id=/path/to/managed_model
# Inference by TurboMind
lmdeploy chat turbomind $repo_id
# Serving with gradio
lmdeploy serve gradio $repo_id
# Serving with Restful API
lmdeploy serve api_server $repo_id --instance_num 32 --tp 1
```
#### 3) Other LM models

For other popular models such as Qwen/Qwen-7B-Chat or baichuan-inc/Baichuan2-7B-Chat, the model name needs to be passed in. The models supported by LMDeploy can be checked with `lmdeploy list`; a sketch of the usage follows.
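
As in the English document above, a minimal sketch for this case; the `--model-name` flag and the `qwen-7b` value are illustrative assumptions, so confirm the exact names with `lmdeploy list`.

```
repo_id=Qwen/Qwen-7B-Chat
# model_name is assumed to match an entry printed by `lmdeploy list`
model_name=qwen-7b
# Inference by TurboMind
lmdeploy chat turbomind $repo_id --model-name $model_name
# Serving with gradio
lmdeploy serve gradio $repo_id --model-name $model_name
# Serving with Restful API
lmdeploy serve api_server $repo_id --model-name $model_name --instance_num 32 --tp 1
```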