
Commit

add docs
irexyc committed Nov 15, 2023
1 parent 47fd6a8 commit 8dd4876
Showing 4 changed files with 104 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -20,6 +20,7 @@

## News 🎉

- \[2023/11\] TurboMind supports loading Hugging Face models directly. Click [here](./docs/en/load_hf.md) for details.
- \[2023/09\] TurboMind supports Qwen-14B
- \[2023/09\] TurboMind supports InternLM-20B
- \[2023/09\] TurboMind supports all features of Code Llama: code completion, infilling, chat / instruct, and python specialist. Click [here](./docs/en/supported_models/codellama.md) for deployment guide
1 change: 1 addition & 0 deletions README_zh-CN.md
@@ -20,6 +20,7 @@

## 更新 🎉

- \[2023/11\] TurboMind supports loading Hugging Face models directly. Click [here](./docs/zh_cn/load_hf.md) for usage details
- \[2023/09\] TurboMind supports Qwen-14B
- \[2023/09\] TurboMind supports the InternLM-20B model
- \[2023/09\] TurboMind supports all features of Code Llama: code completion, infilling, chat / instruct, and Python specialist. Click [here](./docs/zh_cn/supported_models/codellama.md) for the deployment guide
51 changes: 51 additions & 0 deletions docs/en/load_hf.md
@@ -0,0 +1,51 @@
# Load Hugging Face models directly

Before v0.0.14, if you wanted to serve or run inference with TurboMind, you first had to convert the model to the TurboMind format. Offline conversion lets the model load faster, but it isn't user-friendly. Therefore, LMDeploy adds online conversion and supports loading Hugging Face models directly.
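
For reference, the pre-v0.0.14 offline workflow looked roughly like the sketch below. The model name and path are placeholders, and the exact arguments may differ between versions:

```
# Offline conversion: writes a TurboMind-format model into ./workspace
lmdeploy convert internlm-chat-7b /path/to/internlm-chat-7b
```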

## Supported model types

Currently, TurboMind supports loading three types of models:

1. A model converted by `lmdeploy convert` (the old format)
2. A quantized model managed by [lmdeploy](https://huggingface.co/lmdeploy) or [internlm](https://huggingface.co/internlm) on huggingface.co
3. Other popular LM models on huggingface.co, such as Qwen/Qwen-7B-Chat

### Usage

#### 1) A model converted by `lmdeploy convert`

The usage is the same as before:

```
# Inference by TurboMind
lmdeploy chat turbomind ./workspace
# Serving with gradio
lmdeploy serve gradio ./workspace
# Serving with Restful API
lmdeploy serve api_server ./workspace --instance_num 32 --tp 1
```

#### 2) A quantized model managed by lmdeploy / internlm

For quantized models managed by lmdeploy or internlm, the parameters required for online conversion already exist in config.json, so you only need to pass the repo_id or a local path.

```
repo_id=lmdeploy/qwen-chat-7b-4bit
# or
# repo_id=/path/to/managed_model
# Inference by TurboMind
lmdeploy chat turbomind $repo_id
# Serving with gradio
lmdeploy serve gradio $repo_id
# Serving with Restful API
lmdeploy serve api_server $repo_id --instance_num 32 --tp 1
```

#### 3) Other popular LM models

For other popular models such as Qwen/Qwen-7B-Chat or baichuan-inc/Baichuan2-7B-Chat, the model name needs to be passed in as well; see the sketch below. The models supported by LMDeploy can be listed with `lmdeploy list`.
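
A usage sketch mirroring the commands above. The `--model-name` flag for selecting a built-in chat template is an assumption here; check `lmdeploy --help` for the exact option in your version:

```
repo_id=Qwen/Qwen-7B-Chat
model_name=qwen-7b  # assumed: one of the names printed by `lmdeploy list`
# Inference by TurboMind
lmdeploy chat turbomind $repo_id --model-name $model_name
# Serving with gradio
lmdeploy serve gradio $repo_id --model-name $model_name
# Serving with Restful API
lmdeploy serve api_server $repo_id --model-name $model_name --instance_num 32 --tp 1
```
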
51 changes: 51 additions & 0 deletions docs/zh_cn/load_hf.md
@@ -0,0 +1,51 @@
# Load Hugging Face models directly

Before v0.0.14, to run inference or serve with LMDeploy, you first had to use `lmdeploy convert` to convert the model offline into the format supported by the TurboMind inference engine. The converted model loads faster, but the extra step isn't user-friendly, so LMDeploy adds online conversion and supports loading Hugging Face models directly.
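
For reference, the old offline workflow looked roughly like this sketch; the model name and path are placeholders, and arguments may vary by version:

```
# Offline conversion: writes a TurboMind-format model into ./workspace
lmdeploy convert internlm-chat-7b /path/to/internlm-chat-7b
```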

## Supported model types

Currently, TurboMind supports loading three types of models:

1. A model converted by `lmdeploy convert` (compatible with the old format)
2. Quantized models managed by [lmdeploy](https://huggingface.co/lmdeploy) / [internlm](https://huggingface.co/internlm) on huggingface.co
3. Other LM models on huggingface.co, such as Qwen/Qwen-7B-Chat

### Usage

#### 1) A model converted by `lmdeploy convert`

The usage is the same as before:

```
# Inference by TurboMind
lmdeploy chat turbomind ./workspace
# Serving with gradio
lmdeploy serve gradio ./workspace
# Serving with Restful API
lmdeploy serve api_server ./workspace --instance_num 32 --tp 1
```

#### 2) Quantized models managed by lmdeploy / internlm

For models managed by lmdeploy / internlm, config.json already contains the parameters needed for online conversion, so you only need to pass the repo_id or a local path.

```
repo_id=lmdeploy/qwen-chat-7b-4bit
# or
# repo_id=/path/to/managed_model
# Inference by TurboMind
lmdeploy chat turbomind $repo_id
# Serving with gradio
lmdeploy serve gradio $repo_id
# Serving with Restful API
lmdeploy serve api_server $repo_id --instance_num 32 --tp 1
```

#### 3) Other popular LM models

For other popular models such as Qwen/Qwen-7B-Chat or baichuan-inc/Baichuan2-7B-Chat, you also need to pass in the model name; see the sketch below. The models supported by LMDeploy can be listed with `lmdeploy list`.
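
A usage sketch mirroring the commands above (the `--model-name` flag for selecting a built-in chat template is an assumption; check `lmdeploy --help` for the exact option in your version):

```
repo_id=Qwen/Qwen-7B-Chat
model_name=qwen-7b  # assumed: one of the names printed by `lmdeploy list`
# Inference by TurboMind
lmdeploy chat turbomind $repo_id --model-name $model_name
# Serving with gradio
lmdeploy serve gradio $repo_id --model-name $model_name
# Serving with Restful API
lmdeploy serve api_server $repo_id --model-name $model_name --instance_num 32 --tp 1
```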
