
Commit

update supported models (#2849)
* update supported models

* update deepseek-v2.5

* update
lvhan028 authored Dec 6, 2024
1 parent 866bfa5 commit 4f7e50b
Showing 5 changed files with 36 additions and 13 deletions.
README.md: 3 changes (3 additions & 0 deletions)
@@ -125,6 +125,8 @@ For detailed inference benchmarks in more devices and more settings, please refe
<li>Qwen1.5 (0.5B - 110B)</li>
<li>Qwen1.5 - MoE (0.5B - 72B)</li>
<li>Qwen2 (0.5B - 72B)</li>
+<li>Qwen2-MoE (57BA14B)</li>
+<li>Qwen2.5 (0.5B - 32B)</li>
<li>Baichuan (7B)</li>
<li>Baichuan2 (7B-13B)</li>
<li>Code Llama (7B - 34B)</li>
@@ -136,6 +138,7 @@ For detailed inference benchmarks in more devices and more settings, please refe
<li>Mistral (7B)</li>
<li>DeepSeek-MoE (16B)</li>
<li>DeepSeek-V2 (16B, 236B)</li>
+<li>DeepSeek-V2.5 (236B)</li>
<li>Mixtral (8x7B, 8x22B)</li>
<li>Gemma (2B - 7B)</li>
<li>Dbrx (132B)</li>
README_ja.md: 3 changes (3 additions & 0 deletions)
@@ -122,6 +122,8 @@ LMDeploy TurboMindエンジンは卓越した推論能力を持ち、さまざ
<li>Qwen1.5 (0.5B - 110B)</li>
<li>Qwen1.5 - MoE (0.5B - 72B)</li>
<li>Qwen2 (0.5B - 72B)</li>
+<li>Qwen2-MoE (57BA14B)</li>
+<li>Qwen2.5 (0.5B - 32B)</li>
<li>Baichuan (7B)</li>
<li>Baichuan2 (7B-13B)</li>
<li>Code Llama (7B - 34B)</li>
@@ -133,6 +135,7 @@ LMDeploy TurboMindエンジンは卓越した推論能力を持ち、さまざ
<li>Mistral (7B)</li>
<li>DeepSeek-MoE (16B)</li>
<li>DeepSeek-V2 (16B, 236B)</li>
+<li>DeepSeek-V2.5 (236B)</li>
<li>Mixtral (8x7B, 8x22B)</li>
<li>Gemma (2B - 7B)</li>
<li>Dbrx (132B)</li>
README_zh-CN.md: 3 changes (3 additions & 0 deletions)
@@ -126,6 +126,8 @@ LMDeploy TurboMind 引擎拥有卓越的推理能力,在各种规模的模型
<li>Qwen1.5 (0.5B - 110B)</li>
<li>Qwen1.5 - MoE (0.5B - 72B)</li>
<li>Qwen2 (0.5B - 72B)</li>
+<li>Qwen2-MoE (57BA14B)</li>
+<li>Qwen2.5 (0.5B - 32B)</li>
<li>Baichuan (7B)</li>
<li>Baichuan2 (7B-13B)</li>
<li>Code Llama (7B - 34B)</li>
@@ -137,6 +139,7 @@ LMDeploy TurboMind 引擎拥有卓越的推理能力,在各种规模的模型
<li>Mistral (7B)</li>
<li>DeepSeek-MoE (16B)</li>
<li>DeepSeek-V2 (16B, 236B)</li>
+<li>DeepSeek-V2.5 (236B)</li>
<li>Mixtral (8x7B, 8x22B)</li>
<li>Gemma (2B - 7B)</li>
<li>Dbrx (132B)</li>
docs/en/supported_models/supported_models.md: 19 changes (13 additions & 6 deletions)
@@ -10,17 +10,21 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes |
| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
-| Llama3.2 | 1B, 3B | LLM | Yes | Yes | Yes | Yes |
+| Llama3.2 | 1B, 3B | LLM | Yes | Yes\* | Yes\* | Yes |
| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes |
| InternLM-XComposer2 | 7B, 4khd-7B | MLLM | Yes | Yes | Yes | Yes |
| InternLM-XComposer2.5 | 7B | MLLM | Yes | Yes | Yes | Yes |
| Qwen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes |
| Qwen1.5 | 1.8B - 110B | LLM | Yes | Yes | Yes | Yes |
-| Qwen2 | 0.5B - 72B | LLM | Yes | Yes | Yes | Yes |
+| Qwen2 | 0.5B - 72B | LLM | Yes | Yes\* | Yes\* | Yes |
+| Qwen2-MoE | 57BA14B | LLM | Yes | Yes | Yes | Yes |
+| Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | Yes | Yes |
| Mistral | 7B | LLM | Yes | Yes | Yes | No |
| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | Yes |
| DeepSeek-V2 | 16B, 236B | LLM | Yes | Yes | Yes | No |
+| DeepSeek-V2.5 | 236B | LLM | Yes | Yes | Yes | No |
| Qwen-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
| DeepSeek-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
| Baichuan | 7B | LLM | Yes | Yes | Yes | Yes |
@@ -29,7 +33,7 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes |
| LLaVA(1.5,1.6) | 7B - 34B | MLLM | Yes | Yes | Yes | Yes |
| InternVL | v1.1 - v1.5 | MLLM | Yes | Yes | Yes | Yes |
-| InternVL2 | 1-2B, 8B - 76B | MLLM | Yes | Yes | Yes | Yes |
+| InternVL2 | 1-2B, 8B - 76B | MLLM | Yes | Yes\* | Yes\* | Yes |
| ChemVLM | 8B - 26B | MLLM | Yes | Yes | Yes | Yes |
| MiniCPM-Llama3-V-2_5 | - | MLLM | Yes | Yes | Yes | Yes |
| MiniCPM-V-2_6 | - | MLLM | Yes | Yes | Yes | Yes |
@@ -41,7 +45,8 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
"-" means not verified yet.

```{note}
-The TurboMind engine doesn't support window attention. Therefore, for models that have applied window attention and have the corresponding switch "use_sliding_window" enabled, such as Mistral, Qwen1.5 and etc., please choose the PyTorch engine for inference.
+* The TurboMind engine doesn't support window attention. Therefore, for models that use window attention and have the corresponding switch "use_sliding_window" enabled, such as Mistral and Qwen1.5, please choose the PyTorch engine for inference.
+* When a model's head_dim is not 128, as in llama3.2-1B, qwen2-0.5B and internvl2-1B, TurboMind doesn't support 4-bit/8-bit KV cache quantization or quantized inference for it.
```
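
To make the backend choice in the note concrete, here is a minimal sketch using LMDeploy's `pipeline` API. It is an illustration under stated assumptions, not part of the commit: the model IDs are placeholders, and `quant_policy=4`/`quant_policy=8` select 4-bit/8-bit online KV cache quantization in the TurboMind engine.

```python
# Minimal sketch of the backend selection described in the note above.
# Model IDs are illustrative placeholders.
from lmdeploy import pipeline, PytorchEngineConfig, TurbomindEngineConfig

# Sliding-window models (e.g. Mistral with use_sliding_window enabled)
# are not supported by TurboMind, so route them to the PyTorch engine.
sliding_window_pipe = pipeline(
    'mistralai/Mistral-7B-Instruct-v0.3',
    backend_config=PytorchEngineConfig(),
)

# For models whose head_dim is 128, TurboMind can quantize the KV cache
# online: quant_policy=8 -> 8-bit KV, quant_policy=4 -> 4-bit KV.
# Per the note, keep quant_policy=0 (the default) for head_dim != 128
# models such as llama3.2-1B, qwen2-0.5B and internvl2-1B.
kv8_pipe = pipeline(
    'internlm/internlm2_5-7b-chat',
    backend_config=TurbomindEngineConfig(quant_policy=8),
)

print(kv8_pipe(['Hello, please introduce yourself.']))
```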

## PyTorchEngine on CUDA Platform
@@ -68,11 +73,13 @@ The TurboMind engine doesn't support window attention. Therefore, for models tha
| QWen1.5 | 0.5B - 110B | LLM | Yes | Yes | Yes | Yes | Yes |
| QWen1.5-MoE | A2.7B | LLM | Yes | Yes | Yes | No | No |
| QWen2 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
+| Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
| QWen2-VL | 2B, 7B | MLLM | Yes | Yes | No | No | No |
| DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No |
| DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No |
+| DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No |
| MiniCPM3 | 4B | LLM | Yes | Yes | Yes | No | No |
-| MiniCPM-V-2_6 | 8B | LLM | Yes | No | No | Yes | Yes |
+| MiniCPM-V-2_6 | 8B | LLM | Yes | No | No | No | Yes |
| Gemma | 2B-7B | LLM | Yes | Yes | Yes | No | No |
| Dbrx | 132B | LLM | Yes | Yes | Yes | No | No |
| StarCoder2 | 3B-15B | LLM | Yes | Yes | Yes | No | No |
@@ -81,7 +88,7 @@ The TurboMind engine doesn't support window attention. Therefore, for models tha
| CogVLM-Chat | 17B | MLLM | Yes | Yes | Yes | - | - |
| CogVLM2-Chat | 19B | MLLM | Yes | Yes | Yes | - | - |
| LLaVA(1.5,1.6) | 7B-34B | MLLM | Yes | Yes | Yes | - | - |
-| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes | Yes | Yes | Yes |
+| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes | Yes | No | Yes |
| InternVL2 | 1B-40B | MLLM | Yes | Yes | Yes | - | - |
| Mono-InternVL | 2B | MLLM | Yes\* | Yes | Yes | - | - |
| ChemVLM | 8B-26B | MLLM | Yes | Yes | No | - | - |
docs/zh_cn/supported_models/supported_models.md: 21 changes (14 additions & 7 deletions)
@@ -10,17 +10,21 @@
| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes |
| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
-| Llama3.2 | 1B, 3B | LLM | Yes | Yes | Yes | Yes |
+| Llama3.2 | 1B, 3B | LLM | Yes | Yes\* | Yes\* | Yes |
| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes |
| InternLM-XComposer2 | 7B, 4khd-7B | MLLM | Yes | Yes | Yes | Yes |
| InternLM-XComposer2.5 | 7B | MLLM | Yes | Yes | Yes | Yes |
| Qwen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes |
| Qwen1.5 | 1.8B - 110B | LLM | Yes | Yes | Yes | Yes |
-| Qwen2 | 0.5B - 72B | LLM | Yes | Yes | Yes | Yes |
+| Qwen2 | 0.5B - 72B | LLM | Yes | Yes\* | Yes\* | Yes |
+| Qwen2-MoE | 57BA14B | LLM | Yes | Yes | Yes | Yes |
+| Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | Yes | Yes |
| Mistral | 7B | LLM | Yes | Yes | Yes | No |
| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | Yes |
| DeepSeek-V2 | 16B, 236B | LLM | Yes | Yes | Yes | No |
+| DeepSeek-V2.5 | 236B | LLM | Yes | Yes | Yes | No |
| Qwen-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
| DeepSeek-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
| Baichuan | 7B | LLM | Yes | Yes | Yes | Yes |
@@ -29,7 +33,7 @@
| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes |
| LLaVA(1.5,1.6) | 7B - 34B | MLLM | Yes | Yes | Yes | Yes |
| InternVL | v1.1 - v1.5 | MLLM | Yes | Yes | Yes | Yes |
-| InternVL2 | 1-2B, 8B - 76B | MLLM | Yes | Yes | Yes | Yes |
+| InternVL2 | 1-2B, 8B - 76B | MLLM | Yes | Yes\* | Yes\* | Yes |
| ChemVLM | 8B - 26B | MLLM | Yes | Yes | Yes | Yes |
| MiniCPM-Llama3-V-2_5 | - | MLLM | Yes | Yes | Yes | Yes |
| MiniCPM-V-2_6 | - | MLLM | Yes | Yes | Yes | Yes |
@@ -41,7 +45,8 @@
“-” 表示还没有验证。

```{note}
-turbomind 引擎不支持 window attention。所以,对于应用了 window attention,并开启了对应的开关"use_sliding_window"的模型,比如 Mistral、Qwen1.5 等,在推理时,请选择 pytorch engine
+* turbomind 引擎不支持 window attention。所以,对于应用了 window attention,并开启了对应的开关"use_sliding_window"的模型,比如 Mistral、Qwen1.5 等,在推理时,请选择 pytorch engine
+* 当模型的 head_dim 非 128 时,turbomind 不支持它的 kv cache 4/8 bit 量化和推理。比如,llama3.2-1B,qwen2-0.5B,internvl2-1B 等等
```

## PyTorchEngine CUDA 平台
@@ -68,11 +73,13 @@ turbomind 引擎不支持 window attention。所以,对于应用了 window att
| QWen1.5 | 0.5B - 110B | LLM | Yes | Yes | Yes | Yes | Yes |
| QWen1.5-MoE | A2.7B | LLM | Yes | Yes | Yes | No | No |
| QWen2 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
+| Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
| QWen2-VL | 2B, 7B | MLLM | Yes | Yes | No | No | No |
| DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No |
| DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No |
+| DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No |
| MiniCPM3 | 4B | LLM | Yes | Yes | Yes | No | No |
-| MiniCPM-V-2_6 | 8B | LLM | Yes | No | No | Yes | Yes |
+| MiniCPM-V-2_6 | 8B | LLM | Yes | No | No | No | Yes |
| Gemma | 2B-7B | LLM | Yes | Yes | Yes | No | No |
| Dbrx | 132B | LLM | Yes | Yes | Yes | No | No |
| StarCoder2 | 3B-15B | LLM | Yes | Yes | Yes | No | No |
@@ -81,7 +88,7 @@ turbomind 引擎不支持 window attention。所以,对于应用了 window att
| CogVLM-Chat | 17B | MLLM | Yes | Yes | Yes | - | - |
| CogVLM2-Chat | 19B | MLLM | Yes | Yes | Yes | - | - |
| LLaVA(1.5,1.6) | 7B-34B | MLLM | Yes | Yes | Yes | - | - |
-| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes | Yes | Yes | Yes |
+| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes | Yes | No | Yes |
| InternVL2 | 1B-40B | MLLM | Yes | Yes | Yes | - | - |
| Mono-InternVL | 2B | MLLM | Yes\* | Yes | Yes | - | - |
| ChemVLM | 8B-26B | MLLM | Yes | Yes | No | - | - |
@@ -94,7 +101,7 @@ turbomind 引擎不支持 window attention。所以,对于应用了 window att
| Phi-3.5-vision | 4.2B | MLLM | Yes | Yes | No | - | - |

```{note}
-* Currently Mono-InternVL does not support FP16 due to numerical instability. Please use BF16 instead.
+* 目前,Mono-InternVL不支持FP16,因为数值不稳定。请改用BF16。
```
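
The replacement note above (now localized into Chinese) states that Mono-InternVL does not currently support FP16 due to numerical instability and that BF16 should be used instead. A minimal sketch of forcing BF16 on the PyTorch engine follows; the model path is a placeholder, and the `dtype` option of `PytorchEngineConfig` is assumed to accept 'bfloat16'.

```python
# Minimal sketch: run Mono-InternVL in BF16 on the PyTorch engine, since
# the note above says FP16 is numerically unstable for this model.
# The model path is an illustrative placeholder; dtype='bfloat16' is an
# assumed-available option of PytorchEngineConfig.
from lmdeploy import pipeline, PytorchEngineConfig

pipe = pipeline(
    'OpenGVLab/Mono-InternVL-2B',
    backend_config=PytorchEngineConfig(dtype='bfloat16'),
)
print(pipe(['Please introduce yourself.']))
```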

## PyTorchEngine 华为昇腾平台
