Support Mono-InternVL with PyTorch backend (#2727)
* support Mono-InternVL; fix typos

* update readme

* add assertion for FP16

* add assertion for FP16

* update _SUPPORTED_ARCHS
wzk1015 authored Nov 11, 2024
1 parent 78ab485 commit 06aea5d
Showing 33 changed files with 458 additions and 54 deletions.
14 changes: 7 additions & 7 deletions .github/CONTRIBUTING.md
@@ -1,6 +1,6 @@
- ## Contributing to InternLM
+ ## Contributing to LMDeploy

- Welcome to the InternLM community, all kinds of contributions are welcomed, including but not limited to
+ Welcome to the LMDeploy community, all kinds of contributions are welcomed, including but not limited to

**Fix bug**

@@ -56,7 +56,7 @@ upstream git@github.com:InternLM/lmdeploy.git (push)
#### 2. Configure pre-commit

- You should configure [pre-commit](https://pre-commit.com/#intro) in the local development environment to make sure the code style matches that of InternLM. **Note**: The following code should be executed under the lmdeploy directory.
+ You should configure [pre-commit](https://pre-commit.com/#intro) in the local development environment to make sure the code style matches that of LMDeploy. **Note**: The following code should be executed under the lmdeploy directory.

```shell
pip install -U pre-commit
@@ -96,7 +96,7 @@ git checkout -b yhc/refactor_contributing_doc
In subsequent development, if the master branch of the local repository is behind the master branch of "upstream", we need to pull the upstream for synchronization, and then execute the above command:

```shell
- git pull upstream master
+ git pull upstream main
```

#### 4. Commit the code and pass the unit test
@@ -151,7 +151,7 @@ Find more details about Pull Request description in [pull request guidelines](#p

<img src="https://user-images.githubusercontent.com/57566630/167307490-f9ebf9fa-63c0-4d83-8ba1-081ea169eb3a.png" width="1200">

- IternLM will run unit test for the posted Pull Request on different platforms (Linux, Window, Mac), based on different versions of Python, PyTorch, CUDA to make sure the code is correct. We can see the specific test information by clicking `Details` in the above image so that we can modify the code.
+ LMDeploy will run unit test for the posted Pull Request on different platforms (Linux, Window, Mac), based on different versions of Python, PyTorch, CUDA to make sure the code is correct. We can see the specific test information by clicking `Details` in the above image so that we can modify the code.

(3) If the Pull Request passes the CI, then you can wait for the review from other developers. You'll modify the code based on the reviewer's comments, and repeat the steps [4](#4-commit-the-code-and-pass-the-unit-test)-[5](#5-push-the-code-to-remote) until all reviewers approve it. Then, we will merge it ASAP.

@@ -163,14 +163,14 @@ If your local branch conflicts with the latest master branch of "upstream", you'

```shell
git fetch --all --prune
- git rebase upstream/master
+ git rebase upstream/main
```

or

```shell
git fetch --all --prune
- git merge upstream/master
+ git merge upstream/main
```

If you are very good at handling conflicts, then you can use rebase to resolve conflicts, as this will keep your commit logs tidy. If you are not familiar with `rebase`, then you can use `merge` to resolve conflicts.
2 changes: 2 additions & 0 deletions README.md
@@ -26,6 +26,7 @@ ______________________________________________________________________
<details open>
<summary><b>2024</b></summary>

+ - \[2024/11\] Support Mono-InternVL with PyTorch engine
- \[2024/10\] PyTorchEngine supports graph mode on ascend platform, doubling the inference speed
- \[2024/09\] LMDeploy PyTorchEngine adds support for [Huawei Ascend](./docs/en/get_started/ascend/get_started.md). See supported models [here](docs/en/supported_models/supported_models.md)
- \[2024/09\] LMDeploy PyTorchEngine achieves 1.3x faster on Llama3-8B inference by introducing CUDA graph
@@ -155,6 +156,7 @@ For detailed inference benchmarks in more devices and more settings, please refe
<li>DeepSeek-VL (7B)</li>
<li>InternVL-Chat (v1.1-v1.5)</li>
<li>InternVL2 (1B-76B)</li>
+ <li>Mono-InternVL (2B)</li>
<li>MiniGeminiLlama (7B)</li>
<li>CogVLM-Chat (17B)</li>
<li>CogVLM2-Chat (19B)</li>
2 changes: 2 additions & 0 deletions README_zh-CN.md
@@ -26,6 +26,7 @@ ______________________________________________________________________
<details open>
<summary><b>2024</b></summary>

+ - \[2024/11\] The PyTorch engine supports the Mono-InternVL model
- \[2024/10\] PyTorchEngine supports graph mode on the ascend platform, doubling the inference speed
- \[2024/09\] LMDeploy PyTorchEngine adds support for [Huawei Ascend](docs/zh_cn/get_started/ascend/get_started.md). See the supported models [here](docs/zh_cn/supported_models/supported_models.md)
- \[2024/09\] By introducing CUDA Graph, LMDeploy PyTorchEngine achieves a 1.3x speedup on Llama3-8B inference
@@ -156,6 +157,7 @@ The LMDeploy TurboMind engine has excellent inference capabilities, and on models of all scales
<li>DeepSeek-VL (7B)</li>
<li>InternVL-Chat (v1.1-v1.5)</li>
<li>InternVL2 (1B-76B)</li>
+ <li>Mono-InternVL (2B)</li>
<li>MiniGeminiLlama (7B)</li>
<li>CogVLM-Chat (17B)</li>
<li>CogVLM2-Chat (19B)</li>
13 changes: 7 additions & 6 deletions docs/en/multi_modal/internvl.md
@@ -2,12 +2,13 @@

LMDeploy supports the following InternVL series of models, which are detailed in the table below:

- | Model | Size | Supported Inference Engine |
- | :---------: | :--------: | :------------------------: |
- | InternVL | 13B-19B | TurboMind |
- | InternVL1.5 | 2B-26B | TurboMind, PyTorch |
- | InternVL2 | 1B, 4B | PyTorch |
- | InternVL2 | 2B, 8B-76B | TurboMind, PyTorch |
+ | Model | Size | Supported Inference Engine |
+ | :-----------: | :--------: | :------------------------: |
+ | InternVL | 13B-19B | TurboMind |
+ | InternVL1.5 | 2B-26B | TurboMind, PyTorch |
+ | InternVL2 | 1B, 4B | PyTorch |
+ | InternVL2 | 2B, 8B-76B | TurboMind, PyTorch |
+ | Mono-InternVL | 2B | PyTorch |

The next chapter demonstrates how to deploy an InternVL model using LMDeploy, with [InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B) as an example.
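
For quick orientation on the new entry, here is a minimal usage sketch (assuming the standard `lmdeploy.pipeline` API and the `OpenGVLab/Mono-InternVL-2B` checkpoint linked in these docs; since Mono-InternVL is PyTorch-only per the table above, the backend is pinned explicitly):

```python
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image

# Mono-InternVL is served by the PyTorch engine only (see the table above).
pipe = pipeline('OpenGVLab/Mono-InternVL-2B',
                backend_config=PytorchEngineConfig())

# Any local path or reachable URL works here; this image is just an example.
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response.text)
```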

1 change: 1 addition & 0 deletions docs/en/multi_modal/vl_pipeline.md
@@ -9,6 +9,7 @@ Currently, it supports the following models.
- [Yi-VL](https://huggingface.co/01-ai/Yi-VL-6B)
- [DeepSeek-VL](https://huggingface.co/deepseek-ai/deepseek-vl-7b-chat)
- [InternVL](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)
+ - [Mono-InternVL](https://huggingface.co/OpenGVLab/Mono-InternVL-2B)
- [MGM](https://huggingface.co/YanweiLi/MGM-7B)
- [XComposer](https://huggingface.co/internlm/internlm-xcomposer2-vl-7b)
- [CogVLM](https://github.com/InternLM/lmdeploy/tree/main/docs/en/multi_modal/cogvlm.md)
5 changes: 5 additions & 0 deletions docs/en/supported_models/supported_models.md
@@ -80,6 +80,7 @@ The TurboMind engine doesn't support window attention. Therefore, for models tha
| LLaVA(1.5,1.6) | 7B-34B | MLLM | Yes | Yes | Yes | No | - |
| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes | Yes | No | Yes |
| InternVL2 | 1B-40B | MLLM | Yes | Yes | Yes | No | - |
+ | Mono-InternVL | 2B | MLLM | Yes\* | Yes | Yes | No | - |
| Gemma2 | 9B-27B | LLM | Yes | Yes | Yes | No | - |
| GLM4 | 9B | LLM | Yes | Yes | Yes | No | No |
| GLM-4V | 9B | MLLM | Yes | Yes | Yes | No | No |
@@ -88,6 +89,10 @@
| Phi-3.5-MoE | 16x3.8B | LLM | Yes | Yes | No | No | - |
| Phi-3.5-vision | 4.2B | MLLM | Yes | Yes | No | No | - |

+ ```{note}
+ * Currently Mono-InternVL does not support FP16 due to numerical instability. Please use BF16 instead.
+ ```
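
The FP16 restriction above is enforced by the assertion this commit adds ("add assertion for FP16" in the commit log). A hedged sketch of the guard's intent, using a hypothetical helper name (the real check lives inside the PyTorch engine's model-loading code):

```python
import torch

def check_mono_internvl_dtype(dtype: torch.dtype) -> None:
    # Hypothetical mirror of the new guard: reject float16 for Mono-InternVL,
    # which is numerically unstable in that format; bfloat16 is the supported path.
    assert dtype != torch.float16, (
        'Mono-InternVL does not support FP16 due to numerical instability; '
        'please use BF16 instead.')

check_mono_internvl_dtype(torch.bfloat16)  # passes; float16 would raise
```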

## PyTorchEngine on Huawei Ascend Platform

| Model | Size | Type | FP16/BF16 | W4A16 |
17 changes: 9 additions & 8 deletions docs/zh_cn/multi_modal/internvl.md
@@ -2,14 +2,15 @@

LMDeploy supports the InternVL series of models, as detailed below:

- | Model | Size | Supported Inference Engine |
- | :---------: | :--------: | :------------------------: |
- | InternVL | 13B-19B | TurboMind |
- | InternVL1.5 | 2B-26B | TurboMind, PyTorch |
- | InternVL2 | 1B, 4B | PyTorch |
- | InternVL2 | 2B, 8B-76B | TurboMind, PyTorch |
- This article takes [InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B) as an example to demonstrate how to deploy the InternVL series of models with LMDeploy

+ | Model | Size | Supported Inference Engine |
+ | :-----------: | :--------: | :------------------------: |
+ | InternVL | 13B-19B | TurboMind |
+ | InternVL1.5 | 2B-26B | TurboMind, PyTorch |
+ | InternVL2 | 1B, 4B | PyTorch |
+ | InternVL2 | 2B, 8B-76B | TurboMind, PyTorch |
+ | Mono-InternVL | 2B | PyTorch |

+ This article takes [InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B) as an example to demonstrate how to deploy the InternVL series of models with LMDeploy.

## Installation

1 change: 1 addition & 0 deletions docs/zh_cn/multi_modal/vl_pipeline.md
@@ -9,6 +9,7 @@ LMDeploy abstracts the complex inference process of vision-language models (VLM) into a simple
- [Yi-VL](https://huggingface.co/01-ai/Yi-VL-6B)
- [DeepSeek-VL](https://huggingface.co/deepseek-ai/deepseek-vl-7b-chat)
- [InternVL](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)
+ - [Mono-InternVL](https://huggingface.co/OpenGVLab/Mono-InternVL-2B)
- [MGM](https://huggingface.co/YanweiLi/MGM-7B)
- [XComposer](https://huggingface.co/internlm/internlm-xcomposer2-vl-7b)
- [CogVLM](https://github.com/InternLM/lmdeploy/tree/main/docs/zh_cn/multi_modal/cogvlm.md)
5 changes: 5 additions & 0 deletions docs/zh_cn/supported_models/supported_models.md
@@ -80,6 +80,7 @@ The turbomind engine does not support window attention. Therefore, for models that use window att
| LLaVA(1.5,1.6) | 7B-34B | MLLM | Yes | Yes | Yes | No | - |
| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes | Yes | No | Yes |
| InternVL2 | 1B-40B | MLLM | Yes | Yes | Yes | No | - |
+ | Mono-InternVL | 2B | MLLM | Yes\* | Yes | Yes | No | - |
| Gemma2 | 9B-27B | LLM | Yes | Yes | Yes | No | - |
| GLM4 | 9B | LLM | Yes | Yes | Yes | No | No |
| GLM-4V | 9B | MLLM | Yes | Yes | Yes | No | No |
@@ -88,6 +89,10 @@
| Phi-3.5-MoE | 16x3.8B | LLM | Yes | Yes | No | No | - |
| Phi-3.5-vision | 4.2B | MLLM | Yes | Yes | No | No | - |

+ ```{note}
+ * Currently Mono-InternVL does not support FP16 due to numerical instability. Please use BF16 instead.
+ ```

## PyTorchEngine on Huawei Ascend Platform

| Model | Size | Type | FP16/BF16 | W4A16 |
3 changes: 2 additions & 1 deletion lmdeploy/model.py
@@ -578,7 +578,8 @@ def match(cls, model_path: str) -> Optional[str]:
model_path (str): the model path used for matching.
"""
path = model_path.lower()
- if 'internvl2' in path and 'internvl2-4b' not in path:
+ if ('internvl2' in path
+         and 'internvl2-4b' not in path) or 'mono-internvl' in path:
return 'internvl2-internlm2'
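
Read on its own, the widened condition simply routes Mono-InternVL checkpoints to the chat template already used by the InternLM2-based InternVL2 models (the 4B variant is excluded because it is Phi-3-based). A standalone sketch of the behavior, as a hypothetical helper mirroring the classmethod above:

```python
from typing import Optional

def match_internvl2_internlm2(model_path: str) -> Optional[str]:
    # Mirrors the rule above: InternVL2 checkpoints (except the Phi-3-based
    # 4B variant) and Mono-InternVL share the 'internvl2-internlm2' template.
    path = model_path.lower()
    if ('internvl2' in path
            and 'internvl2-4b' not in path) or 'mono-internvl' in path:
        return 'internvl2-internlm2'
    return None

assert match_internvl2_internlm2('OpenGVLab/Mono-InternVL-2B') == 'internvl2-internlm2'
assert match_internvl2_internlm2('OpenGVLab/InternVL2-4B') is None
```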


2 changes: 1 addition & 1 deletion lmdeploy/pytorch/models/baichuan.py
@@ -167,7 +167,7 @@ def __init__(self,
# build attention layer
self.self_attn = BaichuanAttention(config, dtype=dtype, device=device)

- # builf MLP
+ # build MLP
self.mlp = MLP(config, dtype=dtype, device=device)

# build input layer norm
2 changes: 1 addition & 1 deletion lmdeploy/pytorch/models/chatglm2.py
@@ -279,7 +279,7 @@ def __init__(self,
# build attention layer
self.self_attention = SelfAttention(config, dtype=dtype, device=device)

- # builf MLP
+ # build MLP
self.mlp = MLP(config, dtype=dtype, device=device)

# build input layer norm
2 changes: 1 addition & 1 deletion lmdeploy/pytorch/models/cogvlm.py
@@ -263,7 +263,7 @@ def __init__(self,
dtype=dtype,
device=device)

- # builf MLP
+ # build MLP
self.mlp = VisionExpertMLP(config, dtype=dtype, device=device)

# build input layer norm
2 changes: 1 addition & 1 deletion lmdeploy/pytorch/models/dbrx.py
@@ -301,7 +301,7 @@ def __init__(self,
dtype=dtype,
device=device)

- # builf MLP
+ # build MLP
self.ffn = DbrxFFN(config, dtype=dtype, device=device)

def forward(
2 changes: 1 addition & 1 deletion lmdeploy/pytorch/models/deepseek.py
@@ -250,7 +250,7 @@ def __init__(self,
# build attention layer
self.self_attn = DeepseekAttention(config, dtype=dtype, device=device)

- # builf MLP
+ # build MLP
self.mlp = (DeepseekMoE(config, dtype=dtype, device=device) if
(config.n_routed_experts is not None
and layer_idx >= config.first_k_dense_replace
2 changes: 1 addition & 1 deletion lmdeploy/pytorch/models/falcon.py
@@ -179,7 +179,7 @@ def __init__(self,
dtype=dtype,
device=device)

- # builf MLP
+ # build MLP
self.mlp = FalconMLP(config, dtype=dtype, device=device)

if not hasattr(config, 'num_ln_in_parallel_attn'):
2 changes: 1 addition & 1 deletion lmdeploy/pytorch/models/gemma.py
@@ -177,7 +177,7 @@ def __init__(self,
dtype=dtype,
device=device)

- # builf MLP
+ # build MLP
self.mlp = GemmaMLP(config, dtype=dtype, device=device)

# build input layer norm
2 changes: 1 addition & 1 deletion lmdeploy/pytorch/models/internlm.py
@@ -161,7 +161,7 @@ def __init__(self,
# build attention layer
self.self_attn = InternLMAttention(config, dtype=dtype, device=device)

- # builf MLP
+ # build MLP
self.mlp = InternLMMLP(config, dtype=dtype, device=device)

# build input layer norm
2 changes: 1 addition & 1 deletion lmdeploy/pytorch/models/internlm2.py
@@ -160,7 +160,7 @@ def __init__(self,
# build attention layer
self.attention = InternLM2Attention(config, dtype=dtype, device=device)

- # builf MLP
+ # build MLP
self.feed_forward = InternLM2MLP(config, dtype=dtype, device=device)

# build input layer norm