bump version to v0.5.1 (#2022)
* bump version to v0.5.1

* update readme

* update supported models

* update readme

* update supported models

* change to v0.5.1
lvhan028 authored Jul 16, 2024
1 parent aeda1ac commit 9cdce39
Showing 9 changed files with 71 additions and 59 deletions.
5 changes: 4 additions & 1 deletion README.md
@@ -26,6 +26,7 @@ ______________________________________________________________________
<details open>
<summary><b>2024</b></summary>

- \[2024/07\] Support [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e) full-series models, [InternLM-XComposer2.5](docs/en/multi_modal/xcomposer2d5.md) and [function call](docs/en/serving/api_server_tools.md) of InternLM2.5
- \[2024/06\] The PyTorch engine supports DeepSeek-V2 and several VLMs, such as CogVLM2, Mini-InternVL, and LLaVA-Next
- \[2024/05\] Balance the vision model across GPUs when deploying VLMs with multiple GPUs
- \[2024/05\] Support 4-bit weight-only quantization and inference on VLMs, such as InternVL v1.5, LLaVA, InternLM-XComposer2
@@ -138,9 +139,11 @@ For detailed inference benchmarks in more devices and more settings, please refe
<ul>
<li>LLaVA(1.5,1.6) (7B-34B)</li>
<li>InternLM-XComposer2 (7B, 4khd-7B)</li>
<li>InternLM-XComposer2.5 (7B)</li>
<li>QWen-VL (7B)</li>
<li>DeepSeek-VL (7B)</li>
<li>InternVL-Chat (v1.1-v1.5)</li>
<li>InternVL2 (1B-40B)</li>
<li>MiniGeminiLlama (7B)</li>
<li>CogVLM-Chat (17B)</li>
<li>CogVLM2-Chat (19B)</li>
@@ -170,7 +173,7 @@ pip install lmdeploy
Since v0.3.0, the default prebuilt package is compiled on **CUDA 12**. However, if CUDA 11+ is required, you can install lmdeploy by:

```shell
export LMDEPLOY_VERSION=0.5.0
export LMDEPLOY_VERSION=0.5.1
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
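After running the commands above, the exact build that landed in the environment can be checked against the version string this commit bumps in `lmdeploy/version.py`; a minimal sketch, assuming the installed package exposes that module as shown later in this diff:

```python
# Minimal post-install sanity check: the version string is defined in
# lmdeploy/version.py, which this commit bumps from 0.5.0 to 0.5.1.
from lmdeploy.version import __version__

assert __version__ == '0.5.1', f'unexpected lmdeploy version: {__version__}'
print(__version__)
```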
5 changes: 4 additions & 1 deletion README_zh-CN.md
@@ -26,6 +26,7 @@ ______________________________________________________________________
<details open>
<summary><b>2024</b></summary>

- \[2024/07\] Support the full series of [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e) models, [InternLM-XComposer2.5](docs/zh_cn/multi_modal/xcomposer2d5.md), and the [function call](docs/zh_cn/serving/api_server_tools.md) feature of InternLM2.5
- \[2024/06\] The PyTorch engine supports DeepSeek-V2 and inference for several VLMs, such as CogVLM2, Mini-InternVL, and LLaVA-Next
- \[2024/05\] Balance the vision model across GPUs when deploying VLMs with multiple GPUs
- \[2024/05\] Support 4-bit weight-only quantization and inference for VLMs such as InternVL v1.5, LLaVA, and InternLM-XComposer2
@@ -139,9 +140,11 @@ The LMDeploy TurboMind engine has excellent inference capability; across models of various scales
<ul>
<li>LLaVA(1.5,1.6) (7B-34B)</li>
<li>InternLM-XComposer2 (7B, 4khd-7B)</li>
<li>InternLM-XComposer2.5 (7B)</li>
<li>QWen-VL (7B)</li>
<li>DeepSeek-VL (7B)</li>
<li>InternVL-Chat (v1.1-v1.5)</li>
<li>InternVL2 (1B-40B)</li>
<li>MiniGeminiLlama (7B)</li>
<li>CogVLM-Chat (17B)</li>
<li>CogVLM2-Chat (19B)</li>
@@ -171,7 +174,7 @@ pip install lmdeploy
Since v0.3.0, the prebuilt LMDeploy packages are compiled on CUDA 12 by default. To install LMDeploy under CUDA 11+, run the following commands:

```shell
export LMDEPLOY_VERSION=0.5.0
export LMDEPLOY_VERSION=0.5.1
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
2 changes: 1 addition & 1 deletion docs/en/get_started.md
@@ -13,7 +13,7 @@ pip install lmdeploy
The default prebuilt package is compiled on **CUDA 12**. However, if CUDA 11+ is required, you can install lmdeploy by:

```shell
export LMDEPLOY_VERSION=0.5.0
export LMDEPLOY_VERSION=0.5.1
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
2 changes: 1 addition & 1 deletion docs/en/multi_modal/cogvlm.md
@@ -22,7 +22,7 @@ Install LMDeploy with pip (Python 3.8+). Refer to [Installation](https://lmdeplo
```shell
# cuda 11.8
# to get the latest version, run: pip index versions lmdeploy
export LMDEPLOY_VERSION=0.5.0
export LMDEPLOY_VERSION=0.5.1
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
# cuda 12.1
55 changes: 29 additions & 26 deletions docs/en/supported_models/supported_models.md
@@ -2,32 +2,34 @@

## Models supported by TurboMind

| Model | Size | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
| :-----------------: | :----------: | :-------: | :-----: | :-----: | :---: |
| Llama | 7B - 65B | Yes | Yes | Yes | Yes |
| Llama2 | 7B - 70B | Yes | Yes | Yes | Yes |
| Llama3 | 8B, 70B | Yes | Yes | Yes | Yes |
| InternLM | 7B - 20B | Yes | Yes | Yes | Yes |
| InternLM2 | 7B - 20B | Yes | Yes | Yes | Yes |
| InternLM2.5 | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer2 | 7B, 4khd-7B | Yes | Yes | Yes | Yes |
| QWen | 1.8B - 72B | Yes | Yes | Yes | Yes |
| QWen1.5 | 1.8B - 110B | Yes | Yes | Yes | Yes |
| QWen2 | 1.5B - 72B | Yes | Yes | Yes | Yes |
| Mistral | 7B | Yes | Yes | Yes | No |
| QWen-VL | 7B | Yes | Yes | Yes | Yes |
| DeepSeek-VL | 7B | Yes | Yes | Yes | Yes |
| Baichuan | 7B | Yes | Yes | Yes | Yes |
| Baichuan2 | 7B | Yes | Yes | Yes | Yes |
| Code Llama | 7B - 34B | Yes | Yes | Yes | No |
| YI | 6B - 34B | Yes | Yes | Yes | No |
| LLaVA(1.5,1.6) | 7B - 34B | Yes | Yes | Yes | Yes |
| InternVL-Chat | v1.1- v1.5 | Yes | Yes | Yes | Yes |
| MiniCPM | Llama3-V-2_5 | Yes | Yes | Yes | Yes |
| MiniGeminiLlama | 7B | Yes | No | No | Yes |
| GLM4 | 9B | Yes | Yes | Yes | No |
| CodeGeeX4 | 9B | Yes | Yes | Yes | No |
| Model | Size | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
| :-------------------: | :----------: | :-------: | :-----: | :-----: | :---: |
| Llama | 7B - 65B | Yes | Yes | Yes | Yes |
| Llama2 | 7B - 70B | Yes | Yes | Yes | Yes |
| Llama3 | 8B, 70B | Yes | Yes | Yes | Yes |
| InternLM | 7B - 20B | Yes | Yes | Yes | Yes |
| InternLM2 | 7B - 20B | Yes | Yes | Yes | Yes |
| InternLM2.5 | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer2 | 7B, 4khd-7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer2.5 | 7B | Yes | Yes | Yes | Yes |
| QWen | 1.8B - 72B | Yes | Yes | Yes | Yes |
| QWen1.5 | 1.8B - 110B | Yes | Yes | Yes | Yes |
| QWen2 | 1.5B - 72B | Yes | Yes | Yes | Yes |
| Mistral | 7B | Yes | Yes | Yes | No |
| QWen-VL | 7B | Yes | Yes | Yes | Yes |
| DeepSeek-VL | 7B | Yes | Yes | Yes | Yes |
| Baichuan | 7B | Yes | Yes | Yes | Yes |
| Baichuan2 | 7B | Yes | Yes | Yes | Yes |
| Code Llama | 7B - 34B | Yes | Yes | Yes | No |
| YI | 6B - 34B | Yes | Yes | Yes | No |
| LLaVA(1.5,1.6) | 7B - 34B | Yes | Yes | Yes | Yes |
| InternVL-Chat | v1.1- v1.5 | Yes | Yes | Yes | Yes |
| InternVL2 | 2B-40B | Yes | Yes | Yes | Yes |
| MiniCPM | Llama3-V-2_5 | Yes | Yes | Yes | Yes |
| MiniGeminiLlama | 7B | Yes | No | No | Yes |
| GLM4 | 9B | Yes | Yes | Yes | No |
| CodeGeeX4 | 9B | Yes | Yes | Yes | No |

"-" means not verified yet.

@@ -66,6 +68,7 @@ The TurboMind engine doesn't support window attention. Therefore, for models tha
| CogVLM2-Chat | 19B | Yes | No | No |
| LLaVA(1.5,1.6) | 7B-34B | Yes | No | No |
| InternVL-Chat(v1.5) | 2B-26B | Yes | No | No |
| InternVL2 | 1B-40B | Yes | No | No |
| Gemma2 | 9B-27B | Yes | No | No |
| GLM4 | 9B | Yes | No | No |
| CodeGeeX4 | 9B | Yes | No | No |
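For the PyTorch-engine models listed above (including the newly added InternVL2 row), a comparable sketch selects the backend explicitly; `PytorchEngineConfig` and the model id `THUDM/glm-4-9b-chat` (a placeholder for the GLM4 row) are assumptions for illustration only.

```python
# Hypothetical sketch for the PyTorch engine backend; the model id is a
# placeholder taken from the GLM4 row and may differ from the weights you deploy.
from lmdeploy import pipeline, PytorchEngineConfig

pipe = pipeline('THUDM/glm-4-9b-chat',
                backend_config=PytorchEngineConfig(tp=1))
print(pipe(['Summarize what this table describes.']))
```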
2 changes: 1 addition & 1 deletion docs/zh_cn/get_started.md
@@ -13,7 +13,7 @@ pip install lmdeploy
The prebuilt LMDeploy packages are compiled on CUDA 12 by default. To install LMDeploy under CUDA 11+, run the following commands:

```shell
export LMDEPLOY_VERSION=0.5.0
export LMDEPLOY_VERSION=0.5.1
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
2 changes: 1 addition & 1 deletion docs/zh_cn/multi_modal/cogvlm.md
@@ -21,7 +21,7 @@ pip install torch==2.2.2 torchvision==0.17.2 xformers==0.0.26 --index-url https:

```shell
# cuda 11.8
export LMDEPLOY_VERSION=0.5.0
export LMDEPLOY_VERSION=0.5.1
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
# cuda 12.1
55 changes: 29 additions & 26 deletions docs/zh_cn/supported_models/supported_models.md
@@ -2,32 +2,34 @@

## Models supported by TurboMind

| Model               | Size         | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
| :-----------------: | :----------: | :-------: | :-----: | :-----: | :---: |
| Llama | 7B - 65B | Yes | Yes | Yes | Yes |
| Llama2 | 7B - 70B | Yes | Yes | Yes | Yes |
| Llama3 | 8B, 70B | Yes | Yes | Yes | Yes |
| InternLM | 7B - 20B | Yes | Yes | Yes | Yes |
| InternLM2 | 7B - 20B | Yes | Yes | Yes | Yes |
| InternLM2.5 | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer2 | 7B, 4khd-7B | Yes | Yes | Yes | Yes |
| QWen | 1.8B - 72B | Yes | Yes | Yes | Yes |
| QWen1.5 | 1.8B - 110B | Yes | Yes | Yes | Yes |
| QWen2 | 1.5B - 72B | Yes | Yes | Yes | Yes |
| Mistral | 7B | Yes | Yes | Yes | No |
| QWen-VL | 7B | Yes | Yes | Yes | Yes |
| DeepSeek-VL | 7B | Yes | Yes | Yes | Yes |
| Baichuan | 7B | Yes | Yes | Yes | Yes |
| Baichuan2 | 7B | Yes | Yes | Yes | Yes |
| Code Llama | 7B - 34B | Yes | Yes | Yes | No |
| YI | 6B - 34B | Yes | Yes | Yes | No |
| LLaVA(1.5,1.6) | 7B - 34B | Yes | Yes | Yes | Yes |
| InternVL-Chat | v1.1- v1.5 | Yes | Yes | Yes | Yes |
| MiniCPM | Llama3-V-2_5 | Yes | Yes | Yes | Yes |
| MiniGeminiLlama | 7B | Yes | No | No | Yes |
| GLM4 | 9B | Yes | Yes | Yes | No |
| CodeGeeX4 | 9B | Yes | Yes | Yes | No |
| Model                 | Size         | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
| :-------------------: | :----------: | :-------: | :-----: | :-----: | :---: |
| Llama | 7B - 65B | Yes | Yes | Yes | Yes |
| Llama2 | 7B - 70B | Yes | Yes | Yes | Yes |
| Llama3 | 8B, 70B | Yes | Yes | Yes | Yes |
| InternLM | 7B - 20B | Yes | Yes | Yes | Yes |
| InternLM2 | 7B - 20B | Yes | Yes | Yes | Yes |
| InternLM2.5 | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer2 | 7B, 4khd-7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer2.5 | 7B | Yes | Yes | Yes | Yes |
| QWen | 1.8B - 72B | Yes | Yes | Yes | Yes |
| QWen1.5 | 1.8B - 110B | Yes | Yes | Yes | Yes |
| QWen2 | 1.5B - 72B | Yes | Yes | Yes | Yes |
| Mistral | 7B | Yes | Yes | Yes | No |
| QWen-VL | 7B | Yes | Yes | Yes | Yes |
| DeepSeek-VL | 7B | Yes | Yes | Yes | Yes |
| Baichuan | 7B | Yes | Yes | Yes | Yes |
| Baichuan2 | 7B | Yes | Yes | Yes | Yes |
| Code Llama | 7B - 34B | Yes | Yes | Yes | No |
| YI | 6B - 34B | Yes | Yes | Yes | No |
| LLaVA(1.5,1.6) | 7B - 34B | Yes | Yes | Yes | Yes |
| InternVL-Chat | v1.1- v1.5 | Yes | Yes | Yes | Yes |
| InternVL2 | 2B-40B | Yes | Yes | Yes | Yes |
| MiniCPM | Llama3-V-2_5 | Yes | Yes | Yes | Yes |
| MiniGeminiLlama | 7B | Yes | No | No | Yes |
| GLM4 | 9B | Yes | Yes | Yes | No |
| CodeGeeX4 | 9B | Yes | Yes | Yes | No |

"-" means not verified yet.

@@ -66,6 +68,7 @@ The TurboMind engine does not support window attention. Therefore, for models that apply window att
| CogVLM2-Chat | 19B | Yes | No | No |
| LLaVA(1.5,1.6) | 7B-34B | Yes | No | No |
| InternVL-Chat(v1.5) | 2B-26B | Yes | No | No |
| InternVL2 | 1B-40B | Yes | No | No |
| Gemma2 | 9B-27B | Yes | No | No |
| GLM4 | 9B | Yes | No | No |
| CodeGeeX4 | 9B | Yes | No | No |
2 changes: 1 addition & 1 deletion lmdeploy/version.py
@@ -1,7 +1,7 @@
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Tuple

__version__ = '0.5.0'
__version__ = '0.5.1'
short_version = __version__


