bump version to v0.5.1 (#2022)
* bump version to v0.5.1

* update readme

* update supported models

* update readme

* update supported models

* change to v0.5.1
lvhan028 authored Jul 16, 2024
1 parent aeda1ac commit 9cdce39
Showing 9 changed files with 71 additions and 59 deletions.
5 changes: 4 additions & 1 deletion README.md
@@ -26,6 +26,7 @@ ______________________________________________________________________
<details open>
<summary><b>2024</b></summary>

- \[2024/07\] Support [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e) full-series models, [InternLM-XComposer2.5](docs/en/multi_modal/xcomposer2d5.md) and [function call](docs/en/serving/api_server_tools.md) of InternLM2.5
- \[2024/06\] The PyTorch engine supports DeepSeek-V2 and several VLMs, such as CogVLM2, Mini-InternVL, and LLaVA-Next
- \[2024/05\] Balance the vision model across GPUs when deploying VLMs with multiple GPUs
- \[2024/05\] Support 4-bit weight-only quantization and inference on VLMs, such as InternVL v1.5, LLaVA, InternLM-XComposer2
@@ -138,9 +139,11 @@ For detailed inference benchmarks in more devices and more settings, please refe
<ul>
<li>LLaVA(1.5,1.6) (7B-34B)</li>
<li>InternLM-XComposer2 (7B, 4khd-7B)</li>
<li>InternLM-XComposer2.5 (7B)</li>
<li>QWen-VL (7B)</li>
<li>DeepSeek-VL (7B)</li>
<li>InternVL-Chat (v1.1-v1.5)</li>
<li>InternVL2 (1B-40B)</li>
<li>MiniGeminiLlama (7B)</li>
<li>CogVLM-Chat (17B)</li>
<li>CogVLM2-Chat (19B)</li>
@@ -170,7 +173,7 @@ pip install lmdeploy
Since v0.3.0, the default prebuilt package is compiled on **CUDA 12**. However, if CUDA 11+ is required, you can install lmdeploy by:

```shell
export LMDEPLOY_VERSION=0.5.0
export LMDEPLOY_VERSION=0.5.1
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
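After running the commands above, the exact build that landed in the environment can be checked against the version string this commit bumps in `lmdeploy/version.py`; a minimal sketch, assuming the installed package exposes that module as shown later in this diff:

```python
# Minimal post-install sanity check: the version string is defined in
# lmdeploy/version.py, which this commit bumps from 0.5.0 to 0.5.1.
from lmdeploy.version import __version__

assert __version__ == '0.5.1', f'unexpected lmdeploy version: {__version__}'
print(__version__)
```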
5 changes: 4 additions & 1 deletion README_zh-CN.md
@@ -26,6 +26,7 @@ ______________________________________________________________________
<details open>
<summary><b>2024</b></summary>

- \[2024/07\] Support the full series of [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e) models, [InternLM-XComposer2.5](docs/zh_cn/multi_modal/xcomposer2d5.md), and the [function call](docs/zh_cn/serving/api_server_tools.md) feature of InternLM2.5
- \[2024/06\] The PyTorch engine supports DeepSeek-V2 and inference for several VLMs, such as CogVLM2, Mini-InternVL, and LLaVA-Next
- \[2024/05\] Balance the vision model across GPUs when deploying VLMs with multiple GPUs
- \[2024/05\] Support 4-bit weight-only quantization and inference for VLMs such as InternVL v1.5, LLaVA, and InternLM-XComposer2
@@ -139,9 +140,11 @@ The LMDeploy TurboMind engine has excellent inference capability; across models of various scales
<ul>
<li>LLaVA(1.5,1.6) (7B-34B)</li>
<li>InternLM-XComposer2 (7B, 4khd-7B)</li>
<li>InternLM-XComposer2.5 (7B)</li>
<li>QWen-VL (7B)</li>
<li>DeepSeek-VL (7B)</li>
<li>InternVL-Chat (v1.1-v1.5)</li>
<li>InternVL2 (1B-40B)</li>
<li>MiniGeminiLlama (7B)</li>
<li>CogVLM-Chat (17B)</li>
<li>CogVLM2-Chat (19B)</li>
@@ -171,7 +174,7 @@ pip install lmdeploy
Since v0.3.0, the prebuilt LMDeploy packages are compiled on CUDA 12 by default. To install LMDeploy under CUDA 11+, run the following commands:

```shell
export LMDEPLOY_VERSION=0.5.0
export LMDEPLOY_VERSION=0.5.1
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
2 changes: 1 addition & 1 deletion docs/en/get_started.md
@@ -13,7 +13,7 @@ pip install lmdeploy
The default prebuilt package is compiled on **CUDA 12**. However, if CUDA 11+ is required, you can install lmdeploy by:

```shell
export LMDEPLOY_VERSION=0.5.0
export LMDEPLOY_VERSION=0.5.1
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
2 changes: 1 addition & 1 deletion docs/en/multi_modal/cogvlm.md
@@ -22,7 +22,7 @@ Install LMDeploy with pip (Python 3.8+). Refer to [Installation](https://lmdeplo
```shell
# cuda 11.8
# to get the latest version, run: pip index versions lmdeploy
export LMDEPLOY_VERSION=0.5.0
export LMDEPLOY_VERSION=0.5.1
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
# cuda 12.1
55 changes: 29 additions & 26 deletions docs/en/supported_models/supported_models.md
@@ -2,32 +2,34 @@

## Models supported by TurboMind

| Model | Size | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
| :-----------------: | :----------: | :-------: | :-----: | :-----: | :---: |
| Llama | 7B - 65B | Yes | Yes | Yes | Yes |
| Llama2 | 7B - 70B | Yes | Yes | Yes | Yes |
| Llama3 | 8B, 70B | Yes | Yes | Yes | Yes |
| InternLM | 7B - 20B | Yes | Yes | Yes | Yes |
| InternLM2 | 7B - 20B | Yes | Yes | Yes | Yes |
| InternLM2.5 | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer2 | 7B, 4khd-7B | Yes | Yes | Yes | Yes |
| QWen | 1.8B - 72B | Yes | Yes | Yes | Yes |
| QWen1.5 | 1.8B - 110B | Yes | Yes | Yes | Yes |
| QWen2 | 1.5B - 72B | Yes | Yes | Yes | Yes |
| Mistral | 7B | Yes | Yes | Yes | No |
| QWen-VL | 7B | Yes | Yes | Yes | Yes |
| DeepSeek-VL | 7B | Yes | Yes | Yes | Yes |
| Baichuan | 7B | Yes | Yes | Yes | Yes |
| Baichuan2 | 7B | Yes | Yes | Yes | Yes |
| Code Llama | 7B - 34B | Yes | Yes | Yes | No |
| YI | 6B - 34B | Yes | Yes | Yes | No |
| LLaVA(1.5,1.6) | 7B - 34B | Yes | Yes | Yes | Yes |
| InternVL-Chat | v1.1- v1.5 | Yes | Yes | Yes | Yes |
| MiniCPM | Llama3-V-2_5 | Yes | Yes | Yes | Yes |
| MiniGeminiLlama | 7B | Yes | No | No | Yes |
| GLM4 | 9B | Yes | Yes | Yes | No |
| CodeGeeX4 | 9B | Yes | Yes | Yes | No |
| Model | Size | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
| :-------------------: | :----------: | :-------: | :-----: | :-----: | :---: |
| Llama | 7B - 65B | Yes | Yes | Yes | Yes |
| Llama2 | 7B - 70B | Yes | Yes | Yes | Yes |
| Llama3 | 8B, 70B | Yes | Yes | Yes | Yes |
| InternLM | 7B - 20B | Yes | Yes | Yes | Yes |
| InternLM2 | 7B - 20B | Yes | Yes | Yes | Yes |
| InternLM2.5 | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer2 | 7B, 4khd-7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer2.5 | 7B | Yes | Yes | Yes | Yes |
| QWen | 1.8B - 72B | Yes | Yes | Yes | Yes |
| QWen1.5 | 1.8B - 110B | Yes | Yes | Yes | Yes |
| QWen2 | 1.5B - 72B | Yes | Yes | Yes | Yes |
| Mistral | 7B | Yes | Yes | Yes | No |
| QWen-VL | 7B | Yes | Yes | Yes | Yes |
| DeepSeek-VL | 7B | Yes | Yes | Yes | Yes |
| Baichuan | 7B | Yes | Yes | Yes | Yes |
| Baichuan2 | 7B | Yes | Yes | Yes | Yes |
| Code Llama | 7B - 34B | Yes | Yes | Yes | No |
| YI | 6B - 34B | Yes | Yes | Yes | No |
| LLaVA(1.5,1.6) | 7B - 34B | Yes | Yes | Yes | Yes |
| InternVL-Chat | v1.1- v1.5 | Yes | Yes | Yes | Yes |
| InternVL2 | 2B-40B | Yes | Yes | Yes | Yes |
| MiniCPM | Llama3-V-2_5 | Yes | Yes | Yes | Yes |
| MiniGeminiLlama | 7B | Yes | No | No | Yes |
| GLM4 | 9B | Yes | Yes | Yes | No |
| CodeGeeX4 | 9B | Yes | Yes | Yes | No |

"-" means not verified yet.

@@ -66,6 +68,7 @@ The TurboMind engine doesn't support window attention. Therefore, for models tha
| CogVLM2-Chat | 19B | Yes | No | No |
| LLaVA(1.5,1.6) | 7B-34B | Yes | No | No |
| InternVL-Chat(v1.5) | 2B-26B | Yes | No | No |
| InternVL2 | 1B-40B | Yes | No | No |
| Gemma2 | 9B-27B | Yes | No | No |
| GLM4 | 9B | Yes | No | No |
| CodeGeeX4 | 9B | Yes | No | No |
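For the PyTorch-engine models listed above (including the newly added InternVL2 row), a comparable sketch selects the backend explicitly; `PytorchEngineConfig` and the model id `THUDM/glm-4-9b-chat` (a placeholder for the GLM4 row) are assumptions for illustration only.

```python
# Hypothetical sketch for the PyTorch engine backend; the model id is a
# placeholder taken from the GLM4 row and may differ from the weights you deploy.
from lmdeploy import pipeline, PytorchEngineConfig

pipe = pipeline('THUDM/glm-4-9b-chat',
                backend_config=PytorchEngineConfig(tp=1))
print(pipe(['Summarize what this table describes.']))
```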
2 changes: 1 addition & 1 deletion docs/zh_cn/get_started.md
@@ -13,7 +13,7 @@ pip install lmdeploy
The prebuilt LMDeploy packages are compiled on CUDA 12 by default. To install LMDeploy under CUDA 11+, run the following commands:

```shell
export LMDEPLOY_VERSION=0.5.0
export LMDEPLOY_VERSION=0.5.1
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
2 changes: 1 addition & 1 deletion docs/zh_cn/multi_modal/cogvlm.md
@@ -21,7 +21,7 @@ pip install torch==2.2.2 torchvision==0.17.2 xformers==0.0.26 --index-url https:

```shell
# cuda 11.8
export LMDEPLOY_VERSION=0.5.0
export LMDEPLOY_VERSION=0.5.1
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
# cuda 12.1
55 changes: 29 additions & 26 deletions docs/zh_cn/supported_models/supported_models.md
@@ -2,32 +2,34 @@

## Models supported by TurboMind

| Model               | Size         | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
| :-----------------: | :----------: | :-------: | :-----: | :-----: | :---: |
| Llama | 7B - 65B | Yes | Yes | Yes | Yes |
| Llama2 | 7B - 70B | Yes | Yes | Yes | Yes |
| Llama3 | 8B, 70B | Yes | Yes | Yes | Yes |
| InternLM | 7B - 20B | Yes | Yes | Yes | Yes |
| InternLM2 | 7B - 20B | Yes | Yes | Yes | Yes |
| InternLM2.5 | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer2 | 7B, 4khd-7B | Yes | Yes | Yes | Yes |
| QWen | 1.8B - 72B | Yes | Yes | Yes | Yes |
| QWen1.5 | 1.8B - 110B | Yes | Yes | Yes | Yes |
| QWen2 | 1.5B - 72B | Yes | Yes | Yes | Yes |
| Mistral | 7B | Yes | Yes | Yes | No |
| QWen-VL | 7B | Yes | Yes | Yes | Yes |
| DeepSeek-VL | 7B | Yes | Yes | Yes | Yes |
| Baichuan | 7B | Yes | Yes | Yes | Yes |
| Baichuan2 | 7B | Yes | Yes | Yes | Yes |
| Code Llama | 7B - 34B | Yes | Yes | Yes | No |
| YI | 6B - 34B | Yes | Yes | Yes | No |
| LLaVA(1.5,1.6) | 7B - 34B | Yes | Yes | Yes | Yes |
| InternVL-Chat | v1.1- v1.5 | Yes | Yes | Yes | Yes |
| MiniCPM | Llama3-V-2_5 | Yes | Yes | Yes | Yes |
| MiniGeminiLlama | 7B | Yes | No | No | Yes |
| GLM4 | 9B | Yes | Yes | Yes | No |
| CodeGeeX4 | 9B | Yes | Yes | Yes | No |
| Model                 | Size         | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
| :-------------------: | :----------: | :-------: | :-----: | :-----: | :---: |
| Llama | 7B - 65B | Yes | Yes | Yes | Yes |
| Llama2 | 7B - 70B | Yes | Yes | Yes | Yes |
| Llama3 | 8B, 70B | Yes | Yes | Yes | Yes |
| InternLM | 7B - 20B | Yes | Yes | Yes | Yes |
| InternLM2 | 7B - 20B | Yes | Yes | Yes | Yes |
| InternLM2.5 | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer2 | 7B, 4khd-7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer2.5 | 7B | Yes | Yes | Yes | Yes |
| QWen | 1.8B - 72B | Yes | Yes | Yes | Yes |
| QWen1.5 | 1.8B - 110B | Yes | Yes | Yes | Yes |
| QWen2 | 1.5B - 72B | Yes | Yes | Yes | Yes |
| Mistral | 7B | Yes | Yes | Yes | No |
| QWen-VL | 7B | Yes | Yes | Yes | Yes |
| DeepSeek-VL | 7B | Yes | Yes | Yes | Yes |
| Baichuan | 7B | Yes | Yes | Yes | Yes |
| Baichuan2 | 7B | Yes | Yes | Yes | Yes |
| Code Llama | 7B - 34B | Yes | Yes | Yes | No |
| YI | 6B - 34B | Yes | Yes | Yes | No |
| LLaVA(1.5,1.6) | 7B - 34B | Yes | Yes | Yes | Yes |
| InternVL-Chat | v1.1- v1.5 | Yes | Yes | Yes | Yes |
| InternVL2 | 2B-40B | Yes | Yes | Yes | Yes |
| MiniCPM | Llama3-V-2_5 | Yes | Yes | Yes | Yes |
| MiniGeminiLlama | 7B | Yes | No | No | Yes |
| GLM4 | 9B | Yes | Yes | Yes | No |
| CodeGeeX4 | 9B | Yes | Yes | Yes | No |

"-" means not verified yet.

@@ -66,6 +68,7 @@ The TurboMind engine does not support window attention. Therefore, for models that apply window att
| CogVLM2-Chat | 19B | Yes | No | No |
| LLaVA(1.5,1.6) | 7B-34B | Yes | No | No |
| InternVL-Chat(v1.5) | 2B-26B | Yes | No | No |
| InternVL2 | 1B-40B | Yes | No | No |
| Gemma2 | 9B-27B | Yes | No | No |
| GLM4 | 9B | Yes | No | No |
| CodeGeeX4 | 9B | Yes | No | No |
2 changes: 1 addition & 1 deletion lmdeploy/version.py
@@ -1,7 +1,7 @@
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Tuple

__version__ = '0.5.0'
__version__ = '0.5.1'
short_version = __version__


