add docs for evaluation with opencompass (#995)

InternLM · Jan 23, 2024 · da190ef
1 parent 4db2502
commit da190ef

Showing 7 changed files with 319 additions and 0 deletions.
diff --git a/docs/en/benchmark/evaluate_with_opencompass.md b/docs/en/benchmark/evaluate_with_opencompass.md
@@ -0,0 +1,159 @@
+# Evaluate LLMs with OpenCompass
+
+The LLMs accelerated by lmdeploy can be evaluated with OpenCompass.
+
+## Setup
+
+This section sets up the environment for evaluation.
+
+### Install lmdeploy
+
+Install lmdeploy with pip (Python 3.8+). To install from source instead, refer to [build.md](../build.md).
+
+```shell
+pip install lmdeploy
+```
+
+### Install OpenCompass
+
+Install OpenCompass from source. Refer to [installation](https://opencompass.readthedocs.io/en/latest/get_started/installation.html) for more information.
+
+```shell
+git clone https://github.com/open-compass/opencompass.git
+cd opencompass
+pip install -e .
+```
+
+To get familiar with the basic usage of OpenCompass, see the
+[Quick Start](https://opencompass.readthedocs.io/en/latest/get_started/quick_start.html#).
+
+### Download datasets
+
+Download the core datasets and unzip them in the OpenCompass root directory:
+
+```shell
+# Switch to the OpenCompass root directory (skip this if you are already there)
+cd opencompass
+wget https://github.com/open-compass/opencompass/releases/download/0.1.8.rc1/OpenCompassData-core-20231110.zip
+unzip OpenCompassData-core-20231110.zip
+```
+
+## Prepare Evaluation Config
+
+OpenCompass uses OpenMMLab-style configuration files, so an evaluation is defined in a Python config and started from it.
+OpenCompass supports evaluating models that run on lmdeploy's TurboMind engine through the Python API.
+
+### Dataset Config
+
+In the root directory of OpenCompass, create the config file `$OPENCOMPASS_DIR/configs/eval_lmdeploy.py`.
+At the top of the file, import several predefined datasets from the OpenCompass base dataset configs, plus a `summarizer` that formats the results, and collect the datasets into `datasets`.
+
+```python
+from mmengine.config import read_base
+
+
+with read_base():
+    # choose a list of datasets
+    from .datasets.mmlu.mmlu_gen_a484b3 import mmlu_datasets
+    from .datasets.ceval.ceval_gen_5f30c7 import ceval_datasets
+    from .datasets.SuperGLUE_WiC.SuperGLUE_WiC_gen_d06864 import WiC_datasets
+    from .datasets.SuperGLUE_WSC.SuperGLUE_WSC_gen_7902a7 import WSC_datasets
+    from .datasets.triviaqa.triviaqa_gen_2121ce import triviaqa_datasets
+    from .datasets.gsm8k.gsm8k_gen_1d7fe4 import gsm8k_datasets
+    from .datasets.race.race_gen_69ee4f import race_datasets
+    from .datasets.crowspairs.crowspairs_gen_381af0 import crowspairs_datasets
+    # and output the results in a chosen format
+    from .summarizers.medium import summarizer
+
+datasets = sum((v for k, v in locals().items() if k.endswith('_datasets')), [])
+```
+
+### Model Config
+
+This section shows how to set up the model config for LLMs. Here are some examples:
+
+`````{tabs}
+````{tab} internlm-20b
+
+```python
+from opencompass.models.turbomind import TurboMindModel
+
+internlm_20b = dict(
+    type=TurboMindModel,
+    abbr='internlm-20b-turbomind',
+    path='internlm/internlm-20b',  # this path should be the same as on Hugging Face
+    engine_config=dict(session_len=2048,
+                       max_batch_size=8,
+                       rope_scaling_factor=1.0),
+    gen_config=dict(top_k=1, top_p=0.8,
+                    temperature=1.0,
+                    max_new_tokens=100),
+    max_out_len=100,
+    max_seq_len=2048,
+    batch_size=8,
+    concurrency=8,
+    run_cfg=dict(num_gpus=1, num_procs=1),
+)
+
+models = [internlm_20b]
+```
+
+````
+
+````{tab} internlm-chat-20b
+
+For chat models, you have to pass a `meta_template`. Different chat models may use different templates, and it is important
+to keep the template the same as the one used in training. See [meta_template](https://opencompass.readthedocs.io/en/latest/prompt/meta_template.html) for more information.
+
+
+```python
+from opencompass.models.turbomind import TurboMindModel
+
+internlm_meta_template = dict(round=[
+    dict(role='HUMAN', begin='<|User|>:', end='\n'),
+    dict(role='BOT', begin='<|Bot|>:', end='<eoa>\n', generate=True),
+],
+                              eos_token_id=103028)
+
+internlm_chat_20b = dict(
+    type=TurboMindModel,
+    abbr='internlm-chat-20b-turbomind',
+    path='internlm/internlm-chat-20b',
+    engine_config=dict(session_len=2048,
+                       max_batch_size=8,
+                       rope_scaling_factor=1.0),
+    gen_config=dict(top_k=1,
+                    top_p=0.8,
+                    temperature=1.0,
+                    max_new_tokens=100),
+    max_out_len=100,
+    max_seq_len=2048,
+    batch_size=8,
+    concurrency=8,
+    meta_template=internlm_meta_template,
+    run_cfg=dict(num_gpus=1, num_procs=1),
+)
+
+models = [internlm_chat_20b]
+
+```
+
+````
+
+`````
+
+**Note**
+
+- If you want to pass more arguments to `engine_config` and `gen_config` in the evaluation config file, please refer to [TurbomindEngineConfig](https://lmdeploy.readthedocs.io/en/latest/inference/pipeline.html#turbomindengineconfig)
+  and [EngineGenerationConfig](https://lmdeploy.readthedocs.io/en/latest/inference/pipeline.html#generationconfig).
+
+## Execute Evaluation Task
+
+After defining the evaluation config, run the following command to start evaluating the models.
+See [Execution Task](https://opencompass.readthedocs.io/en/latest/user_guides/experimentation.html#task-execution-and-monitoring)
+for more arguments of `run.py`.
+
+```shell
+# in the root directory of opencompass
+python3 run.py configs/eval_lmdeploy.py --work-dir ./workdir
+```
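
The `datasets = sum(...)` line in the dataset config above concatenates every list whose name ends in `_datasets`. The minimal Python sketch below illustrates that idiom with an explicit dictionary standing in for `locals()`. The dataset entries and the `collect_datasets` helper are hypothetical stand-ins, not OpenCompass objects.

```python
# Hypothetical stand-ins for the names that the base configs would import.
namespace = {
    'mmlu_datasets': [dict(abbr='mmlu-demo-a'), dict(abbr='mmlu-demo-b')],
    'gsm8k_datasets': [dict(abbr='gsm8k-demo')],
    'summarizer': dict(note='not a dataset list, so it is skipped'),
}


def collect_datasets(names):
    """Concatenate every list bound to a name ending in '_datasets'."""
    # sum(..., []) starts from an empty list and appends each matching list.
    return sum((v for k, v in names.items() if k.endswith('_datasets')), [])


datasets = collect_datasets(namespace)
print([d['abbr'] for d in datasets])
```

Names that do not end in `_datasets`, such as `summarizer`, are ignored, and the result is one flat list rather than a list of lists.
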
diff --git a/docs/en/conf.py b/docs/en/conf.py
@@ -52,6 +52,7 @@
     'sphinx.ext.napoleon',
     'sphinx.ext.viewcode',
     'sphinx.ext.autosectionlabel',
+    'sphinx_tabs.tabs',
     'sphinx_markdown_tables',
     'myst_parser',
     'sphinx_copybutton',

diff --git a/docs/en/index.rst b/docs/en/index.rst
@@ -23,6 +23,7 @@ Welcome to LMDeploy's tutorials!
    benchmark/profile_throughput.md
    benchmark/profile_api_server.md
    benchmark/profile_triton_server.md
+   benchmark/evaluate_with_opencompass.md
 
 .. _supported_models:
 .. toctree::

diff --git a/docs/zh_cn/benchmark/evaluate_with_opencompass.md b/docs/zh_cn/benchmark/evaluate_with_opencompass.md
@@ -0,0 +1,155 @@
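+# How to Evaluate LLMs with OpenCompass
+
+LMDeploy provides the TurboMind inference engine to accelerate LLM inference, and its inference accuracy can be evaluated with OpenCompass.
+
+## Setup
+
+This section sets up the environment for evaluation.
+
+### Install lmdeploy
+
+Install LMDeploy with pip (Python 3.8+), or [install from source](../build.md).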
+
+```shell
+pip install lmdeploy
+```
+
+### Install OpenCompass
+
+Run the following script to install OpenCompass from source. For other installation methods, refer to [installation](https://opencompass.readthedocs.io/en/latest/get_started/installation.html).
+
+```shell
+git clone https://github.com/open-compass/opencompass.git
+cd opencompass
+pip install -e .
+```
+
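+For a quick introduction to the basic usage of OpenCompass, see the [Quick Start](https://opencompass.readthedocs.io/en/latest/get_started/quick_start.html#).
+
+### Download datasets
+
+OpenCompass provides several versions of its datasets. Here we download the following version: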
+
+```shell
+# Switch to the OpenCompass root directory
+cd opencompass
+wget https://github.com/open-compass/opencompass/releases/download/0.1.8.rc1/OpenCompassData-core-20231110.zip
+unzip OpenCompassData-core-20231110.zip
+```
+
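+## Prepare the Evaluation Config
+
+OpenCompass uses OpenMMLab-style configuration files to manage models and datasets, so users can start an evaluation quickly by adding a simple config. OpenCompass already supports evaluating
+LLMs accelerated by the TurboMind inference engine through the Python API.
+
+### Dataset Config
+
+In the OpenCompass root directory, prepare the evaluation config file `$OPENCOMPASS_DIR/configs/eval_lmdeploy.py`.
+
+At the beginning of the config file, import the following OpenCompass-supported datasets as `datasets`, together with the `summarizer` that formats the evaluation results.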
+
+```python
+from mmengine.config import read_base
+
+
+with read_base():
+    # choose a list of datasets
+    from .datasets.mmlu.mmlu_gen_a484b3 import mmlu_datasets
+    from .datasets.ceval.ceval_gen_5f30c7 import ceval_datasets
+    from .datasets.SuperGLUE_WiC.SuperGLUE_WiC_gen_d06864 import WiC_datasets
+    from .datasets.SuperGLUE_WSC.SuperGLUE_WSC_gen_7902a7 import WSC_datasets
+    from .datasets.triviaqa.triviaqa_gen_2121ce import triviaqa_datasets
+    from .datasets.gsm8k.gsm8k_gen_1d7fe4 import gsm8k_datasets
+    from .datasets.race.race_gen_69ee4f import race_datasets
+    from .datasets.crowspairs.crowspairs_gen_381af0 import crowspairs_datasets
+    # and output the results in a chosen format
+    from .summarizers.medium import summarizer
+
+datasets = sum((v for k, v in locals().items() if k.endswith('_datasets')), [])
+```
+
+### Model Config
+
+This section shows how to add a model config to the evaluation config file. Here are a few examples:
+
+`````{tabs}
+````{tab} internlm-20b
+
+```python
+from opencompass.models.turbomind import TurboMindModel
+
+internlm_20b = dict(
+    type=TurboMindModel,
+    abbr='internlm-20b-turbomind',
+    path='internlm/internlm-20b',  # this path should be the same as on Hugging Face
+    engine_config=dict(session_len=2048,
+                       max_batch_size=8,
+                       rope_scaling_factor=1.0),
+    gen_config=dict(top_k=1, top_p=0.8,
+                    temperature=1.0,
+                    max_new_tokens=100),
+    max_out_len=100,
+    max_seq_len=2048,
+    batch_size=8,
+    concurrency=8,
+    run_cfg=dict(num_gpus=1, num_procs=1),
+)
+
+models = [internlm_20b]
+```
+
+````
+
+````{tab} internlm-chat-20b
+
+For chat LLMs, users need to specify `meta_template` in the config file, and this setting must match the one used in training. See [meta_template](https://opencompass.readthedocs.io/en/latest/prompt/meta_template.html) for an introduction.
+
+```python
+from opencompass.models.turbomind import TurboMindModel
+
+internlm_meta_template = dict(round=[
+    dict(role='HUMAN', begin='<|User|>:', end='\n'),
+    dict(role='BOT', begin='<|Bot|>:', end='<eoa>\n', generate=True),
+],
+                              eos_token_id=103028)
+
+internlm_chat_20b = dict(
+    type=TurboMindModel,
+    abbr='internlm-chat-20b-turbomind',
+    path='internlm/internlm-chat-20b',
+    engine_config=dict(session_len=2048,
+                       max_batch_size=8,
+                       rope_scaling_factor=1.0),
+    gen_config=dict(top_k=1,
+                    top_p=0.8,
+                    temperature=1.0,
+                    max_new_tokens=100),
+    max_out_len=100,
+    max_seq_len=2048,
+    batch_size=8,
+    concurrency=8,
+    meta_template=internlm_meta_template,
+    run_cfg=dict(num_gpus=1, num_procs=1),
+)
+
+models = [internlm_chat_20b]
+
+```
+
+````
+
+`````
+
+**Note**
+
+- If you want to pass more arguments through the `engine_config` and `gen_config` fields of the evaluation config file, refer to [TurbomindEngineConfig](https://lmdeploy.readthedocs.io/zh-cn/latest/inference/pipeline.html#turbomindengineconfig) and [EngineGenerationConfig](https://lmdeploy.readthedocs.io/zh-cn/latest/inference/pipeline.html#generationconfig).
+
+## 执行测评任务
+
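+After writing the evaluation config file, run the `run.py` script in the OpenCompass root directory and specify a working directory to start the evaluation task.
+For more arguments of the evaluation script, see [Running Evaluation](https://opencompass.readthedocs.io/zh-cn/latest/user_guides/experimentation.html#id1).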
+
+```shell
+# in the root directory of opencompass
+python3 run.py configs/eval_lmdeploy.py --work-dir ./workdir
+```
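
The role of `meta_template` in the chat-model config above can be illustrated with a simplified Python sketch. It assumes a single user turn, and `render_prompt` is a hypothetical helper, not an OpenCompass function. OpenCompass's actual prompt construction covers more cases, such as multi-round dialogues.

```python
# Template copied from the chat-model config; everything else is illustrative.
internlm_meta_template = dict(round=[
    dict(role='HUMAN', begin='<|User|>:', end='\n'),
    dict(role='BOT', begin='<|Bot|>:', end='<eoa>\n', generate=True),
], eos_token_id=103028)


def render_prompt(meta_template, question):
    """Hypothetical helper: wrap one user turn and open the bot turn."""
    parts = []
    for item in meta_template['round']:
        if item.get('generate'):
            # The model writes this turn, so only its opening marker is emitted.
            parts.append(item['begin'])
            break
        parts.append(item['begin'] + question + item['end'])
    return ''.join(parts)


prompt = render_prompt(internlm_meta_template, 'What is 1 + 1?')
print(repr(prompt))
```

Under these assumptions the user text is wrapped in the `HUMAN` markers and the prompt ends with the `BOT` opening marker, which is why a template that differs from the training format can change what the model sees.
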
diff --git a/docs/zh_cn/conf.py b/docs/zh_cn/conf.py
@@ -53,6 +53,7 @@
     'sphinx.ext.napoleon',
     'sphinx.ext.viewcode',
     'sphinx.ext.autosectionlabel',
+    'sphinx_tabs.tabs',
     'sphinx_markdown_tables',
     'myst_parser',
     'sphinx_copybutton',

diff --git a/docs/zh_cn/index.rst b/docs/zh_cn/index.rst
@@ -24,6 +24,7 @@
    benchmark/profile_throughput.md
    benchmark/profile_api_server.md
    benchmark/profile_triton_server.md
+   benchmark/evaluate_with_opencompass.md
 
 .. _支持的模型:
 .. toctree::

diff --git a/requirements/docs.txt b/requirements/docs.txt
@@ -7,5 +7,6 @@ myst-parser
 recommonmark
 sphinx==4.0.2
 sphinx-copybutton
+sphinx-tabs
 sphinx_markdown_tables>=0.0.16
 sphinxcontrib-mermaid