Issue Description

Please briefly describe the issue you encountered.

The ChatGLM2-6B model was downloaded to the local /root/ChatGLM directory.

The server is deployed with vLLM:

vllm serve /root/ChatGLM --chat-template ./examples/template_chatglm2.jinja --trust_remote_code --use-v2-block-manager
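A minimal sanity check along the following lines can confirm the endpoint answers a plain chat completion before running the evaluation (sketch only; the model name is assumed to default to the served path):

import requests

# Send one chat completion request to the vLLM OpenAI-compatible server started
# above and print the reply, so connectivity and the chat template can be
# verified independently of the evaluation run.
resp = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json={
        "model": "/root/ChatGLM",  # assumption: vLLM serves the model under its path by default
        "messages": [{"role": "user", "content": "Hello"}],
        "temperature": 0.0,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])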
evalscope configuration:

(evalscope) root@ubuntu:~/evalscope# cat eval_openai_api.yaml
eval_backend: OpenCompass
eval_config:
  datasets:
    - mmlu
    - ceval
    - ARC_c
    - gsm8k
  models:
    - openai_api_base: http://127.0.0.1:8000/v1/chat/completions
      path: /root/ChatGLM
      temperature: 0.0
(evalscope) root@ubuntu:~/evalscope# cat example_eval_openai_api.py
from evalscope.run import run_task
from evalscope.summarizer import Summarizer


def run_eval():
    # Option 1: Python dictionary
    # task_cfg = task_cfg_dict

    # Option 2: YAML configuration file
    task_cfg = 'eval_openai_api.yaml'

    # Option 3: JSON configuration file
    # task_cfg = 'eval_openai_api.json'

    run_task(task_cfg=task_cfg)

    print('>> Start to get the report with summarizer ...')
    report_list = Summarizer.get_report_from_cfg(task_cfg)
    print(f'\n>> The report list: {report_list}')


run_eval()
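For reference, the same task configuration can be written as a Python dictionary (Option 1 in the script above); a sketch mirroring the YAML file, which also matches the cfg echoed by the summarizer in the log further down:

# Option 1 equivalent: pass the configuration as a dictionary instead of a YAML file.
task_cfg_dict = {
    'eval_backend': 'OpenCompass',
    'eval_config': {
        'datasets': ['mmlu', 'ceval', 'ARC_c', 'gsm8k'],
        'models': [
            {
                'openai_api_base': 'http://127.0.0.1:8000/v1/chat/completions',
                'path': '/root/ChatGLM',
                'temperature': 0.0,
            }
        ],
    },
}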
Tools Used

Code or Commands Executed

Please provide the main code or commands you executed. For example:

Run the test: python example_eval_openai_api.py

Error Log

Please paste the full error log or console output. For example:
dataset version metric mode /root/ChatGLM
--------- 考试 Exam --------- - - - -
ceval - - - -
cmb - - - -
agieval - - - -
mmlu - - - -
GaokaoBench - - - -
ARC-c - - - -
ARC-e - - - -
--------- 语言 Language --------- - - - -
WiC - - - -
summedits - - - -
chid-dev - - - -
afqmc-dev - - - -
bustm-dev - - - -
cluewsc-dev - - - -
WSC - - - -
winogrande - - - -
flores_100 - - - -
--------- 知识 Knowledge --------- - - - -
BoolQ - - - -
commonsense_qa - - - -
nq - - - -
triviaqa - - - -
--------- 推理 Reasoning --------- - - - -
cmnli - - - -
ocnli - - - -
ocnli_fc-dev - - - -
AX_b - - - -
AX_g - - - -
CB - - - -
RTE - - - -
story_cloze - - - -
COPA - - - -
ReCoRD - - - -
hellaswag - - - -
piqa - - - -
siqa - - - -
strategyqa - - - -
math - - - -
gsm8k - - - -
TheoremQA - - - -
openai_humaneval - - - -
mbpp - - - -
bbh - - - -
--------- 理解 Understanding --------- - - - -
C3 - - - -
CMRC_dev - - - -
DRCD_dev - - - -
MultiRC - - - -
race-middle - - - -
race-high - - - -
openbookqa_fact - - - -
csl_dev - - - -
lcsts - - - -
Xsum - - - -
eprstmt-dev - - - -
lambada - - - -
tnews-dev - - - -
11/07 07:06:42 - OpenCompass - INFO - write summary to /root/evalscope/outputs/default/20241107_070629/summary/summary_20241107_070629.txt
11/07 07:06:42 - OpenCompass - INFO - write csv to /root/evalscope/outputs/default/20241107_070629/summary/summary_20241107_070629.csv
Start to get the report with summarizer ...
2024-11-07 07:06:42,022 - evalscope - INFO - **Loading task cfg for summarizer: {'eval_backend': 'OpenCompass', 'eval_config': {'datasets': ['mmlu', 'ceval', 'ARC_c', 'gsm8k'], 'models': [{'openai_api_base': 'http://127.0.0.1:8000/v1/chat/completions', 'path': '/root/ChatGLM', 'temperature': 0.0}]}}
The report list: [{'dataset': '--------- 考试 Exam ---------', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'ceval', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'cmb', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'agieval', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'mmlu', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'GaokaoBench', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'ARC-c', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'ARC-e', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': '--------- 语言 Language ---------', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'WiC', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'summedits', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'chid-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'afqmc-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'bustm-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'cluewsc-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'WSC', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'winogrande', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'flores_100', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': '--------- 知识 Knowledge ---------', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'BoolQ', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'commonsense_qa', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'nq', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'triviaqa', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': '--------- 推理 Reasoning ---------', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'cmnli', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'ocnli', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'ocnli_fc-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'AX_b', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'AX_g', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'CB', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'RTE', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'story_cloze', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'COPA', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'ReCoRD', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'hellaswag', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'piqa', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'siqa', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'strategyqa', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'math', 'version': '-', 
'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'gsm8k', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'TheoremQA', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'openai_humaneval', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'mbpp', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'bbh', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': '--------- 理解 Understanding ---------', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'C3', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'CMRC_dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'DRCD_dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'MultiRC', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'race-middle', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'race-high', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'openbookqa_fact', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'csl_dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'lcsts', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'Xsum', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'eprstmt-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'lambada', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'tnews-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}]
Runtime Environment

Operating System:
Python Version:

Additional Information

If there is any other relevant information, please provide it here.
Does the log contain any lines with "error"? If so, you can go into the corresponding logs folder under outputs to check the error details.
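A quick way to scan those run logs for error lines (an illustrative sketch; the outputs/default/<timestamp>/logs path is inferred from the summary paths in the console output above and may need adjusting):

from pathlib import Path

# Illustrative sketch: walk the run's logs folder and print lines that look like errors.
# The directory below is inferred from the summary paths in the console output above.
log_root = Path('outputs/default/20241107_070629/logs')
for log_file in sorted(p for p in log_root.rglob('*') if p.is_file()):
    for lineno, line in enumerate(log_file.read_text(errors='ignore').splitlines(), 1):
        if 'error' in line.lower() or 'Traceback' in line:
            print(f'{log_file}:{lineno}: {line.strip()}')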
Also, please check whether the evaluation data has been prepared in advance. See https://evalscope.readthedocs.io/zh-cn/latest/user_guides/backend/opencompass_backend.html
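One simple check along those lines is to list what sits under ./data and compare it with the dataset layout described in the linked docs (illustrative sketch only; it does not know the exact folder names the backend expects):

import os

# Illustrative sketch: list the contents of ./data so they can be compared with
# the dataset layout described in the OpenCompass backend docs linked above.
data_dir = 'data'
if not os.path.isdir(data_dir):
    print(f'{data_dir}/ not found in the current working directory')
else:
    for name in sorted(os.listdir(data_dir)):
        print(name)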
There is a txt file in the outputs directory, and I did not see any errors in it. The log file is 80 MB and cannot be uploaded.
The vLLM side prints output, so the model appears to have received the requests and processed them.
The data files have been downloaded and extracted into the current directory, into a directory named "data".
Dataset files under the data directory: