Metrics from testing llama-7b do not match the metrics posted by OpenCompass #256
-
Using OpenCompass to evaluate llama-7b on C-Eval, the average score across all subjects comes out to 24.66, but the result officially posted by OpenCompass is 27.3. What could cause this difference? My local evaluation config is as follows:

from mmengine.config import read_base
from opencompass.models import HuggingFaceCausalLM
models = [
    # LLaMA 7B
    dict(
        type=HuggingFaceCausalLM,
        abbr='llama-7b-hf',
        path="huggyllama/llama-7b",
        tokenizer_path='huggyllama/llama-7b',
        tokenizer_kwargs=dict(
            padding_side='left',
            truncation_side='left',
            use_fast=False,
        ),
        max_out_len=100,
        max_seq_len=2048,
        batch_size=8,
        model_kwargs=dict(device_map='auto'),
        # if False, inference runs in a for-loop without batch padding
        batch_padding=False,
        run_cfg=dict(num_gpus=1, num_procs=1),
    )
]

with read_base():
    from .datasets.ceval.ceval_ppl import ceval_datasets

datasets = [*ceval_datasets]
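For reference, I launch this config the same way as the demo command shown further down in this thread; the config file name and output directory below are just my local choices:

# python run.py configs/eval_llama_7b.py -w outputs/llama_7b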
-
Averaging the scores of all 52 subjects gives 24.66.
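This is how I compute that average; a minimal sketch that assumes the per-subject accuracies have already been collected into a dict (the two entries below are placeholders, not real scores):

# Hypothetical example: average the per-subject C-Eval accuracies.
# In practice the dict is filled from the summary file that OpenCompass
# writes under the output directory; the values here are placeholders.
subject_scores = {
    'ceval-computer_network': 25.0,
    'ceval-operating_system': 24.0,
    # ... remaining 50 subjects ...
}

average = sum(subject_scores.values()) / len(subject_scores)
print(f'average over {len(subject_scores)} subjects: {average:.2f}')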
-
I also evaluated chinese-llama-2-7b, and the resulting scores are far from the results posted by OpenCompass.
The evaluation script is as follows:

from mmengine.config import read_base
from opencompass.models import HuggingFaceCausalLM

batch_size = 20

# Models to evaluate
model_name_or_paths = [
    'ziqingyang/chinese-llama-2-7b'
]

models = []
for model_name_or_path in model_name_or_paths:
    model = dict(
        type=HuggingFaceCausalLM,
        abbr=model_name_or_path,
        path=model_name_or_path,
        tokenizer_path=model_name_or_path,
        tokenizer_kwargs=dict(
            padding_side='left',
            truncation_side='left',
            use_fast=False,
            trust_remote_code=True,
        ),
        max_out_len=100,
        max_seq_len=2048,
        batch_size=batch_size,
        model_kwargs=dict(device_map='auto', trust_remote_code=True),
        # if False, inference runs in a for-loop without batch padding
        batch_padding=False,
        run_cfg=dict(num_gpus=2, num_procs=2),
    )
    models.append(model)

# Datasets to evaluate on
with read_base():
    from .datasets.ceval.ceval_ppl import ceval_datasets
    # from .datasets.collections.base_medium import datasets
    # from .models.llama2_7b import models

datasets = [*ceval_datasets]

# python run.py configs/eval_demo.py -w outputs/demo
-
I reran my test after encountering this issue. Here are the details of my reproduction:
The intermediate output files: 20230825_032645.zip
-
HF model revisions:
Dependency versions:
You are right, my torch version is 2.0, so the cause was the dependency versions.
After changing the versions to match yours, I get the correct result.
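In case it helps anyone else hitting the same gap, a quick sketch of how I compared the two environments; it only prints the versions of the packages that mattered here (torch was the culprit for me; transformers is included merely as another common suspect):

# Print the library versions so the two environments can be compared.
import torch
import transformers

print('torch:', torch.__version__)
print('transformers:', transformers.__version__)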