[Bug] stop_at_stop_token removes the generated method body, leaving no evaluation results #1710

886gb opened this issue Nov 23, 2024 · 0 comments

886gb commented Nov 23, 2024

Prerequisites

Type

I am evaluating with the officially supported tasks/models/datasets.

Environment

/opencompass/opencompass/__init__.py:17: UserWarning: Starting from v0.4.0, all AMOTIC configuration files currently located in ./configs/datasets, ./configs/models, and ./configs/summarizers will be migrated to the opencompass/configs/ package. Please update your configuration file paths accordingly

Reproduces the problem - code/configuration sample

from opencompass.models import OpenAISDK

api_meta_template = dict(
    round=[
        dict(role='HUMAN', api_role='HUMAN'),
        dict(role='BOT', api_role='BOT', generate=True),
    ],
    reserved_roles=[dict(role='SYSTEM', api_role='SYSTEM')],
)
generation_kwargs = dict(do_sample=False)
models = [
    dict(
        abbr='coder_task2_llama3_8b_raw',
        type=OpenAISDK,
        key='coder-evaluate',  # API key
        openai_api_base='http://0.0.0.0:12892/v1',  # service address
        path='/weight/models--meta-llama--Meta-Llama-3.1-8B-Instruct/',  # model name used when requesting the service
        tokenizer_path='/weight/models--meta-llama--Meta-Llama-3.1-8B-Instruct/',  # tokenizer name or path used when requesting the service; None falls back to the default gpt-4 tokenizer
        rpm_verbose=True,  # whether to print the request rate
        meta_template=api_meta_template,  # request template for the service
        query_per_second=1,  # requests per second to the service
        max_out_len=4096,  # maximum output length
        max_seq_len=4096,  # maximum input length
        temperature=0,  # generation temperature
        batch_size=8,  # batch size
        retry=5,  # number of retries
    )
]

from mmengine.config import read_base

with read_base():
    from .datasets.humaneval_multi.humaneval_multi_gen_82cf85 import humaneval_multi_datasets
    from .models.task2_llama3_8b_raw import models as task2_llama3_8b_raw

datasets = humaneval_multi_datasets
models = task2_llama3_8b_raw

work_dir = 'outputs/test/'

Reproduces the problem - command or script

dataset version metric mode task2_llama3_8b_raw
humaneval_multiple-cpp 8761bb pass@1 gen 0.00
humaneval_multiple-cs 8761bb pass@1 gen 0.00
humaneval_multiple-d 8761bb pass@1 gen 0.00
humaneval_multiple-go 8761bb pass@1 gen 0.00
humaneval_multiple-java 8761bb pass@1 gen 0.00
humaneval_multiple-jl 8761bb pass@1 gen 0.00
humaneval_multiple-js 8761bb pass@1 gen 0.00
humaneval_multiple-lua 8761bb pass@1 gen 0.00
humaneval_multiple-php 8761bb pass@1 gen 0.00
humaneval_multiple-pl 8761bb pass@1 gen 0.00
humaneval_multiple-py 8761bb pass@1 gen 27.95
humaneval_multiple-r 8761bb pass@1 gen 0.00
humaneval_multiple-rb 8761bb pass@1 gen 0.00
humaneval_multiple-rkt 8761bb pass@1 gen 0.00
humaneval_multiple-rs 8761bb pass@1 gen 0.00
humaneval_multiple-scala 8761bb pass@1 gen 0.00
humaneval_multiple-sh 8761bb pass@1 gen 0.00
humaneval_multiple-swift 8761bb pass@1 gen 0.00
humaneval_multiple-ts 8761bb pass@1 gen 0.00

When evaluating humaneval_multiple, only Python gets any score. After digging in, the cause appears to be this function:

def stop_at_stop_token(self, decoded_string, stop_tokens):
    """Produces the prefix of decoded_string that ends at the first
    occurrence of a stop_token.

    WARNING: the decoded_string *must not* include the prompt,
    which may have stop tokens itself.
    """
    min_stop_index = len(decoded_string)
    for stop_token in stop_tokens:
        # import pdb;pdb.set_trace()
        stop_index = decoded_string.find(stop_token)
        if stop_index != -1 and stop_index < min_stop_index:
            min_stop_index = stop_index
    return decoded_string[:min_stop_index]
It strips the generated method body. What is the intended handling logic for stop_tokens?
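To make the suspected failure mode concrete, here is a minimal, self-contained sketch of the same scan. The stop token list below is hypothetical (it is not the exact list OpenCompass passes for humaneval_multiple); the point is that a chat-style reply puts a prose preamble before the code, so the first stop token is found before the real function body and everything after it is discarded.

stop_tokens = ['\ndef', '\nclass', '\nif', '\nprint']  # hypothetical stop tokens, for illustration only

decoded = (
    'Here is the Python code that meets the requirements:\n'
    '\n'
    'def unique_digits(x):\n'
    '    return sorted(v for v in x if all(int(d) % 2 for d in str(v)))\n'
)

# Same scan as stop_at_stop_token: keep the prefix before the earliest stop token.
min_stop_index = len(decoded)
for stop_token in stop_tokens:
    stop_index = decoded.find(stop_token)
    if stop_index != -1 and stop_index < min_stop_index:
        min_stop_index = stop_index

print(repr(decoded[:min_stop_index]))
# Prints 'Here is the Python code that meets the requirements:\n' -- the
# generated method body after '\ndef' is gone, only the preamble survives.

That is exactly the shape of the program that ends up being executed in the error message below.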

Reproduces the problem - error message

Because of this post-processing, humaneval_multiple gets no results. After stop_at_stop_token is applied, the code that enters evaluation is:

'results': [{'program': 'from typing import List

def unique_digits(x: List[int]) -> List[int]:
"""Given a list of positive integers x. return a sorted list of all
elements that hasn't any even digit.
Note: Returned list should be sorted in increasing order.
For example:

unique_digits([15, 33, 1422, 1])
[1, 15, 33]
unique_digits([152, 323, 1422, 10])
[]"""
Here is the Python code that meets the requirements:

from typing import List

def check(candidate):
    assert candidate([15, 33, 1422, 1]) == [1, 15, 33]
    assert candidate([152, 323, 1422, 10]) == []
    assert candidate([12345, 2033, 111, 151]) == [111, 151]
    assert candidate([135, 103, 31]) == [31, 135]

def test_check():
    check(unique_digits)

test_check()
', 'timestamp': 1732391329, 'stdout': '', 'stderr': '  File "/tmp/tmp8axwhm_n.py", line 12
    Here is the Python code that meets the requirements:
                ^^^^^^
SyntaxError: invalid syntax
', 'exit_code': 1, 'status': 'SyntaxError'}]}


### Other information

The post-processing logic:
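As far as I can tell from the 'results' entry above, the evaluated program is roughly prompt + truncated completion + test code. A rough sketch of that assembly (variable names here are illustrative, not OpenCompass internals):

prompt = (
    'from typing import List\n'
    '\n'
    'def unique_digits(x: List[int]) -> List[int]:\n'
    '    """Given a list of positive integers x. return a sorted list of all\n'
    '    elements that hasn\'t any even digit."""\n'
)
# stop_at_stop_token has already cut the completion; only the preamble is left.
completion = (
    'Here is the Python code that meets the requirements:\n'
    '\n'
    'from typing import List\n'
)
tests = (
    'def check(candidate):\n'
    '    assert candidate([15, 33, 1422, 1]) == [1, 15, 33]\n'
    '\n'
    'def test_check():\n'
    '    check(unique_digits)\n'
    '\n'
    'test_check()\n'
)
program = prompt + completion + '\n' + tests
# Running `program` reproduces the SyntaxError shown in stderr above, because
# the prose line sits where the generated function body should be.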
886gb closed this as completed on Nov 23, 2024
886gb reopened this on Nov 23, 2024
886gb changed the title from "[Bug]" to "[Bug] stop_at_stop_token removes the generated method body, leaving no evaluation results" on Nov 27, 2024