[Bug] stop_at_stop_token removes the generated method body, leaving no evaluation results #1710

886gb opened this issue Nov 23, 2024 · 0 comments

886gb commented Nov 23, 2024

Prerequisites

Type

I am evaluating with the officially supported tasks/models/datasets.

Environment

/opencompass/opencompass/__init__.py:17: UserWarning: Starting from v0.4.0, all AMOTIC configuration files currently located in ./configs/datasets, ./configs/models, and ./configs/summarizers will be migrated to the opencompass/configs/ package. Please update your configuration file paths accordingly

Reproduces the problem - code/configuration sample

from opencompass.models import OpenAISDK

api_meta_template = dict(
    round=[
        dict(role='HUMAN', api_role='HUMAN'),
        dict(role='BOT', api_role='BOT', generate=True),
    ],
    reserved_roles=[dict(role='SYSTEM', api_role='SYSTEM')],
)
generation_kwargs = dict(do_sample=False)
models = [
    dict(
        abbr='coder_task2_llama3_8b_raw',
        type=OpenAISDK,
        key='coder-evaluate',  # API key
        openai_api_base='http://0.0.0.0:12892/v1',  # service address
        path='/weight/models--meta-llama--Meta-Llama-3.1-8B-Instruct/',  # model name used when requesting the service
        tokenizer_path='/weight/models--meta-llama--Meta-Llama-3.1-8B-Instruct/',  # tokenizer name or path used when requesting the service; None falls back to the default gpt-4 tokenizer
        rpm_verbose=True,  # whether to print the request rate
        meta_template=api_meta_template,  # request template for the service
        query_per_second=1,  # requests per second to the service
        max_out_len=4096,  # maximum output length
        max_seq_len=4096,  # maximum input length
        temperature=0,  # generation temperature
        batch_size=8,  # batch size
        retry=5,  # number of retries
    )
]

from mmengine.config import read_base

with read_base():
    from .datasets.humaneval_multi.humaneval_multi_gen_82cf85 import humaneval_multi_datasets
    from .models.task2_llama3_8b_raw import models as task2_llama3_8b_raw

datasets = humaneval_multi_datasets
models = task2_llama3_8b_raw

work_dir = 'outputs/test/'

Reproduces the problem - command or script

dataset version metric mode task2_llama3_8b_raw
humaneval_multiple-cpp 8761bb pass@1 gen 0.00
humaneval_multiple-cs 8761bb pass@1 gen 0.00
humaneval_multiple-d 8761bb pass@1 gen 0.00
humaneval_multiple-go 8761bb pass@1 gen 0.00
humaneval_multiple-java 8761bb pass@1 gen 0.00
humaneval_multiple-jl 8761bb pass@1 gen 0.00
humaneval_multiple-js 8761bb pass@1 gen 0.00
humaneval_multiple-lua 8761bb pass@1 gen 0.00
humaneval_multiple-php 8761bb pass@1 gen 0.00
humaneval_multiple-pl 8761bb pass@1 gen 0.00
humaneval_multiple-py 8761bb pass@1 gen 27.95
humaneval_multiple-r 8761bb pass@1 gen 0.00
humaneval_multiple-rb 8761bb pass@1 gen 0.00
humaneval_multiple-rkt 8761bb pass@1 gen 0.00
humaneval_multiple-rs 8761bb pass@1 gen 0.00
humaneval_multiple-scala 8761bb pass@1 gen 0.00
humaneval_multiple-sh 8761bb pass@1 gen 0.00
humaneval_multiple-swift 8761bb pass@1 gen 0.00
humaneval_multiple-ts 8761bb pass@1 gen 0.00

When evaluating humaneval_multiple, only Python gets any score. After digging in, the cause appears to be this function:

def stop_at_stop_token(self, decoded_string, stop_tokens):
    """Produces the prefix of decoded_string that ends at the first
    occurrence of a stop_token.

    WARNING: the decoded_string *must not* include the prompt,
    which may have stop tokens itself.
    """
    min_stop_index = len(decoded_string)
    for stop_token in stop_tokens:
        # import pdb;pdb.set_trace()
        stop_index = decoded_string.find(stop_token)
        if stop_index != -1 and stop_index < min_stop_index:
            min_stop_index = stop_index
    return decoded_string[:min_stop_index]
It strips the generated method body. What is the intended handling logic for stop_tokens?
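To make the suspected failure mode concrete, here is a minimal, self-contained sketch of the same scan. The stop token list below is hypothetical (it is not the exact list OpenCompass passes for humaneval_multiple); the point is that a chat-style reply puts a prose preamble before the code, so the first stop token is found before the real function body and everything after it is discarded.

stop_tokens = ['\ndef', '\nclass', '\nif', '\nprint']  # hypothetical stop tokens, for illustration only

decoded = (
    'Here is the Python code that meets the requirements:\n'
    '\n'
    'def unique_digits(x):\n'
    '    return sorted(v for v in x if all(int(d) % 2 for d in str(v)))\n'
)

# Same scan as stop_at_stop_token: keep the prefix before the earliest stop token.
min_stop_index = len(decoded)
for stop_token in stop_tokens:
    stop_index = decoded.find(stop_token)
    if stop_index != -1 and stop_index < min_stop_index:
        min_stop_index = stop_index

print(repr(decoded[:min_stop_index]))
# Prints 'Here is the Python code that meets the requirements:\n' -- the
# generated method body after '\ndef' is gone, only the preamble survives.

That is exactly the shape of the program that ends up being executed in the error message below.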

Reproduces the problem - error message

Because of this post-processing, humaneval_multiple gets no results. After stop_at_stop_token is applied, the code that enters evaluation is:

'results': [{'program': 'from typing import List

def unique_digits(x: List[int]) -> List[int]:
"""Given a list of positive integers x. return a sorted list of all
elements that hasn't any even digit.
Note: Returned list should be sorted in increasing order.
For example:

unique_digits([15, 33, 1422, 1])
[1, 15, 33]
unique_digits([152, 323, 1422, 10])
[]"""
Here is the Python code that meets the requirements:

from typing import List

def check(candidate):
    assert candidate([15, 33, 1422, 1]) == [1, 15, 33]
    assert candidate([152, 323, 1422, 10]) == []
    assert candidate([12345, 2033, 111, 151]) == [111, 151]
    assert candidate([135, 103, 31]) == [31, 135]

def test_check():
    check(unique_digits)

test_check()
', 'timestamp': 1732391329, 'stdout': '', 'stderr': '  File "/tmp/tmp8axwhm_n.py", line 12
    Here is the Python code that meets the requirements:
                ^^^^^^
SyntaxError: invalid syntax
', 'exit_code': 1, 'status': 'SyntaxError'}]}


### Other information

The post-processing logic:
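As far as I can tell from the 'results' entry above, the evaluated program is roughly prompt + truncated completion + test code. A rough sketch of that assembly (variable names here are illustrative, not OpenCompass internals):

prompt = (
    'from typing import List\n'
    '\n'
    'def unique_digits(x: List[int]) -> List[int]:\n'
    '    """Given a list of positive integers x. return a sorted list of all\n'
    '    elements that hasn\'t any even digit."""\n'
)
# stop_at_stop_token has already cut the completion; only the preamble is left.
completion = (
    'Here is the Python code that meets the requirements:\n'
    '\n'
    'from typing import List\n'
)
tests = (
    'def check(candidate):\n'
    '    assert candidate([15, 33, 1422, 1]) == [1, 15, 33]\n'
    '\n'
    'def test_check():\n'
    '    check(unique_digits)\n'
    '\n'
    'test_check()\n'
)
program = prompt + completion + '\n' + tests
# Running `program` reproduces the SyntaxError shown in stderr above, because
# the prose line sits where the generated function body should be.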
886gb closed this as completed on Nov 23, 2024
886gb reopened this on Nov 23, 2024
886gb changed the title from "[Bug]" to "[Bug] stop_at_stop_token removes the generated method body, leaving no evaluation results" on Nov 27, 2024