
[Question]: The evaluation code of scbench does not match the provided dataset. #103

Open
rainstorm12 opened this issue Dec 26, 2024 · 5 comments
Labels: question (Further information is requested)
rainstorm12 commented Dec 26, 2024

Describe the bug

When I tested the "scbench_kv" task provided by SCBench, I encountered the following error in compute_scores.py during evaluation.

[rank0]:   File "/myfile/MInference-main/scbench/compute_scores.py", line 365, in get_score_one
[rank0]:     assert task_name in NAME_TO_SCORE_GETTER, f"Invalid task name: {task_name}"
[rank0]: AssertionError: Invalid task name: scbench_kv

I found that the task names registered in compute_scores.py are as follows; they do not match the task names of the SCBench dataset.

def get_score_one(pred: str, label: str, task_name: str, model_name: str) -> float:
    """
    Computes the score for one prediction.
    Returns one float (zero and one for boolean values).
    """
    NAME_TO_SCORE_GETTER = {
        # Retrieve
        "kv_retrieval": get_score_one_kv_retrieval,
        "kv_retrieval_prefix": get_score_one_kv_retrieval,
        "kv_retrieval_both": get_score_one_kv_retrieval,
        "passkey": get_score_one_passkey,
        "number_string": get_score_one_number_string,
        # Code
        "code_run": get_score_one_code_run,
        "code_debug": get_score_one_code_debug,
        # Longbook
        "longdialogue_qa_eng": get_score_one_longdialogue_qa_eng,
        "longbook_qa_eng": get_score_one_longbook_qa_eng,
        "longbook_sum_eng": get_score_one_longbook_sum_eng,
        "longbook_choice_eng": get_score_one_longbook_choice_eng,
        "longbook_qa_chn": get_score_one_longbook_qa_chn,
        # Math
        "math_find": get_score_one_math_find,
        "math_calc": get_score_one_math_calc,
        # multi-turn nativ
        "multi_turn_summary": get_score_one_longbook_sum_eng,
        "multi_turn_vt": string_match_all,
        "multi_turn_many_shot": get_score_one_longdialogue_qa_eng,
        "multi_turn_kv_compressible": get_score_one_kv_retrieval,
    }
    assert task_name in NAME_TO_SCORE_GETTER, f"Invalid task name: {task_name}"
    score = NAME_TO_SCORE_GETTER[task_name](pred, label, model_name)
    return float(score)
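As a stopgap before pulling the upstream fix, one could alias the SCBench dataset names onto the scorers that already exist — a minimal sketch, with hypothetical alias pairs that are not the actual mapping in the fixed compute_scores.py:

```python
# Hypothetical aliases from SCBench dataset names to the keys that
# NAME_TO_SCORE_GETTER already knows; the real mapping is in the fix (#101).
SCBENCH_ALIASES = {
    "scbench_kv": "kv_retrieval",
    "scbench_passkey": "passkey",
}

def resolve_task_name(task_name: str) -> str:
    """Translate an SCBench task name into a registered scorer key,
    passing through names that are already registered."""
    return SCBENCH_ALIASES.get(task_name, task_name)
```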
@rainstorm12 rainstorm12 added the bug Something isn't working label Dec 26, 2024
@iofu728 iofu728 self-assigned this Dec 26, 2024
@iofu728 iofu728 added question Further information is requested and removed bug Something isn't working labels Dec 26, 2024
@iofu728 iofu728 changed the title [Bug]: The evaluation code of scbench does not match the provided dataset. [Question]: The evaluation code of scbench does not match the provided dataset. Dec 26, 2024
iofu728 (Contributor) commented Dec 26, 2024

Hi @rainstorm12, thank you for pointing out this issue.

We have already fixed it in #101.

Please fetch the updated code and let us know if you encounter any further problems!

git clone https://github.com/microsoft/MInference
pip install -e .

rainstorm12 (Author) commented
Thank you very much for your help! That solved my problem!
However, when I try a multi-task run, I hit a new problem. My test.sh file is as follows:

python run_scbench.py \
    --task scbench_repoqa_and_kv \
    --model_name_or_path meta-llama/Meta-Llama-3.1-8B-Instruct \
    --data_dir ./data \
    --output_dir ./results \
    --rewrite \
    --attn_type minference \
    --kv_type dense \
    --use_chat_template \
    --trust_remote_code

and the error is as follows:

==== Evaluation scbench_repoqa_and_kv====
# examples: 88
Num eval examples: -1
Verbose: False
Max new tokens: {'scbench_repoqa': 1024, 'scbench_kv': 80}
Num of turns: 5
0it [00:00, ?it/s]# tokens before: 67598
# tokens after: 67598
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/myfile/MInference/scbench/run_scbench.py", line 397, in <module>
    pred = get_pred(
  File "/myfile/MInference/scbench/run_scbench.py", line 125, in get_pred
    outputs = model.test(
  File "/myfile/MInference/scbench/eval_utils.py", line 1246, in test
    max_length_per_turn = max_length[example["task"][idx]]
KeyError: 'multi_turn_kv'

The key 'multi_turn_kv' is not present in the Max new tokens dict. When I run the scbench_summary_with_needles task, I get a similar error:

==== Evaluation scbench_summary_with_needles====
# examples: 70
Num eval examples: -1
Verbose: False
Max new tokens: {'scbench_summary': 800, 'scbench_passkey': 15}
Num of turns: 5
0it [00:00, ?it/s]# tokens before: 98057
# tokens after: 97962
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/myfile/MInference/scbench/run_scbench.py", line 397, in <module>
    pred = get_pred(
  File "/myfile/MInference/scbench/run_scbench.py", line 125, in get_pred
    outputs = model.test(
  File "/myfile/MInference/scbench/eval_utils.py", line 1246, in test
    max_length_per_turn = max_length[example["task"][idx]]
KeyError: 'multi_turn_passkey'
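Both tracebacks have the same failure mode: `example["task"][idx]` yields a key (`multi_turn_kv`, `multi_turn_passkey`) that the `max_length` dict does not contain. A defensive lookup — a hypothetical helper to illustrate the mismatch, not the upstream code — would surface the missing key instead of raising a bare `KeyError`:

```python
def max_tokens_for_turn(max_length: dict, task_key: str, default: int = 512) -> int:
    """Return the per-turn token budget for task_key, falling back to a
    default (and warning) when the dataset's task key is absent."""
    if task_key not in max_length:
        print(f"warning: task {task_key!r} not in max-new-tokens config "
              f"(known: {sorted(max_length)}); using default={default}")
        return default
    return max_length[task_key]
```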

iofu728 (Contributor) commented Dec 30, 2024

Hi @rainstorm12,

This issue is due to an update in the SCBench HF dataset. You need to add download_mode='force_redownload'.

data = load_dataset("microsoft/SCBench", dataset, split="test", download_mode='force_redownload')

Let me know if it works!

rainstorm12 (Author) commented Dec 30, 2024

Thank you for your reply.
However, my server's network is not very stable, and downloading the data directly from the code times out. That's why I had previously downloaded your dataset from the Hugging Face website instead.

from datasets import load_dataset
data_name = "scbench_kv"
data = load_dataset("./SCBench", data_name, split="test")

Before you updated the dataset, I was able to load the data this way. After downloading the updated dataset, I encountered the following error.

    builder_config = self.builder_configs.get(config_name)
    if builder_config is None and self.BUILDER_CONFIGS:
        raise ValueError(
            f"BuilderConfig '{config_name}' not found. Available: {list(self.builder_configs.keys())}"
        )

ValueError: BuilderConfig 'scbench_kv' not found. Available: ['default']

iofu728 (Contributor) commented Dec 31, 2024

Hi @rainstorm12,

Thank you for your feedback! However, I didn’t encounter any issues when running your code locally.

In [1]: from datasets import load_dataset
   ...: data_name = "scbench_kv"
   ...: data = load_dataset("./SCBench", data_name, split="test")
Generating test split: 100%|███████████████████████████████████████| 100/100 [00:00<00:00, 1545.74 examples/s]

Please check the following:

  1. Ensure git lfs is enabled and the git clone process was completed successfully.
  2. Verify your datasets version. I’m using version 3.2.0.
~ ll SCBench/*
-rw-r--r-- 1 aiscuser aiscuser  14K Dec 30 22:11 SCBench/README.md

SCBench/data:
total 649M
-rw-r--r-- 1 aiscuser aiscuser 178K Dec 30 22:11 comparison.png
-rw-r--r-- 1 aiscuser aiscuser 337K Dec 30 22:11 framework.png
-rw-r--r-- 1 aiscuser aiscuser 299K Dec 30 22:11 overview.png
-rw-r--r-- 1 aiscuser aiscuser 6.9K Dec 30 22:11 readme.md
-rw-r--r-- 1 aiscuser aiscuser 645K Dec 30 22:11 results.png
-rw-r--r-- 1 aiscuser aiscuser  46M Dec 30 22:11 scbench_choice_eng.jsonl
-rw-r--r-- 1 aiscuser aiscuser  21M Dec 30 22:11 scbench_kv.jsonl
-rw-r--r-- 1 aiscuser aiscuser 4.7M Dec 30 22:11 scbench_many_shot.jsonl
-rw-r--r-- 1 aiscuser aiscuser  14M Dec 30 22:11 scbench_mf.jsonl
-rw-r--r-- 1 aiscuser aiscuser  17M Dec 30 22:11 scbench_prefix_suffix.jsonl
-rw-r--r-- 1 aiscuser aiscuser 344M Dec 30 22:12 scbench_qa_chn.jsonl
-rw-r--r-- 1 aiscuser aiscuser  57M Dec 30 22:11 scbench_qa_eng.jsonl
-rw-r--r-- 1 aiscuser aiscuser  25M Dec 30 22:11 scbench_repoqa_and_kv.jsonl
-rw-r--r-- 1 aiscuser aiscuser  25M Dec 30 22:11 scbench_repoqa.jsonl
-rw-r--r-- 1 aiscuser aiscuser  28M Dec 30 22:11 scbench_summary.jsonl
-rw-r--r-- 1 aiscuser aiscuser  28M Dec 30 22:11 scbench_summary_with_needles.jsonl
-rw-r--r-- 1 aiscuser aiscuser  42M Dec 30 22:11 scbench_vt.jsonl

SCBench/scbench_choice_eng:
total 28M
-rw-r--r-- 1 aiscuser aiscuser 28M Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_kv:
total 18M
-rw-r--r-- 1 aiscuser aiscuser 18M Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_many_shot:
total 100K
-rw-r--r-- 1 aiscuser aiscuser 98K Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_mf:
total 3.6M
-rw-r--r-- 1 aiscuser aiscuser 3.6M Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_prefix_suffix:
total 16M
-rw-r--r-- 1 aiscuser aiscuser 16M Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_qa_chn:
total 111M
-rw-r--r-- 1 aiscuser aiscuser 111M Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_qa_eng:
total 34M
-rw-r--r-- 1 aiscuser aiscuser 34M Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_repoqa:
total 4.3M
-rw-r--r-- 1 aiscuser aiscuser 4.3M Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_repoqa_and_kv:
total 8.2M
-rw-r--r-- 1 aiscuser aiscuser 8.2M Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_summary:
total 14M
-rw-r--r-- 1 aiscuser aiscuser 14M Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_summary_with_needles:
total 14M
-rw-r--r-- 1 aiscuser aiscuser 14M Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_vt:
total 2.1M
-rw-r--r-- 1 aiscuser aiscuser 2.1M Dec 30 22:11 test-00000-of-00001.parquet
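One quick way to check point 1 — that git-lfs actually materialized the shards rather than leaving pointer stubs — is to inspect a file's leading bytes: an un-pulled LFS file starts with an ASCII `version https://git-lfs.github.com/spec/` header, while a real parquet file starts with the magic bytes `PAR1`. A small hypothetical checker:

```python
def classify_lfs_file(path: str) -> str:
    """Return 'lfs-pointer' for an un-pulled git-lfs stub, 'parquet' for a
    file with the parquet magic, and 'unknown' otherwise."""
    with open(path, "rb") as f:
        head = f.read(64)
    if head.startswith(b"version https://git-lfs.github.com/spec/"):
        return "lfs-pointer"
    if head.startswith(b"PAR1"):
        return "parquet"
    return "unknown"
```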
