[Question]: The evaluation code of scbench does not match the provided dataset. #103
Hi @rainstorm12, thank you for pointing out this issue. We have already fixed it in #101. Please fetch the updated code and let us know if you encounter any further problems!

```bash
git clone https://github.com/microsoft/MInference
cd MInference
pip install -e .
```
Thank you very much for your help! I solved my problem!

However, when I run:

```bash
python run_scbench.py \
    --task scbench_repoqa_and_kv \
    --model_name_or_path meta-llama/Meta-Llama-3.1-8B-Instruct \
    --data_dir ./data \
    --output_dir ./results \
    --rewrite \
    --attn_type minference \
    --kv_type dense \
    --use_chat_template \
    --trust_remote_code
```

the error is as follows:

```
The key 'multi_turn_kv' may not be present in
```
Hi @rainstorm12, this issue is due to an update in the SCBench HF dataset. You need to add:

```python
data = load_dataset("microsoft/SCBench", dataset, split="test", download_mode="force_redownload")
```

Let me know if it works!
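For readers hitting the same `KeyError`: the message above is a plain dictionary lookup failing because the updated dataset introduced a task name the scoring map does not contain. A minimal sketch of the pattern, with a hypothetical registry and task names (this is illustrative, not the actual MInference code):

```python
# Hypothetical registry mapping SCBench task types to scoring functions;
# the real mapping lives in the MInference evaluation code, not here.
TASK_SCORERS = {
    "kv": lambda pred, ref: float(pred == ref),
    "repoqa": lambda pred, ref: float(ref in pred),
}

def score(task_type, pred, ref):
    """Score one prediction, failing loudly on an unknown task type."""
    scorer = TASK_SCORERS.get(task_type)
    if scorer is None:
        # Mirrors the error in this thread: an unregistered task type
        # (e.g. 'multi_turn_kv' added by a dataset update) has no scorer.
        raise KeyError(
            f"The key '{task_type}' may not be present in TASK_SCORERS; "
            f"known tasks: {sorted(TASK_SCORERS)}"
        )
    return scorer(pred, ref)
```

Looking the scorer up with `dict.get` rather than indexing lets the code raise a message that names the offending key and lists the registered tasks, which makes a dataset/code version mismatch immediately visible.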
Thank you for your reply.
Before you updated the dataset, I was able to load the data with this method. After downloading your dataset this time, I encountered the following error.
Hi @rainstorm12, thank you for your feedback! However, I didn't encounter any issues when running your code locally:

```python
In [1]: from datasets import load_dataset
   ...: data_name = "scbench_kv"
   ...: data = load_dataset("./SCBench", data_name, split="test")
Generating test split: 100%|███████████████████████████████████████| 100/100 [00:00<00:00, 1545.74 examples/s]
```

Please check the following directory layout:

```
➜ ~ ll SCBench/*
-rw-r--r-- 1 aiscuser aiscuser  14K Dec 30 22:11 SCBench/README.md

SCBench/data:
total 649M
-rw-r--r-- 1 aiscuser aiscuser 178K Dec 30 22:11 comparison.png
-rw-r--r-- 1 aiscuser aiscuser 337K Dec 30 22:11 framework.png
-rw-r--r-- 1 aiscuser aiscuser 299K Dec 30 22:11 overview.png
-rw-r--r-- 1 aiscuser aiscuser 6.9K Dec 30 22:11 readme.md
-rw-r--r-- 1 aiscuser aiscuser 645K Dec 30 22:11 results.png
-rw-r--r-- 1 aiscuser aiscuser  46M Dec 30 22:11 scbench_choice_eng.jsonl
-rw-r--r-- 1 aiscuser aiscuser  21M Dec 30 22:11 scbench_kv.jsonl
-rw-r--r-- 1 aiscuser aiscuser 4.7M Dec 30 22:11 scbench_many_shot.jsonl
-rw-r--r-- 1 aiscuser aiscuser  14M Dec 30 22:11 scbench_mf.jsonl
-rw-r--r-- 1 aiscuser aiscuser  17M Dec 30 22:11 scbench_prefix_suffix.jsonl
-rw-r--r-- 1 aiscuser aiscuser 344M Dec 30 22:12 scbench_qa_chn.jsonl
-rw-r--r-- 1 aiscuser aiscuser  57M Dec 30 22:11 scbench_qa_eng.jsonl
-rw-r--r-- 1 aiscuser aiscuser  25M Dec 30 22:11 scbench_repoqa_and_kv.jsonl
-rw-r--r-- 1 aiscuser aiscuser  25M Dec 30 22:11 scbench_repoqa.jsonl
-rw-r--r-- 1 aiscuser aiscuser  28M Dec 30 22:11 scbench_summary.jsonl
-rw-r--r-- 1 aiscuser aiscuser  28M Dec 30 22:11 scbench_summary_with_needles.jsonl
-rw-r--r-- 1 aiscuser aiscuser  42M Dec 30 22:11 scbench_vt.jsonl

SCBench/scbench_choice_eng:
total 28M
-rw-r--r-- 1 aiscuser aiscuser 28M Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_kv:
total 18M
-rw-r--r-- 1 aiscuser aiscuser 18M Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_many_shot:
total 100K
-rw-r--r-- 1 aiscuser aiscuser 98K Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_mf:
total 3.6M
-rw-r--r-- 1 aiscuser aiscuser 3.6M Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_prefix_suffix:
total 16M
-rw-r--r-- 1 aiscuser aiscuser 16M Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_qa_chn:
total 111M
-rw-r--r-- 1 aiscuser aiscuser 111M Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_qa_eng:
total 34M
-rw-r--r-- 1 aiscuser aiscuser 34M Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_repoqa:
total 4.3M
-rw-r--r-- 1 aiscuser aiscuser 4.3M Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_repoqa_and_kv:
total 8.2M
-rw-r--r-- 1 aiscuser aiscuser 8.2M Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_summary:
total 14M
-rw-r--r-- 1 aiscuser aiscuser 14M Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_summary_with_needles:
total 14M
-rw-r--r-- 1 aiscuser aiscuser 14M Dec 30 22:11 test-00000-of-00001.parquet

SCBench/scbench_vt:
total 2.1M
-rw-r--r-- 1 aiscuser aiscuser 2.1M Dec 30 22:11 test-00000-of-00001.parquet
```
Describe the bug
When I tested the "scbench_kv" task provided by SCBench, I ran into problems in the compute_scores.py file during evaluation: the evaluation tasks defined in compute_scores.py do not match the test tasks shipped with SCBench.
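The mismatch described above can be checked mechanically by comparing the task names the scorer knows about against the dataset's config names. A minimal sketch with illustrative task sets (the real lists live in compute_scores.py and in the `microsoft/SCBench` dataset configs on the Hugging Face Hub):

```python
# Illustrative task sets; in practice, eval_tasks would be read from
# compute_scores.py and dataset_tasks from the Hub dataset's config names
# (e.g. via datasets.get_dataset_config_names, which needs network access).
eval_tasks = {"scbench_kv", "scbench_repoqa", "scbench_summary"}
dataset_tasks = {"scbench_kv", "scbench_repoqa", "scbench_summary",
                 "scbench_repoqa_and_kv"}

def missing_scorers(dataset_tasks, eval_tasks):
    """Return dataset tasks that have no matching evaluation task."""
    return sorted(set(dataset_tasks) - set(eval_tasks))

print(missing_scorers(dataset_tasks, eval_tasks))
```

Running such a check before evaluation turns a mid-run `KeyError` into an upfront report of exactly which tasks have data but no scorer.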