[Refactor] BigBench #988

Closed
orendar opened this issue Nov 14, 2023 · 9 comments · Fixed by #1686
Labels
bug Something isn't working.

Comments

@orendar

orendar commented Nov 14, 2023

Hey, I saw that the current implementation of BigBench, with its difficult dependencies, is a placeholder until the Hugging Face dataset is ready. I am a fan of BigBench and would love to use it within the big-refactor branch.

Is there any additional work that I or the community can do to finish the HF-based integration, or should it already be ready given the state of the dataset?

Thanks!

@haileyschoelkopf
Collaborator

Hi! I will have the mirrored dataset finished uploading shortly. Apologies for the delay; I had to work around rate limits for uploading to HF!

@orendar
Author

orendar commented Nov 17, 2023

Thank you so much for all your work!! I really appreciate it, closing :)

@orendar orendar closed this as completed Nov 17, 2023
@haileyschoelkopf
Collaborator

Everything should be uploaded as of #1002!

@orendar orendar reopened this Nov 18, 2023
@orendar
Author

orendar commented Nov 18, 2023

@haileyschoelkopf Hey, sorry to bother you again, but I'm having trouble running bigbench_*_multiple_choice (including individual tasks, with and without few-shot, etc.). Do you know what the issue might be?
I get a similar stack trace for every subtask and configuration I've tried:

Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/eval/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ec2-user/anaconda3/envs/eval/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/ec2-user/SageMaker/generative/lm-evaluation-harness/lm_eval/__main__.py", line 248, in <module>
    cli_evaluate()
  File "/home/ec2-user/SageMaker/generative/lm-evaluation-harness/lm_eval/__main__.py", line 199, in cli_evaluate
    results = evaluator.simple_evaluate(
  File "/home/ec2-user/SageMaker/generative/lm-evaluation-harness/lm_eval/utils.py", line 356, in _wrapper
    return fn(*args, **kwargs)
  File "/home/ec2-user/SageMaker/generative/lm-evaluation-harness/lm_eval/evaluator.py", line 111, in simple_evaluate
    task_dict = lm_eval.tasks.get_task_dict(tasks)
  File "/home/ec2-user/SageMaker/generative/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 250, in get_task_dict
    task_name: get_task(task_name=task_element, config=config),
  File "/home/ec2-user/SageMaker/generative/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 192, in get_task
    return TASK_REGISTRY[task_name](config=config)
  File "/home/ec2-user/SageMaker/generative/lm-evaluation-harness/lm_eval/api/task.py", line 682, in __init__
    test_target = self.doc_to_target(test_doc)
  File "/home/ec2-user/SageMaker/generative/lm-evaluation-harness/lm_eval/api/task.py", line 899, in doc_to_target
    target_string = utils.apply_template(doc_to_target, doc)
  File "/home/ec2-user/SageMaker/generative/lm-evaluation-harness/lm_eval/utils.py", line 489, in apply_template
    return rtemplate.render(**doc)
  File "/home/ec2-user/anaconda3/envs/eval/lib/python3.8/site-packages/jinja2/environment.py", line 1301, in render
    self.environment.handle_exception()
  File "/home/ec2-user/anaconda3/envs/eval/lib/python3.8/site-packages/jinja2/environment.py", line 936, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "<template>", line 1, in top-level template code
ValueError: 'thought' is not in list
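For anyone debugging this, the failure mode can be reproduced outside the harness. The sketch below assumes the task's doc_to_target Jinja template resolves the answer index via list.index() over the multiple_choice_targets column; the function and field names are illustrative, not the harness's actual template:

```python
# Hypothetical reproduction: when the gold target string is absent from
# multiple_choice_targets (e.g. the column is empty), .index() raises
# exactly the ValueError seen in the traceback above.
doc = {
    "multiple_choice_targets": [],  # empty column, as in the broken subsets
    "targets": ["thought"],
}

def doc_to_target(doc):
    # Mirrors the assumed template logic "multiple_choice_targets.index(target)".
    return doc["multiple_choice_targets"].index(doc["targets"][0])

try:
    doc_to_target(doc)
except ValueError as e:
    print(e)  # prints: 'thought' is not in list
```

Jinja2 rewrites the traceback so the error surfaces from `<template>`, but the underlying exception is just Python's list.index() ValueError.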

@haileyschoelkopf
Collaborator

Hey! Is there a specific subtask for which this occurs, so I can test it?

@orendar
Author

orendar commented Nov 19, 2023

Yes, I just verified and this specific stack trace comes from "bigbench_ascii_word_recognition_multiple_choice". Thank you!

@lintangsutawika
Contributor

@orendar @haileyschoelkopf is this fixed?

@bryanSwk

bryanSwk commented Feb 6, 2024

Sorry for bumping, but I noticed that bigbench_*_multiple_choice breaks for certain subsets that have an empty "multiple_choice_targets" column, e.g. https://huggingface.co/datasets/hails/bigbench/viewer/tense_zero_shot.

I got a similar error to the one above:

    raise rewrite_traceback_stack(source=source)
  File "<template>", line 1, in top-level template code
ValueError: 'She has applied for the job.' is not in list
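One plausible workaround, sketched here as a hypothetical filter (this helper is not part of lm-evaluation-harness; the field names mirror the hails/bigbench columns): drop docs whose multiple_choice_targets column is empty or does not contain the gold target, so the template's .index() call never sees a missing answer.

```python
# Hypothetical guard for filtering broken rows before template rendering.
def has_valid_choices(doc):
    choices = doc.get("multiple_choice_targets") or []
    targets = doc.get("targets") or []
    # A doc is usable only if its gold target appears among the choices.
    return bool(targets) and targets[0] in choices

docs = [
    {"multiple_choice_targets": ["A", "B"], "targets": ["A"]},
    {"multiple_choice_targets": [], "targets": ["She has applied for the job."]},
]
print([has_valid_choices(d) for d in docs])  # prints: [True, False]
```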

@nanyyyyyy

I got the same error when running bigbench_multiple_choice.

@lintangsutawika lintangsutawika linked a pull request Apr 9, 2024 that will close this issue
@lintangsutawika lintangsutawika self-assigned this Apr 9, 2024