Evaluate: Gemma 2 #56

Closed
3 of 6 tasks
ggbetz opened this issue Jun 30, 2024 · 4 comments
Labels: bug (Something isn't working), eval_request

Comments


ggbetz commented Jun 30, 2024

Check upon issue creation:

  • The model has not been evaluated yet and doesn't show up on the CoT Leaderboard.
  • There is no evaluation request issue for the model in the repo.
  • The parameters below have been adapted and shall be used.

Parameters with XXX in [9b, 27b]:

NEXT_MODEL_PATH=google/gemma-2-XXX-it
NEXT_MODEL_REVISION=main
NEXT_MODEL_PRECISION=bfloat16
MAX_LENGTH=2048 
GPU_MEMORY_UTILIZATION=0.7
VLLM_SWAP_SPACE=4
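
As an illustration, here is a minimal sketch of how these parameters would map onto a direct vllm.LLM call (XXX=9b shown). The actual cot-eval pipeline wraps vLLM differently, so treat this as an assumption about the mapping, not the pipeline's code:

# Hypothetical direct vLLM instantiation mirroring the parameters above.
from vllm import LLM

llm = LLM(
    model="google/gemma-2-9b-it",    # NEXT_MODEL_PATH (XXX=9b)
    revision="main",                 # NEXT_MODEL_REVISION
    dtype="bfloat16",                # NEXT_MODEL_PRECISION
    max_model_len=2048,              # MAX_LENGTH
    gpu_memory_utilization=0.7,      # GPU_MEMORY_UTILIZATION
    swap_space=4,                    # VLLM_SWAP_SPACE (GiB of CPU swap)
)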

ToDos:

  • Run cot-eval pipeline
  • Merge pull requests for cot-eval results datasets (> @ggbetz)
  • Create eval request record to update metadata on leaderboard (> @ggbetz)

ggbetz commented Aug 1, 2024

I got the following error:

Please use Flashinfer backend for models with logits_soft_cap (i.e., Gemma-2).
Otherwise, the output might be wrong. Set Flashinfer backend by export 
VLLM_ATTENTION_BACKEND=FLASHINFER. (type=value_error)
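
A minimal sketch of applying the suggested fix, assuming the backend is picked up from the environment variable before the engine is built (model name as in the parameters above):

# Hypothetical: select the FlashInfer attention backend before vLLM initializes.
import os
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

from vllm import LLM
llm = LLM(model="google/gemma-2-9b-it", dtype="bfloat16")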


ggbetz commented Aug 1, 2024

We might consider re-running the evals for Gemma 1.


ggbetz commented Aug 1, 2024

I've added flashinfer to our docker container, but I still get an error when trying to run and evaluate Gemma 2:

INFO 08-01 10:22:08 selector.py:79] Using Flashinfer backend.
WARNING 08-01 10:22:08 selector.py:80] Flashinfer will be stuck on llama-2-7b, please avoid using Flashinfer as the backend when running on llama-2-7b.
INFO 08-01 10:22:08 weight_utils.py:218] Using model weights format ['*.safetensors']
INFO 08-01 10:23:11 model_runner.py:255] Loading model weights took 4.9975 GB
[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/local/bin/cot-eval", line 8, in <module>
[rank0]:     sys.exit(main())
[rank0]:   File "/workspace/cot-eval/src/cot_eval/__main__.py", line 149, in main
[rank0]:     llm = VLLM(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 339, in __init__
[rank0]:     values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 1050, in validate_model
[rank0]:     input_data = validator(cls_, input_data)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/langchain_core/utils/pydantic.py", line 146, in wrapper
[rank0]:     return func(cls, values)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/langchain_community/llms/vllm.py", line 89, in validate_environment
[rank0]:     values["client"] = VLLModel(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 149, in __init__
[rank0]:     self.llm_engine = LLMEngine.from_engine_args(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 414, in from_engine_args
[rank0]:     engine = cls(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 256, in __init__
[rank0]:     self._initialize_kv_caches()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 353, in _initialize_kv_caches
[rank0]:     self.model_executor.determine_num_available_blocks())
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 76, in determine_num_available_blocks
[rank0]:     return self.driver_worker.determine_num_available_blocks()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 173, in determine_num_available_blocks
[rank0]:     self.model_runner.profile_run()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 874, in profile_run
[rank0]:     self.execute_model(model_input, kv_caches, intermediate_tensors)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 1221, in execute_model
[rank0]:     model_input.attn_metadata.begin_forward()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/flashinfer.py", line 132, in begin_forward
[rank0]:     self.prefill_wrapper.begin_forward(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/flashinfer/prefill.py", line 791, in begin_forward
[rank0]:     self._wrapper.begin_forward(
[rank0]: RuntimeError: CHECK_EQ(paged_kv_indptr.size(0), batch_size + 1) failed. 1 vs 257

ggbetz added the bug label on Aug 1, 2024

ggbetz commented Aug 1, 2024

We'll probably have to wait for the next vllm release. See:

flashinfer-ai/flashinfer#362
vllm-project/vllm#7008

ggbetz closed this as completed on Oct 1, 2024