Evaluate: Gemma 2 #56

Closed
3 of 6 tasks
ggbetz opened this issue Jun 30, 2024 · 4 comments
Labels: bug (Something isn't working), eval_request

Comments


ggbetz commented Jun 30, 2024

Check upon issue creation:

  • The model has not been evaluated yet and doesn't show up on the CoT Leaderboard.
  • There is no evaluation request issue for the model in the repo.
  • The parameters below have been adapted and shall be used.

Parameters with XXX in [9b, 27b]:

NEXT_MODEL_PATH=google/gemma-2-XXX-it
NEXT_MODEL_REVISION=main
NEXT_MODEL_PRECISION=bfloat16
MAX_LENGTH=2048 
GPU_MEMORY_UTILIZATION=0.7
VLLM_SWAP_SPACE=4
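
As an illustration, here is a minimal sketch of how these parameters would map onto a direct vllm.LLM call (XXX=9b shown). The actual cot-eval pipeline wraps vLLM differently, so treat this as an assumption about the mapping, not the pipeline's code:

# Hypothetical direct vLLM instantiation mirroring the parameters above.
from vllm import LLM

llm = LLM(
    model="google/gemma-2-9b-it",    # NEXT_MODEL_PATH (XXX=9b)
    revision="main",                 # NEXT_MODEL_REVISION
    dtype="bfloat16",                # NEXT_MODEL_PRECISION
    max_model_len=2048,              # MAX_LENGTH
    gpu_memory_utilization=0.7,      # GPU_MEMORY_UTILIZATION
    swap_space=4,                    # VLLM_SWAP_SPACE (GiB of CPU swap)
)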

ToDos:

  • Run cot-eval pipeline
  • Merge pull requests for cot-eval results datasets (> @ggbetz)
  • Create eval request record to update metadata on leaderboard (> @ggbetz)

ggbetz commented Aug 1, 2024

I got the following error:

Please use Flashinfer backend for models with logits_soft_cap (i.e., Gemma-2).
Otherwise, the output might be wrong. Set Flashinfer backend by export 
VLLM_ATTENTION_BACKEND=FLASHINFER. (type=value_error)
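
A minimal sketch of applying the suggested fix, assuming the backend is picked up from the environment variable before the engine is built (model name as in the parameters above):

# Hypothetical: select the FlashInfer attention backend before vLLM initializes.
import os
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

from vllm import LLM
llm = LLM(model="google/gemma-2-9b-it", dtype="bfloat16")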


ggbetz commented Aug 1, 2024

We might consider re-running the evals for Gemma 1.


ggbetz commented Aug 1, 2024

I've added flashinfer to our docker container, but I still get an error when trying to run and evaluate Gemma 2:

INFO 08-01 10:22:08 selector.py:79] Using Flashinfer backend.
WARNING 08-01 10:22:08 selector.py:80] Flashinfer will be stuck on llama-2-7b, please avoid using Flashinfer as the backend when running on llama-2-7b.
INFO 08-01 10:22:08 weight_utils.py:218] Using model weights format ['*.safetensors']
INFO 08-01 10:23:11 model_runner.py:255] Loading model weights took 4.9975 GB
[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/local/bin/cot-eval", line 8, in <module>
[rank0]:     sys.exit(main())
[rank0]:   File "/workspace/cot-eval/src/cot_eval/__main__.py", line 149, in main
[rank0]:     llm = VLLM(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 339, in __init__
[rank0]:     values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 1050, in validate_model
[rank0]:     input_data = validator(cls_, input_data)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/langchain_core/utils/pydantic.py", line 146, in wrapper
[rank0]:     return func(cls, values)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/langchain_community/llms/vllm.py", line 89, in validate_environment
[rank0]:     values["client"] = VLLModel(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 149, in __init__
[rank0]:     self.llm_engine = LLMEngine.from_engine_args(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 414, in from_engine_args
[rank0]:     engine = cls(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 256, in __init__
[rank0]:     self._initialize_kv_caches()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 353, in _initialize_kv_caches
[rank0]:     self.model_executor.determine_num_available_blocks())
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 76, in determine_num_available_blocks
[rank0]:     return self.driver_worker.determine_num_available_blocks()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 173, in determine_num_available_blocks
[rank0]:     self.model_runner.profile_run()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 874, in profile_run
[rank0]:     self.execute_model(model_input, kv_caches, intermediate_tensors)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 1221, in execute_model
[rank0]:     model_input.attn_metadata.begin_forward()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/flashinfer.py", line 132, in begin_forward
[rank0]:     self.prefill_wrapper.begin_forward(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/flashinfer/prefill.py", line 791, in begin_forward
[rank0]:     self._wrapper.begin_forward(
[rank0]: RuntimeError: CHECK_EQ(paged_kv_indptr.size(0), batch_size + 1) failed. 1 vs 257

ggbetz added the bug label on Aug 1, 2024

ggbetz commented Aug 1, 2024

We'll probably have to wait for the next vllm release. See:

flashinfer-ai/flashinfer#362
vllm-project/vllm#7008

ggbetz closed this as completed on Oct 1, 2024