
[BugFix] Fix a problem of failing to utilize custom kv cache dtype. #944


Open: wants to merge 1 commit into main

Conversation

whx-sjtu (Contributor) commented May 24, 2025

This PR fixes bugs in the FA3 quantization path.
There are three problems in total:

  1. `self.kv_cache_dtype` was not used to initialize `kv_cache_spec` in model_runner_v1.py; this PR fixes that.
  2. In the v0 path, passing "kv_cache_dtype" when initializing vLLM crashes the program due to a validation check in vLLM's config.py (https://github.com/vllm-project/vllm/blob/c1e4a4052d65d72d45e39db1edb6b7deb4ffd426/vllm/config.py#L1496).
  3. In the v1 path, "kv_cache_dtype" currently can't be passed at all, as described in issue 17355.
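Problem 1 comes down to the runner ignoring the configured cache dtype when building the KV cache spec. A minimal sketch of the intended resolution logic, assuming a string-based dtype config ("auto" meaning "follow the model dtype"); the function and set names are illustrative, not the actual vllm-ascend code:

```python
# Illustrative sketch only: the real fix lives in model_runner_v1.py and
# uses vLLM's own dtype plumbing. "auto" means "follow the model dtype";
# any other supported value overrides it for the KV cache.
SUPPORTED_KV_CACHE_DTYPES = {"auto", "int8", "fp8_e4m3"}

def resolve_kv_cache_dtype(kv_cache_dtype: str, model_dtype: str) -> str:
    """Pick the dtype the kv_cache_spec should be built with."""
    if kv_cache_dtype not in SUPPORTED_KV_CACHE_DTYPES:
        raise ValueError(f"unsupported kv cache dtype: {kv_cache_dtype!r}")
    # The bug was effectively always returning model_dtype here,
    # regardless of what self.kv_cache_dtype was set to.
    return model_dtype if kv_cache_dtype == "auto" else kv_cache_dtype
```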

To solve problems 2 and 3, we currently pass kv_cache_dtype through additional_config and set the cache dtype of cache_config in the platform's check_and_update_config.
cc @wangxiyuan
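The workaround for problems 2 and 3 can be sketched as follows; the class name and config attribute shapes are assumptions based on the description above, not the exact vllm-ascend code:

```python
class NPUPlatform:
    """Hypothetical platform class; in vllm-ascend the hook point is the
    platform's check_and_update_config, as described above."""

    @classmethod
    def check_and_update_config(cls, vllm_config) -> None:
        # additional_config is a free-form dict, so the dtype bypasses the
        # upstream validation that rejects kv_cache_dtype directly.
        additional = getattr(vllm_config, "additional_config", None) or {}
        kv_cache_dtype = additional.get("kv_cache_dtype")
        if kv_cache_dtype is not None:
            vllm_config.cache_config.cache_dtype = kv_cache_dtype
```

Users would then pass something like `additional_config={"kv_cache_dtype": "int8"}` when creating the engine, instead of the rejected kv_cache_dtype argument.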

Signed-off-by: whx-sjtu <[email protected]>