
[BugFix] Fix a problem of failing to utilize custom kv cache dtype. #944


Open: wants to merge 1 commit into main

Conversation

whx-sjtu (Contributor) commented May 24, 2025

This PR fixes bugs in the FA3 quantization path.
There are three problems in total:

  1. `self.kv_cache_dtype` was not used to initialize `kv_cache_spec` in model_runner_v1.py; this PR fixes that.
  2. In the v0 path, passing "kv_cache_dtype" when initializing vLLM crashes the program due to a validation check in vLLM's config.py (https://github.com/vllm-project/vllm/blob/c1e4a4052d65d72d45e39db1edb6b7deb4ffd426/vllm/config.py#L1496).
  3. In the v1 path, "kv_cache_dtype" currently can't be passed at all, as described in issue 17355.
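Problem 1 comes down to the runner ignoring the configured cache dtype when building the KV cache spec. A minimal sketch of the intended resolution logic, assuming a string-based dtype config ("auto" meaning "follow the model dtype"); the function and set names are illustrative, not the actual vllm-ascend code:

```python
# Illustrative sketch only: the real fix lives in model_runner_v1.py and
# uses vLLM's own dtype plumbing. "auto" means "follow the model dtype";
# any other supported value overrides it for the KV cache.
SUPPORTED_KV_CACHE_DTYPES = {"auto", "int8", "fp8_e4m3"}

def resolve_kv_cache_dtype(kv_cache_dtype: str, model_dtype: str) -> str:
    """Pick the dtype the kv_cache_spec should be built with."""
    if kv_cache_dtype not in SUPPORTED_KV_CACHE_DTYPES:
        raise ValueError(f"unsupported kv cache dtype: {kv_cache_dtype!r}")
    # The bug was effectively always returning model_dtype here,
    # regardless of what self.kv_cache_dtype was set to.
    return model_dtype if kv_cache_dtype == "auto" else kv_cache_dtype
```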

To solve problems 2 and 3, we currently pass kv_cache_dtype through additional_config and set the cache dtype of cache_config in the platform's check_and_update_config.
cc @wangxiyuan
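The workaround for problems 2 and 3 can be sketched as follows; the class name and config attribute shapes are assumptions based on the description above, not the exact vllm-ascend code:

```python
class NPUPlatform:
    """Hypothetical platform class; in vllm-ascend the hook point is the
    platform's check_and_update_config, as described above."""

    @classmethod
    def check_and_update_config(cls, vllm_config) -> None:
        # additional_config is a free-form dict, so the dtype bypasses the
        # upstream validation that rejects kv_cache_dtype directly.
        additional = getattr(vllm_config, "additional_config", None) or {}
        kv_cache_dtype = additional.get("kv_cache_dtype")
        if kv_cache_dtype is not None:
            vllm_config.cache_config.cache_dtype = kv_cache_dtype
```

Users would then pass something like `additional_config={"kv_cache_dtype": "int8"}` when creating the engine, instead of the rejected kv_cache_dtype argument.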

Signed-off-by: whx-sjtu <[email protected]>