Don't calculate KV scales dynamically if Q scale is included
mawong-amd committed Dec 19, 2024
1 parent 06f53ba commit 0bd414a
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions vllm/model_executor/layers/quantization/kv_cache.py
@@ -89,6 +89,7 @@ def process_weights_after_loading(self, layer: torch.nn.Module) -> None:
             q_scale = layer.q_scale.to("cpu").tolist()
             if current_platform.is_rocm() and not is_navi():
                 q_scale *= 2
+            layer.calculate_kv_scales = False
         else:
             q_scale = 1.0
         if layer.prob_scale > 0.0:
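
The effect of the added line can be sketched outside vLLM as follows. This is a minimal illustration, not the project's code: `FakeAttentionLayer`, `resolve_q_scale`, and the `doubles_scale` flag are hypothetical names, plain floats stand in for tensors, and the `layer.q_scale > 0.0` guard is an assumption, since the opening `if` lies outside the hunk shown above.

```python
from dataclasses import dataclass


@dataclass
class FakeAttentionLayer:
    """Hypothetical stand-in for the attention layer; not vLLM's class."""

    # Assumed sentinel: a non-positive value means "no Q scale in the checkpoint".
    q_scale: float = -1.0
    calculate_kv_scales: bool = True


def resolve_q_scale(layer: FakeAttentionLayer, doubles_scale: bool) -> float:
    """Mirror the branch in the hunk, using plain floats instead of tensors."""
    if layer.q_scale > 0.0:  # assumed guard; not visible in the hunk
        q_scale = layer.q_scale
        if doubles_scale:
            # Stands in for `current_platform.is_rocm() and not is_navi()`.
            q_scale *= 2
        # The line this commit adds: a loaded Q scale disables dynamic KV scales.
        layer.calculate_kv_scales = False
    else:
        q_scale = 1.0
    return q_scale


if __name__ == "__main__":
    loaded = FakeAttentionLayer(q_scale=0.5)
    print(resolve_q_scale(loaded, doubles_scale=True), loaded.calculate_kv_scales)

    missing = FakeAttentionLayer()
    print(resolve_q_scale(missing, doubles_scale=True), missing.calculate_kv_scales)
```

Under these assumptions, a layer that carries a Q scale ends up with `calculate_kv_scales` set to `False` regardless of platform, while a layer without one keeps its previous setting and falls back to a Q scale of 1.0.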