Prefix Caching - fix T4 Triton error (vllm-project#2517)
caoshiyi authored and jimpang committed Feb 20, 2024
1 parent b475897 commit 120b2fd
Showing 1 changed file with 3 additions and 1 deletion.
vllm/model_executor/layers/triton_kernel/prefix_prefill.py (3 additions, 1 deletion)
@@ -618,7 +618,9 @@ def context_attention_fwd(q,
                           b_ctx_len,
                           max_input_len,
                           alibi_slopes=None):
-    BLOCK = 128
+
+    cap = torch.cuda.get_device_capability()
+    BLOCK = 128 if cap[0] >= 8 else 64
     # shape constraints
     Lq, Lk, Lv = q.shape[-1], k.shape[-1], v.shape[-1]
     assert Lq == Lk and Lk == Lv
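
For readers unfamiliar with the check, here is a minimal standalone sketch of the same logic (pick_block_size is a hypothetical helper for illustration, not part of vLLM). torch.cuda.get_device_capability() returns a (major, minor) tuple: (7, 5) on a T4 (Turing), (8, 0) and up on Ampere and newer. A plausible reason for the fix is that pre-Ampere GPUs have less shared memory per SM, so the Triton kernel's 128-wide block fails to launch on a T4; halving the block size on such devices avoids the error.

import torch

def pick_block_size(default: int = 128, fallback: int = 64) -> int:
    # Hypothetical helper mirroring the commit's logic: use the full 128-wide
    # block on compute capability >= 8 (Ampere and newer), fall back to 64 on
    # older GPUs such as the T4 (compute capability 7.5).
    major, _minor = torch.cuda.get_device_capability()
    return default if major >= 8 else fallback

if torch.cuda.is_available():
    print(torch.cuda.get_device_capability())  # e.g. (7, 5) on a T4
    print(pick_block_size())                   # 64 on a T4, 128 on an A100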
