Commit 7a0b011
Add a 1-line docstring to explain why calling context_attention_fwd twice in test_prefix_prefill.py (vllm-project#2553)
JasonZhu1313 authored Jan 22, 2024
1 parent 63e835c commit 7a0b011
Showing 1 changed file with 1 addition and 0 deletions.
tests/kernels/test_prefix_prefill.py: 1 addition & 0 deletions
@@ -125,6 +125,7 @@ def test_contexted_kv_attention(
     v_cache = v_cache.view(-1, block_size, num_heads,
                            head_size).permute(0, 2, 3, 1).contiguous()
 
+    # Warm up the Triton kernel by calling it once before actually measuring generation time
     context_attention_fwd(query, k, v, output, k_cache, v_cache, block_table,
                           b_start_loc, b_seq_len, b_ctx_len, max_input_len)
     torch.cuda.synchronize()
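
For context, this is the warm-up-then-measure pattern the new comment describes: the first call to a Triton kernel pays the one-time JIT compilation cost, so the test invokes the kernel once before timing it. The sketch below is illustrative rather than the full test: the import path is an assumption about vLLM's layout at the time of this commit, the timing scaffolding is hypothetical, and the tensor arguments (query, k, v, and so on) are the ones constructed earlier in test_contexted_kv_attention.

    import time

    import torch

    # Assumed import path for this kernel at the time of this commit.
    from vllm.model_executor.layers.triton_kernel.prefix_prefill import (
        context_attention_fwd)

    # First call: includes Triton JIT compilation and CUDA setup overhead,
    # so its latency says little about steady-state kernel performance.
    context_attention_fwd(query, k, v, output, k_cache, v_cache, block_table,
                          b_start_loc, b_seq_len, b_ctx_len, max_input_len)
    torch.cuda.synchronize()

    # Second call: the one worth timing. Kernel launches are asynchronous,
    # so synchronize before reading the clock on both sides of the call.
    start = time.time()
    context_attention_fwd(query, k, v, output, k_cache, v_cache, block_table,
                          b_start_loc, b_seq_len, b_ctx_len, max_input_len)
    torch.cuda.synchronize()
    print(f"triton attention kernel took {time.time() - start:.6f} s")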
