
Commit

fix typo and inst
yzh119 committed Feb 6, 2024
1 parent e80f04d commit 6cf8629
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion _posts/2024-01-03-introduce-flashinfer.md
@@ -186,7 +186,7 @@ Figure 10: Fused RoPE attention performance, use Llama2-7B setting: num_kv_heads=
</p>

 RoPE has negligible overhead on all 4 GPUs, especially for RTX 6000 Ada and RTX 4090 GPU which has
-strong CUDA Cores performance (RoPE requires `sin`/`cos` computation that can only be accelerated with Tensor Cores).
+strong CUDA Cores performance (RoPE requires `sin`/`cos` computation that can not be accelerated with Tensor Cores).

### Low-Precision Attention

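For context on the corrected sentence: RoPE rotates pairs of query/key feature dimensions by position-dependent angles, which is element-wise `sin`/`cos` work suited to CUDA cores rather than the matrix-multiply work Tensor cores accelerate. A minimal NumPy sketch, assuming the common half-split layout (the function name and shapes here are illustrative, not FlashInfer's API):

```python
import numpy as np

def apply_rope(x, positions, base=10000.0):
    """Apply rotary position embedding to a (seq_len, head_dim) array.

    Each pair (x[:, i], x[:, i + head_dim // 2]) is rotated by an angle
    position * base**(-i / (head_dim // 2)) -- pure element-wise math.
    """
    seq_len, head_dim = x.shape
    half = head_dim // 2
    freqs = base ** (-np.arange(half) / half)       # (half,) per-pair frequencies
    angles = positions[:, None] * freqs[None, :]    # (seq_len, half) rotation angles
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation applied independently to every dimension pair
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

Because each pair undergoes a plain 2D rotation, the transform preserves vector norms, and at position 0 it is the identity; the fused kernel in the post applies this rotation on the fly inside attention instead of materializing rotated tensors.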
2 changes: 1 addition & 1 deletion _posts/2024-01-08-cascade-inference.md
@@ -3,7 +3,7 @@ layout: post
title: "Cascade Inference: Memory Bandwidth Efficient Shared Prefix Batch Decoding"
date: 2024-02-02
comments: true
-author: Zihao Ye (UW), Ruihang Lai (CMU), Bo-Ru Lu (UW), Chien-Yu Lin (UW), Size Zheng (UW & PKU), Lequn Chen (UW), Tianqi Chen (CMU & OctoML), Luis Ceze (UW & OctoML)
+author: Zihao Ye (UW), Ruihang Lai (CMU), Bo-Ru Lu (UW), Chien-Yu Lin (UW), Size Zheng (UW & PKU), Lequn Chen (UW), Tianqi Chen (CMU & OctoAI), Luis Ceze (UW & OctoAI)
redirect_from: "/2024/01/08/cascade-inference"
---

