Update introduction.md

WukLab · May 2, 2024 · 7a49675
1 parent 70ffde5
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/content/posts/introduction.md b/content/posts/introduction.md
@@ -12,4 +12,4 @@ Second, an LLM can often be called multiple times in a sequence to carry out a c
 However, today's LLM serving systems are designed for standalone LLMs. They treat any interceptions to an LLM as starting a new request, causing unnecessary recomputation of already computed contexts.
 This recomputation accounts for 37-40% of total model forwarding time in typical settings.
 
-We built APIServe, the first LLM serving framework targeting augmented LLMs. APIServe minimizes the GPU resource waste caused by any interception to an LLM and dedicates saved memory for serving more requests. APIServe improves the overall serving throughput by 1.6x and completes 2x more requests per second compared to the state-of-the-art LLM serving systems.
+We built APIServe (ICML '24), the first LLM serving framework targeting augmented LLMs. APIServe minimizes the GPU resource waste caused by any interception to an LLM and dedicates saved memory for serving more requests. APIServe improves the overall serving throughput by 1.6x and completes 2x more requests per second compared to the state-of-the-art LLM serving systems.