Commit

Update article with read more
vikranth22446 committed Jun 3, 2024
1 parent e4ef28d commit 9b27d45
Showing 3 changed files with 4 additions and 2 deletions.
2 changes: 2 additions & 0 deletions content/posts/preble.md
@@ -9,6 +9,8 @@ summary: "
LLM prompts are growing more complex and longer with [agents](https://arxiv.org/abs/2308.11432), [tool use](https://platform.openai.com/docs/guides/function-calling), [large documents](https://arxiv.org/html/2404.07143v1), [video clips](https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#context-window), and detailed [few-shot examples](https://arxiv.org/pdf/2210.03629). These prompts often have content that is shared across many requests. The computed intermediate state (KV cache) from one prompt can be reused by another for their shared parts to improve request handling performance and save GPU computation resources. However, current distributed LLM serving systems treat each request as independent and miss the opportunity to reuse the computed intermediate state.
We introduce [**Preble**](https://github.com/WukLab/preble), the first distributed LLM serving system that targets long and shared prompts. Preble achieves a **1.5-14.5x** average and **2-10x** p99 latency reduction over SOTA serving systems. The core of Preble is a new E2 Scheduling that optimizes load distribution and KV cache reutilization. Preble is compatible with multiple serving backends such as [vLLM](https://github.com/vllm-project/vllm) and [SGLang](https://github.com/sgl-project/sglang).
+<br/><br/>
+[Read More...](https://mlsys.wuklab.io/posts/preble/)
"
---
Author: Vikranth Srivatsa and Yiying Zhang
4 changes: 2 additions & 2 deletions layouts/index.html
@@ -17,11 +17,11 @@ <h2>News</h2>
{{ .Content }}
{{ else }}
{{ .Summary }}
-{{ if .Truncated }}
+<!-- {{ if .Truncated }} -->
<div class="read-more-link">
<a href="{{ .RelPermalink }}">Read More…</a>
</div>
-{{ end }}
+<!-- {{ end }} -->
{{ end }}
</article>
{{- end }}
Binary file modified static/images/preble_gifs/existing_system_processing.gif