[Docs][Serve] Speed up weights loading by using ultra disk tier #3949

28 changes: 28 additions & 0 deletions docs/source/serving/fast-replica-startup.rst
@@ -0,0 +1,28 @@
Speeding Up Replica Setup
=========================

When serving AI models, replica setup steps such as installing dependencies and downloading model weights can take a long time. To speed up this process, you can use the :code:`ultra` disk tier:

.. code-block:: yaml
   :emphasize-lines: 7

   service:
     replicas: 2
     readiness_probe: /v1/models
   resources:
     ports: 8080
     accelerators: A10G:8
     disk_tier: ultra

We find that when loading large models, performance is sometimes limited by disk speed. Using the :code:`ultra` disk tier can significantly reduce the time it takes to set up your replicas, so new replicas become ready sooner. The table below compares disk tiers. All tests were run on AWS, and each result is the end-to-end execution time for launching a Llama 2 70B endpoint with the latest version of vLLM on an :code:`A10G:8` instance (:code:`g5.48xlarge`).

.. list-table::
   :widths: 10 10
   :header-rows: 1

   * - Disk Tier
     - End-to-end launch time
   * - :code:`ultra`
     - 410s
   * - :code:`high`
     - 524s
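
To put the two measurements in perspective, the short sketch below computes the absolute and relative savings. It is only an illustration of the arithmetic: the variable and function names are made up for this example, and the numbers are the two launch times reported in the table.

```python
# End-to-end launch times from the table above, in seconds.
launch_seconds = {"ultra": 410, "high": 524}


def compare(baseline, candidate, times):
    """Return (seconds saved, percent reduction) of candidate vs. baseline."""
    saved = times[baseline] - times[candidate]
    reduction = saved / times[baseline] * 100
    return saved, reduction


saved, reduction = compare("high", "ultra", launch_seconds)
print(f"ultra saves {saved}s, a {reduction:.1f}% shorter launch than high")
# prints: ultra saves 114s, a 21.8% shorter launch than high
```

These figures come from a single configuration (Llama 2 70B on :code:`g5.48xlarge`), so the saving for other models or instance types may differ.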
1 change: 1 addition & 0 deletions docs/source/serving/user-guides.rst
@@ -7,3 +7,4 @@ Serving User Guides
   update
   auth
   spot-policy
   fast-replica-startup