[Docs][Serve] Speed up weights loading by AMI and Docker Image #3073
base: master
Conversation
docs/source/serving/sky-serve.rst
Outdated
Speedup Weights Loading in Large Model Serving
----------------------------------------------

When serving large models, the weights of the model are loaded from the cloud storage / public internet to the VMs. This process can take a long time, especially for large models. To speed up the weights loading, you can use the following methods:
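For context (an illustrative sketch only, not part of the doc under review): in a SkyPilot task YAML, a custom machine image is selected via the `image_id` field under `resources`. The accelerator and the AMI ID below are placeholders.

```yaml
# Sketch: the accelerator and AMI ID are placeholders. Pointing `image_id`
# at a pre-baked machine image lets the `setup` step do much less work.
resources:
  cloud: aws
  accelerators: A100:1
  image_id: ami-0123456789abcdef0  # placeholder for a pre-baked machine image
```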
It seems the benefit of using a machine image or docker image is more about reducing the setup time than the model weight downloading time, as they mostly package the dependencies and do not necessarily speed up the download of the image or the model, which should be limited by network bandwidth instead.
Should we rewrite the section as reducing the overhead of environment setup?
Good point! Let me rephrase that
I guess there are some speedups when using a machine image? That should be optimized by the cloud provider, which makes it faster than a plain network download?
Machine images are intended to be used in a single region. They can be used to launch a VM in another region, but that involves data transfer from one region to another. Cloud providers should have optimized this, but we probably want to be careful about the wording to avoid giving the impression that all the benefits come from faster weight loading.
docs/source/serving/sky-serve.rst
Outdated
# Here goes the setup and run commands...

This is easier to configure than machine images, but it may have a longer startup time than machine images since it needs to pull the docker image from the registry.
Have we actually timed these two methods?
I'm feeling like it is hard to make a fair comparison - it is largely dependent on the base docker/machine image used... Though I'll try to make some benchmarks and see the results 🫡
Co-authored-by: Zongheng Yang <[email protected]>
docs/source/serving/sky-serve.rst
Outdated
image_id: docker:docker-image-with-dependency-installed

# Followed by setup and run commands.
We could mention something about how this docker image should be built; in particular, it could have the SkyPilot runtime pre-built. Something like the following would be useful (could you help give a concrete example of how to install vLLM and download the model in the Dockerfile below for a better reference, i.e. replacing the line # Your dependencies installation and model download code goes here
with actual workable commands for serving vLLM + Mistral?):
Your docker image can have all SkyPilot dependencies pre-installed to further reduce the setup time; you could try building your docker image based on our base image. The `Dockerfile` could look like the following:
```Dockerfile
FROM berkeleyskypilot/skypilot-k8s-gpu:latest
# Your dependencies installation and model download code goes here
```
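As a purely illustrative sketch of what that placeholder could become (the base image tag comes from the snippet above; the vLLM install command and the Hugging Face repo name are assumptions, and a gated repo may require a token at build time):

```Dockerfile
# Sketch only: package versions and the model repo below are assumptions.
FROM berkeleyskypilot/skypilot-k8s-gpu:latest

# Pre-install the serving dependency so the task's `setup` step has less to do.
RUN pip install --no-cache-dir vllm

# Pre-download the model weights into the image so startup does not wait on the download.
# snapshot_download() is provided by huggingface_hub, which vLLM pulls in as a dependency.
RUN python -c "from huggingface_hub import snapshot_download; snapshot_download('mistralai/Mistral-7B-Instruct-v0.1')"
```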
Thanks for adding this doc @cblmemo! We have users asking for this and it would be nice if we could directly point them to this page. : )
Just want to update the status of this PR first: I found a mysterious bug that causes an NVML initialization error when using a docker container as the runtime env. By bisecting, it seems these lines are causing the error: skypilot/sky/provision/docker_utils.py Lines 271 to 274 in 82c50f5
That is very strange, since those commands run on the host but somehow affect the containers. Will investigate more.
We should consider having this PR updated and merged as well. : )
This PR is blocked by the max/ultra disk tier, as the current performance is not better than installing everything from pip...
Another user requests this. : )
Left some benchmark results using PR #3860. In the following table,
In conclusion, our
We should revamp this PR with our latest findings and support for ultra disk : )
Done in #3949. Still keeping this so we can investigate whether it is possible to speed up by using an AMI & docker image.
TODO: Benchmark and get some numbers
Fix bug using
Tested (run the relevant ones):
- bash format.sh
- sky launch the following yaml and works properly
- pytest tests/test_smoke.py
- pytest tests/test_smoke.py::test_fill_in_the_name
- bash tests/backward_comaptibility_tests.sh