[Docs][Serve] Speed up weights loading by AMI and Docker Image #3073
base: master
Conversation
docs/source/serving/sky-serve.rst
Outdated
Speedup Weights Loading in Large Model Serving
----------------------------------------------

When serving large models, the weights of the model are loaded from the cloud storage / public internet to the VMs. This process can take a long time, especially for large models. To speed up the weights loading, you can use the following methods:
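For context (an illustrative sketch only, not part of the doc under review): in a SkyPilot task YAML, a custom machine image is selected via the `image_id` field under `resources`. The accelerator and the AMI ID below are placeholders.

```yaml
# Sketch: the accelerator and AMI ID are placeholders. Pointing `image_id`
# at a pre-baked machine image lets the `setup` step do much less work.
resources:
  cloud: aws
  accelerators: A100:1
  image_id: ami-0123456789abcdef0  # placeholder for a pre-baked machine image
```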
It seems the benefit of using a machine image or docker image is more about reducing the setup time than the model weight downloading time, as they mostly package the dependencies and do not necessarily speed up the download of the image or the model, which should be limited by network bandwidth instead.
Should we rewrite the section as reducing the overhead of environment setup?
Good point! Let me rephrase that
I guess there are some speedups when using a machine image? That should be optimized by the cloud provider, which makes it faster than a plain network download?
Machine images are intended to be used in a single region. They can be used to launch a VM in another region, but that involves data transfer from one region to another. Cloud providers should have optimized this, but we probably want to be careful about the wording to avoid giving the impression that all the benefits come from faster weight loading.
docs/source/serving/sky-serve.rst
Outdated
# Here goes the setup and run commands...

This is easier to configure than machine images, but it may have a longer startup time than machine images since it needs to pull the docker image from the registry.
Have we actually timed these two methods?
I'm feeling like it is hard to make a fair comparison - it is largely dependent on the base docker/machine image used... Though I'll try to make some benchmarks and see the results 🫡
Co-authored-by: Zongheng Yang <[email protected]>
docs/source/serving/sky-serve.rst
Outdated
image_id: docker:docker-image-with-dependency-installed

# Followed by setup and run commands.
We could mention something about how this docker image should be built; in particular, it could have the SkyPilot runtime pre-built. Something like the following would be useful (could you help give a concrete example of how to install vLLM and download the model in the Dockerfile below for a better reference, i.e. replacing the line # Your dependencies installation and model download code goes here
with actual workable commands for serving vLLM + Mistral?):
Your docker image can have all SkyPilot dependencies pre-installed to further reduce the setup time; you could try building your docker image based on our base image. The `Dockerfile` could look like the following:
```Dockerfile
FROM berkeleyskypilot/skypilot-k8s-gpu:latest
# Your dependencies installation and model download code goes here
```
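As a purely illustrative sketch of what that placeholder could become (the base image tag comes from the snippet above; the vLLM install command and the Hugging Face repo name are assumptions, and a gated repo may require a token at build time):

```Dockerfile
# Sketch only: package versions and the model repo below are assumptions.
FROM berkeleyskypilot/skypilot-k8s-gpu:latest

# Pre-install the serving dependency so the task's `setup` step has less to do.
RUN pip install --no-cache-dir vllm

# Pre-download the model weights into the image so startup does not wait on the download.
# snapshot_download() is provided by huggingface_hub, which vLLM pulls in as a dependency.
RUN python -c "from huggingface_hub import snapshot_download; snapshot_download('mistralai/Mistral-7B-Instruct-v0.1')"
```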
Thanks for adding this doc @cblmemo! We have users asking for this and it would be nice if we could directly point them to this page. : )
Just want to update the status of this PR first: I found a mysterious bug that causes an NVML initialization error when using a docker container as the runtime env. By bisecting, it seems these lines are causing the error: skypilot/sky/provision/docker_utils.py Lines 271 to 274 in 82c50f5
That is very strange, since those commands run on the host but somehow affect the containers. Will investigate more.
We should consider having this PR updated and merged as well. : )
This PR is blocked by the max/ultra disk tier, as the current performance is not better than installing everything from pip...
Another user requests this. : )
Left some benchmark results using PR #3860. In the following table,
In conclusion, our
We should revamp this PR with our latest findings and support for ultra disk : )
Done in #3949. Still keeping this so we can investigate whether it is possible to speed up by using an AMI & docker image.
TODO: Benchmark and get some numbers
Fix bug using
Tested (run the relevant ones):
- bash format.sh
- sky launch the following yaml and works properly
- pytest tests/test_smoke.py
- pytest tests/test_smoke.py::test_fill_in_the_name
- bash tests/backward_comaptibility_tests.sh