This folder contains instructions to run the vLLM inference server.

Some of the features include:

1. Serialization of HuggingFace models supported by vLLM (see https://github.com/vllm-project/vllm?tab=readme-ov-file#about) into a vLLM-loadable serialized format
2. Tensorizer support in vLLM for fast model deserialization and loading

To run the example:

1. Replace `access_key`, `secret_key`, and `host_url` in `00-optional-s3-secret.yaml`, then run `kubectl apply -f 00-optional-s3-secret.yaml`
2. Replace `--model EleutherAI/pythia-70m` and `--serialized-directory s3://my-bucket/` (and optionally `--suffix vllm`) in `01-optional-s3-serialize-job.yaml`, then run `kubectl apply -f 01-optional-s3-serialize-job.yaml`
3. Replace `--model EleutherAI/pythia-70m` and `--model-loader-extra-config '{"tensorizer_uri": "s3://model-store/vllm/EleutherAI/pythia-70m/vllm/model.tensors"}'` with your serialized model path in `02-inference-service.yaml`, then run `kubectl apply -f 02-inference-service.yaml` (a verification sketch follows this list)
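
Once the manifests are applied, a quick way to confirm that the serialization job finished and the inference service pod is up is sketched below. The job and pod names depend on your manifests, so substitute whatever `kubectl get` reports:

```bash
# List jobs and pods in the current namespace; names come from the manifests above.
kubectl get jobs
kubectl get pods

# Follow the logs of the serialization job (name assumed from 01-optional-s3-serialize-job.yaml).
kubectl logs job/<serialize-job-name> -f

# Follow the logs of the inference service pod once it is running.
kubectl logs <inference-service-pod-name> -f
```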

You should now have an inference service running a container with an OpenAI-compatible server.
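
For a quick smoke test, you can port-forward the service locally and list the models it serves. This is a minimal sketch: the service name is an assumption, so check `kubectl get svc` for the actual name (vLLM's OpenAI-compatible server listens on port 8000 by default):

```bash
# Forward the (assumed) inference service to localhost; replace the service
# name and ports with whatever `kubectl get svc` shows for your deployment.
kubectl port-forward svc/<inference-service-name> 8000:8000

# In another shell, list the models exposed by the OpenAI-compatible server.
curl http://localhost:8000/v1/models
```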

You can interact with it using any OpenAI-compatible client. More information can be found in the vLLM quickstart: https://docs.vllm.ai/en/latest/getting_started/quickstart.html
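
For example, a plain HTTP request against the OpenAI-compatible completions endpoint looks roughly like this (a sketch assuming the server was started with `--model EleutherAI/pythia-70m` and is reachable on localhost:8000 via the port-forward above; adjust the model name to whatever you serialized):

```bash
# Request a completion from the OpenAI-compatible /v1/completions endpoint.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "EleutherAI/pythia-70m",
        "prompt": "San Francisco is a",
        "max_tokens": 32,
        "temperature": 0
      }'
```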
