Commit 249ce7d ("init") by cblmemo, committed Jan 6, 2024 (parent 4ac3479). Showing 2 changed files with 79 additions and 0 deletions.
llm/vllm/README.md (58 additions):

## Serving the Mixtral 8x7B model with vLLM and SkyServe

1. Start serving the Mixtral 8x7B model with the [SkyServe](https://skypilot.readthedocs.io/en/latest/serving/sky-serve.html) CLI:
```bash
sky serve up -n vllm-mixtral mixtral-service.yaml
```

2. Use `sky serve status` to check the status of the service:
```bash
sky serve status vllm-mixtral
```

You should see output similar to the following:

```console
Services
NAME UPTIME STATUS REPLICAS ENDPOINT
vllm-mixtral 7m 43s READY 2/2 3.84.15.251:30001

Service Replicas
SERVICE_NAME ID IP LAUNCHED RESOURCES STATUS REGION
vllm-mixtral 1 34.66.255.4 11 mins ago 1x GCP({'L4': 8}) READY us-central1
vllm-mixtral 2 35.221.37.64 15 mins ago 1x GCP({'L4': 8}) READY us-east4
```

3. Once its status is `READY`, you can use the endpoint to interact with the model:

```bash
$ curl -L 3.84.15.251:30001/v1/chat/completions \
-X POST \
-d '{"model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "messages": [{"role": "user", "content": "Who are you?"}]}' \
-H 'Content-Type: application/json'
```
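Equivalently, the same request can be built and sent from Python. This is a minimal sketch: the endpoint IP is the one from the example output above and must be replaced with your own `sky serve status` endpoint, and the actual network call (which assumes the third-party `requests` package and a live service) is left in comments.

```python
import json

# Hypothetical endpoint taken from the example output above; replace it with
# the ENDPOINT shown by your own `sky serve status`.
ENDPOINT = "http://3.84.15.251:30001/v1/chat/completions"

# The same request body as the curl command, built as a Python dict.
payload = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "messages": [{"role": "user", "content": "Who are you?"}],
}
body = json.dumps(payload)

# To actually send the request against a live endpoint:
#   import requests
#   resp = requests.post(ENDPOINT, data=body,
#                        headers={"Content-Type": "application/json"})
#   print(resp.json())
print(body)
```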

You should receive a response similar to the following:

```console
{
  'id': 'cmpl-80b2bfd6f60c4024884c337a7e0d859a',
  'object': 'chat.completion',
  'created': 1005,
  'model': 'mistralai/Mixtral-8x7B-Instruct-v0.1',
  'choices': [
    {
      'index': 0,
      'message': {
        'role': 'assistant',
        'content': ' I am a helpful AI assistant designed to provide information, answer questions, and engage in conversation with users. I do not have personal experiences or emotions, but I am programmed to understand and process human language, and to provide helpful and accurate responses.'
      },
      'finish_reason': 'stop'
    }
  ],
  'usage': {'prompt_tokens': 13, 'total_tokens': 64, 'completion_tokens': 51}
}
```
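Once deserialized, the interesting fields are the assistant's message (in the first element of `choices`) and the token accounting in `usage`. A small sketch of pulling those out, using a dict shaped like the sample response above (the field values here are copied from that sample, not from a live call):

```python
# A response shaped like the example output above, as a Python dict.
response = {
    "id": "cmpl-80b2bfd6f60c4024884c337a7e0d859a",
    "object": "chat.completion",
    "created": 1005,
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": " I am a helpful AI assistant designed to provide information, answer questions, and engage in conversation with users.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 13, "total_tokens": 64, "completion_tokens": 51},
}

# The assistant's reply lives in the first choice's message.
answer = response["choices"][0]["message"]["content"].strip()

# Token accounting: prompt + completion tokens add up to the total.
usage = response["usage"]
assert usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]

print(answer)
```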
llm/vllm/mixtral-service.yaml (21 additions):

```yaml
# mixtral-service.yaml
service:
  readiness_probe: /v1/models
  replicas: 2

# Fields below describe each replica.
resources:
  ports: 8080
  accelerators: {L4:8, A10g:8, A100:4, A100:8, A100-80GB:2, A100-80GB:4, A100-80GB:8}

setup: |
  conda create -n vllm python=3.9 -y
  conda activate vllm
  pip install vllm

run: |
  conda activate vllm
  python -m vllm.entrypoints.openai.api_server \
    --tensor-parallel-size $SKYPILOT_NUM_GPUS_PER_NODE \
    --host 0.0.0.0 --port 8080 \
    --model mistralai/Mixtral-8x7B-Instruct-v0.1
```

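The `run` section passes `$SKYPILOT_NUM_GPUS_PER_NODE` as vLLM's tensor-parallel size, so the model is sharded across however many GPUs the accelerator spec that SkyPilot ends up provisioning provides. A toy sketch of that mapping (the helper function and dict are illustrative only, not a SkyPilot API):

```python
# Illustrative only: mimics how the tensor-parallel degree follows the GPU
# count in the `accelerators` spec above; this is not a SkyPilot API.
def tensor_parallel_size(accelerator_spec: dict, chosen: str) -> int:
    """Return the per-replica GPU count for the accelerator SkyPilot picked."""
    return accelerator_spec[chosen]

spec = {"L4": 8, "A10g": 8, "A100": 4, "A100-80GB": 2}

# An L4:8 replica shards the model across 8 GPUs; an A100-80GB:2 replica
# across 2, since each 80 GB card holds a larger slice of the weights.
print(tensor_parallel_size(spec, "L4"))         # 8
print(tensor_parallel_size(spec, "A100-80GB"))  # 2
```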