
Commit 2a45688

upd to mixtral; TODO: test

cblmemo committed Jan 5, 2024
1 parent 23fb2bd commit 2a45688
Showing 1 changed file with 7 additions and 7 deletions.
14 changes: 7 additions & 7 deletions docs/source/serving/sky-serve.rst
@@ -45,7 +45,7 @@ How it works:
Quick tour: LLM serving
-----------------------

- Here is a simple example of serving an LLM model (:code:`Mistral-7B-Instruct-v0.1`) on vLLM or TGI:
+ Here is a simple example of serving an LLM model (:code:`Mixtral-8x7B-Instruct-v0.1`) on vLLM or TGI:

.. tab-set::

@@ -62,7 +62,7 @@ Here is a simple example of serving an LLM model (:code:`Mistral-7B-Instruct-v0.
# Fields below describe each replica.
resources:
ports: 8080
- accelerators: {A100, L4, T4}
+ accelerators: {L4:8, A10g:8, A10:8, A100:4, A100:8, A100-80GB:2, A100-80GB:4, A100-80GB:8}
setup: |
conda create -n vllm python=3.9 -y
@@ -73,7 +73,7 @@ Here is a simple example of serving an LLM model (:code:`Mistral-7B-Instruct-v0.
conda activate vllm
python -m vllm.entrypoints.openai.api_server \
--host 0.0.0.0 --port 8080 \
- --model mistralai/Mistral-7B-Instruct-v0.1
+ --model mistralai/Mixtral-8x7B-Instruct-v0.1
.. tab-item:: TGI
:sync: tgi-tab
@@ -88,12 +88,12 @@ Here is a simple example of serving an LLM model (:code:`Mistral-7B-Instruct-v0.
# Fields below describe each replica.
resources:
ports: 8080
- accelerators: {A100, L4, T4}
+ accelerators: {L4:8, A10g:8, A10:8, A100:4, A100:8, A100-80GB:2, A100-80GB:4, A100-80GB:8}
run: |
docker run --gpus all --shm-size 1g -p 8080:80 -v ~/data:/data \
ghcr.io/huggingface/text-generation-inference \
- --model-id mistralai/Mistral-7B-Instruct-v0.1
+ --model-id mistralai/Mixtral-8x7B-Instruct-v0.1
Run :code:`sky serve up service.yaml` to deploy the service with automatic price and capacity optimization. Once it is deployed, use :code:`sky serve status` to check the status of the service:
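Once :code:`sky serve status` reports an endpoint, the service speaks the OpenAI-compatible :code:`/v1/chat/completions` protocol. A minimal stdlib-Python sketch of building that request — the endpoint IP/port is a placeholder for whatever :code:`sky serve status` prints, :code:`build_chat_request` is an illustrative helper (not part of SkyServe or vLLM), and the actual network call is left commented out:

```python
import json
import urllib.request


def build_chat_request(endpoint: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible /v1/chat/completions POST request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"http://{endpoint}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint; substitute the one reported by `sky serve status`.
req = build_chat_request(
    "3.83.159.229:30001",
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "Who are you?",
)
# resp = urllib.request.urlopen(req)  # uncomment to actually contact the service
print(req.get_method())  # -> POST
```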

@@ -137,11 +137,11 @@ Simply ``curl -L`` the service endpoint, which automatically load-balances acros
$ curl -L 3.83.159.229:30001/v1/chat/completions \
-X POST \
- -d '{"model": "mistralai/Mistral-7B-Instruct-v0.1", "messages": [{"role": "user", "content": "Who are you?"}]}' \
+ -d '{"model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "messages": [{"role": "user", "content": "Who are you?"}]}' \
-H 'Content-Type: application/json'
# Example output:
- {"id":"cmpl-47ca2e9b5a104cae984643f7309f33c6","object":"chat.completion","created":880,"model":"mistralai/Mistral-7B-Instruct-v0.1","choices":[{"index":0,"message":{"role":"assistant","content":" I am a helpful assistant here to provide information, answer questions, and engage in conversation to help make your life easier and more enjoyable. I can help you with a variety of tasks, such as setting reminders, providing weather updates, answering trivia, and much more. How can I assist you today?"},"finish_reason":"stop"}],"usage":{"prompt_tokens":13,"total_tokens":77,"completion_tokens":64}}
+ {"id":"cmpl-47ca2e9b5a104cae984643f7309f33c6","object":"chat.completion","created":880,"model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"message":{"role":"assistant","content":" I am a helpful assistant here to provide information, answer questions, and engage in conversation to help make your life easier and more enjoyable. I can help you with a variety of tasks, such as setting reminders, providing weather updates, answering trivia, and much more. How can I assist you today?"},"finish_reason":"stop"}],"usage":{"prompt_tokens":13,"total_tokens":77,"completion_tokens":64}}
.. tab-item:: TGI
:sync: tgi-tab
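The example JSON output above can be unpacked programmatically — the reply text lives under ``choices[0].message.content`` in the OpenAI chat-completions schema that vLLM mirrors. A small sketch, with the response string abbreviated from the sample output:

```python
import json

# Chat-completion response in the OpenAI-compatible schema returned by the
# vLLM server (content abbreviated from the example output above).
response_text = (
    '{"id":"cmpl-47ca2e9b5a104cae984643f7309f33c6","object":"chat.completion",'
    '"created":880,"model":"mistralai/Mixtral-8x7B-Instruct-v0.1",'
    '"choices":[{"index":0,"message":{"role":"assistant",'
    '"content":" I am a helpful assistant."},"finish_reason":"stop"}],'
    '"usage":{"prompt_tokens":13,"total_tokens":77,"completion_tokens":64}}'
)

reply = json.loads(response_text)
# The assistant's text is nested under choices[0].message.content.
answer = reply["choices"][0]["message"]["content"].strip()
print(answer)                               # -> I am a helpful assistant.
print(reply["usage"]["completion_tokens"])  # -> 64
```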
