From 53560082dde01345fb0cc3e5bd72c1d18d6dfc79 Mon Sep 17 00:00:00 2001
From: cblmemo
Date: Wed, 3 Jan 2024 22:26:24 -0800
Subject: [PATCH] change to vicuna 13b & A100:1

---
 docs/source/serving/sky-serve.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/source/serving/sky-serve.rst b/docs/source/serving/sky-serve.rst
index 235fbff908b..e33e8577e03 100644
--- a/docs/source/serving/sky-serve.rst
+++ b/docs/source/serving/sky-serve.rst
@@ -79,8 +79,8 @@ Here is a simple example of serving an LLM model on TGI and vLLM with SkyServe:
 
   # Fields below describe each replica.
   resources:
-    ports: 8000
-    accelerators: A100-80GB:2
+    ports: 8080
+    accelerators: A100:1
 
   setup: |
     conda create -n vllm python=3.9 -y
@@ -90,8 +90,8 @@ Here is a simple example of serving an LLM model on TGI and vLLM with SkyServe:
 
   run: |
     conda activate vllm
     python -m vllm.entrypoints.openai.api_server \
-      --model mistralai/Mixtral-8x7B-Instruct-v0.1 \
-      --host 0.0.0.0 --port 8000 --tensor-parallel-size 2
+      --model lmsys/vicuna-13b-v1.5 \
+      --host 0.0.0.0 --port 8080
 
 Use :code:`sky serve status` to check the status of the service:
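
Note: a minimal sketch of how the updated example could be exercised after deployment (assuming the YAML above is saved as service.yaml; the file name and prompt are placeholders, and <endpoint> is whatever address `sky serve status` reports). vLLM's OpenAI-compatible server exposes /v1/completions, so the new replica can be queried like this:

    $ sky serve up service.yaml
    $ sky serve status   # note the service endpoint, e.g. <ip>:<port>
    $ curl http://<endpoint>/v1/completions \
        -H 'Content-Type: application/json' \
        -d '{"model": "lmsys/vicuna-13b-v1.5", "prompt": "Hello!", "max_tokens": 16}'

Rationale for the resource change, as I read it: Vicuna-13B's fp16 weights (~26 GB) fit on a single A100, so the example no longer needs two 80 GB GPUs, and --tensor-parallel-size 2 can be dropped along with them.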