diff --git a/docs/source/serving/deploying_with_dstack.rst b/docs/source/serving/deploying_with_dstack.rst
new file mode 100644
index 0000000000000..baf87314ca8e4
--- /dev/null
+++ b/docs/source/serving/deploying_with_dstack.rst
@@ -0,0 +1,103 @@
+.. _deploying_with_dstack:
+
+Deploying with dstack
+============================
+
+vLLM can be run on a cloud-based GPU machine with `dstack <https://dstack.ai/>`__, an open-source framework for running LLMs on any cloud. This tutorial assumes that you have already configured credentials, a gateway, and GPU quotas on your cloud environment.
+
+To install the dstack client and start the server, run:
+
+.. code-block:: console
+
+ $ pip install "dstack[all]"
+ $ dstack server
+
+Next, to configure your dstack project, run:
+
+.. code-block:: console
+
+ $ mkdir -p vllm-dstack
+ $ cd vllm-dstack
+ $ dstack init
+
+Next, to provision a VM instance with an LLM of your choice (`NousResearch/Llama-2-7b-chat-hf` in this example), create the following `serve.dstack.yml` file for the dstack `Service`:
+
+.. code-block:: yaml
+
+ type: service
+
+ python: "3.11"
+ env:
+   - MODEL=NousResearch/Llama-2-7b-chat-hf
+ port: 8000
+ resources:
+   gpu: 24GB
+ commands:
+   - pip install vllm
+   - python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000
+ model:
+   format: openai
+   type: chat
+   name: NousResearch/Llama-2-7b-chat-hf
+
+Then, run the following CLI command to provision the service:
+
+.. code-block:: console
+
+ $ dstack run . -f serve.dstack.yml
+
+ ⠸ Getting run plan...
+ Configuration serve.dstack.yml
+ Project deep-diver-main
+ User deep-diver
+ Min resources 2..xCPU, 8GB.., 1xGPU (24GB)
+ Max price -
+ Max duration -
+ Spot policy auto
+ Retry policy no
+
+ # BACKEND REGION INSTANCE RESOURCES SPOT PRICE
+ 1 gcp us-central1 g2-standard-4 4xCPU, 16GB, 1xL4 (24GB), 100GB (disk) yes $0.223804
+ 2 gcp us-east1 g2-standard-4 4xCPU, 16GB, 1xL4 (24GB), 100GB (disk) yes $0.223804
+ 3 gcp us-west1 g2-standard-4 4xCPU, 16GB, 1xL4 (24GB), 100GB (disk) yes $0.223804
+ ...
+ Shown 3 of 193 offers, $5.876 max
+
+ Continue? [y/n]: y
+ ⠙ Submitting run...
+ ⠏ Launching spicy-treefrog-1 (pulling)
+ spicy-treefrog-1 provisioning completed (running)
+ Service is published at ...
+
+Once provisioning completes, you can interact with the model using the OpenAI SDK:
+
+.. code-block:: python
+
+ from openai import OpenAI
+
+ client = OpenAI(
+     base_url="https://gateway.",  # the gateway URL printed once the service is published
+     api_key=""  # your dstack token; the gateway authenticates requests with it
+ )
+
+ completion = client.chat.completions.create(
+     model="NousResearch/Llama-2-7b-chat-hf",
+     messages=[
+         {
+             "role": "user",
+             "content": "Compose a poem that explains the concept of recursion in programming.",
+         }
+     ]
+ )
+
+ print(completion.choices[0].message.content)
+
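+The endpoint is OpenAI-compatible, so the same client can also stream tokens as they are generated. A minimal sketch, reusing the `client` from above with its placeholder `base_url` and `api_key`:
+
+.. code-block:: python
+
+ stream = client.chat.completions.create(
+     model="NousResearch/Llama-2-7b-chat-hf",
+     messages=[{"role": "user", "content": "Explain recursion in one sentence."}],
+     stream=True,  # ask the server for incremental chunks instead of one final response
+ )
+
+ for chunk in stream:
+     # each chunk carries a delta; content can be None (e.g. the initial role-only chunk)
+     if chunk.choices[0].delta.content:
+         print(chunk.choices[0].delta.content, end="", flush=True)
+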
+.. note::
+
+ dstack automatically handles authentication on the gateway using dstack's tokens. If you don't want to configure a gateway, you can provision a dstack `Task` instead of a `Service`; a `Task` is meant for development purposes only (see the sketch below). For more hands-on material on serving vLLM with dstack, check out `this repository `__.
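+
+For reference, a `Task` configuration would look much like the `Service` configuration above, exposing a port directly instead of publishing the model behind a gateway. The sketch below is an assumption derived from that example rather than a verified configuration; check the dstack documentation for the exact schema:
+
+.. code-block:: yaml
+
+ type: task
+
+ python: "3.11"
+ env:
+   - MODEL=NousResearch/Llama-2-7b-chat-hf
+ ports:
+   - 8000
+ resources:
+   gpu: 24GB
+ commands:
+   - pip install vllm
+   - python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000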
diff --git a/docs/source/serving/integrations.rst b/docs/source/serving/integrations.rst
index 2066e80b03298..83a8b5a88bd38 100644
--- a/docs/source/serving/integrations.rst
+++ b/docs/source/serving/integrations.rst
@@ -9,4 +9,5 @@ Integrations
deploying_with_triton
deploying_with_bentoml
deploying_with_lws
+ deploying_with_dstack
serving_with_langchain