- [Docs] 0.15.0 release
peterschmidt85 committed Feb 8, 2024
1 parent ccfb07f commit 9e0f417
Showing 28 changed files with 426 additions and 530 deletions.
3 changes: 1 addition & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -29,12 +29,11 @@ Supported providers: AWS, GCP, Azure, Lambda, TensorDock, Vast.ai, and DataCrunc

## Latest news ✨

- [2024/02] [dstack 0.15.0: Resources, authentication, and more](https://dstack.ai/blog/2024/02/08/resources-authentication-and-more/) (Release)
- [2024/01] [dstack 0.14.0: OpenAI-compatible endpoints preview](https://dstack.ai/blog/2024/01/19/openai-endpoints-preview/) (Release)
- [2023/12] [dstack 0.13.0: Disk size, CUDA 12.1, Mixtral, and more](https://dstack.ai/blog/2023/12/22/disk-size-cuda-12-1-mixtral-and-more/) (Release)
- [2023/11] [dstack 0.12.3: Vast.ai integration](https://dstack.ai/blog/2023/11/21/vastai/) (Release)
- [2023/10] [dstack 0.12.2: TensorDock integration](https://dstack.ai/blog/2023/10/31/tensordock/) (Release)
- [2023/09] [RAG with Llama Index and Weaviate](https://dstack.ai/examples/llama-index/) (Example)
- [2023/08] [Fine-tuning with QLoRA](https://dstack.ai/examples/qlora/) (Example)

## Installation

17 changes: 2 additions & 15 deletions docs/assets/stylesheets/extra.css
@@ -1039,19 +1039,6 @@ html .md-footer-meta.md-typeset a:is(:focus,:hover) {
display: none;
}

.md-tabs__item:nth-child(5) a {
border-image: linear-gradient(45deg, #0048ff, #ce00ff) 10;
border-width: 1.5px;
border-style: solid;
background: -webkit-linear-gradient(45deg, #0048ff, #ce00ff);
-webkit-background-clip: text;
-webkit-text-fill-color: transparent;
padding: 7px 25px;
height: 40px;
margin-top: 16px;
font-size: 17.5px;
}

.md-tabs__item:nth-child(6) {
padding-right: 0.5rem;
margin-left: auto;
@@ -1069,7 +1056,7 @@ html .md-footer-meta.md-typeset a:is(:focus,:hover) {
visibility: visible;
}

.md-tabs__item:nth-child(4) .md-tabs__link:after {
.md-tabs__item:nth-child(5) .md-tabs__link:after {
content: url('data:image/svg+xml,<svg width="16" height="16" viewBox="1 1 27 27" xmlns="http://www.w3.org/2000/svg" fill="rgba(0,0,0,0.87)" stroke="rgba(0,0,0,0.87)" stroke-width="0.75" stroke-linecap="round" stroke-linejoin="round"><path d="M23.5 23.5h-15v-15h4.791V6H6v20h20v-7.969h-2.5z"/><path d="M17.979 6l3.016 3.018-6.829 6.829 1.988 1.987 6.83-6.828L26 14.02V6z"/></svg>');
line-height: 14px;
padding-left: 3px;
@@ -1141,7 +1128,7 @@ html .md-footer-meta.md-typeset a:is(:focus,:hover) {

@media screen and (min-width: 76.25em) {
.md-search .md-search__inner {
padding-top: 0.55rem;
padding-top: 0.58rem;
margin-right: 0.8rem;
}

11 changes: 11 additions & 0 deletions docs/assets/stylesheets/landing.css
@@ -234,6 +234,17 @@
-webkit-text-fill-color: transparent;
}

.md-header__buttons {
padding-top: 11px;
}

.md-header__buttons .md-button-secondary {
border-width: 2px;
font-weight: 700 !important;
font-size: 0.8rem;
text-transform: uppercase;
}

.md-header__buttons .md-button-secondary:hover,
.tx-container .md-button-secondary:hover {
background: -webkit-linear-gradient(45deg, #0048ff, #ce00ff);
1 change: 0 additions & 1 deletion docs/blog/posts/openai-endpoints-preview.md
@@ -1,5 +1,4 @@
---
title: "dstack 0.14.0: OpenAI-compatible endpoints preview"
date: 2024-01-19
description: "Making it easier to deploy custom LLMs as OpenAI-compatible endpoints."
slug: "openai-endpoints-preview"
150 changes: 150 additions & 0 deletions docs/blog/posts/resources-authentication-and-more.md
@@ -0,0 +1,150 @@
---
date: 2024-02-08
description: "Resource configuration, authentication in services, model mapping for vLLM, and other improvements."
slug: "resources-authentication-and-more"
categories:
- Releases
---

# dstack 0.15.0: Resources, authentication, and more

__Resource configuration in YAML, authentication in services, and other improvements.__

The latest update brings many improvements, enabling the configuration of resources in YAML files, requiring
authentication in services, supporting OpenAI-compatible endpoints for vLLM, and more.

<!-- more -->

## Resource configuration

Previously, if you wanted to request hardware resources, you had to either use the corresponding arguments with
`dstack run` (e.g. `--gpu GPU_SPEC`) or use `.dstack/profiles.yml`.

With `0.15.0`, it is now possible to configure resources in the YAML configuration file:

<div editor-title=".dstack.yml">

```yaml
type: dev-environment

python: 3.11
ide: vscode

# (Optional) Configure `gpu`, `memory`, `disk`, etc
resources:
  gpu: 24GB
```
</div>
Supported properties include: `gpu`, `cpu`, `memory`, `disk`, and `shm_size`.

When specifying memory size, you can set either an exact size (e.g. `24GB`) or a
range (e.g. `24GB..`, `24GB..80GB`, or `..80GB`).
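To make the range semantics concrete, here's a tiny parser sketch (illustrative only, not `dstack`'s actual implementation; it assumes sizes are whole gigabytes with a `GB` suffix):

```python
def parse_memory_range(spec: str):
    """Parse '24GB', '24GB..', '24GB..80GB', or '..80GB' into (min_gb, max_gb)."""
    def to_gb(part: str):
        # An empty side of the range means "no bound"
        return int(part.removesuffix("GB")) if part else None

    if ".." in spec:
        lo, hi = spec.split("..")
        return to_gb(lo), to_gb(hi)
    # An exact size acts as both the minimum and the maximum
    exact = to_gb(spec)
    return exact, exact

print(parse_memory_range("24GB"))        # (24, 24)
print(parse_memory_range("24GB..80GB"))  # (24, 80)
print(parse_memory_range("..80GB"))      # (None, 80)
```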

The `gpu` property allows specifying not only memory size but also GPU names
and their quantity. Examples: `A100` (one A100), `A10G,A100` (either an A10G or an A100),
`A100:80GB` (one 80GB A100), `A100:2` (two A100s), `24GB..40GB:2` (two GPUs with 24GB to 40GB of memory), etc.
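For instance (illustrative values), a string form combining a memory range and a count slots directly into `resources`:

<div editor-title=".dstack.yml">

```yaml
type: dev-environment

python: 3.11
ide: vscode

resources:
  # Two GPUs, each with 24GB to 40GB of memory
  gpu: 24GB..40GB:2
```

</div>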

It's also possible to configure `gpu` as an object:

<div editor-title=".dstack.yml">

```yaml
type: dev-environment

python: 3.11
ide: vscode

# Require 2 GPUs of at least 40GB with CUDA compute capability of 7.5
resources:
  gpu:
    count: 2
    memory: 40GB..
    compute_capability: 7.5
```

</div>

For more details on the `resources` schema, refer to the [Reference](../../docs/reference/dstack.yml.md).

## Authentication in services

Previously, when deploying a service, the public endpoint didn't support authentication,
meaning anyone with access to the gateway could call it.

With `0.15.0`, by default, service endpoints require the `Authorization` header set to `"Bearer <dstack token>"`.

<div class="termy">

```shell
$ curl https://yellow-cat-1.example.com/generate \
    -X POST \
    -d '{"inputs":"<s>[INST] What is your favourite condiment?[/INST]"}' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer <dstack token>'
```

</div>

Authentication can be disabled by setting `auth` to `false` in the service configuration file.
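For example, a minimal service configuration opting out of the default authentication might look like this (a sketch; the served app is a placeholder):

<div editor-title="service.dstack.yml">

```yaml
type: service

python: "3.11"

commands:
  - python -m http.server 8000
port: 8000

# Expose the endpoint without requiring a `dstack` token
auth: false
```

</div>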

#### OpenAI interface

If the service has [model mapping](../../docs/concepts/services.md#model-mapping) configured,
the OpenAI-compatible endpoint requires authentication as well.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com",
    api_key="<dstack token>"
)

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[
        {"role": "user", "content": "Compose a poem that explains the concept of recursion in programming."}
    ]
)

print(completion.choices[0].message)
```

## Model mapping for vLLM

Last but not least, we've added one more format for [model mapping](../../docs/concepts/services.md#model-mapping): `openai`.

For example, if you run vLLM in OpenAI-compatible mode, you can now configure model mapping for it.

```yaml
type: service

python: "3.11"

env:
  - MODEL=NousResearch/Llama-2-7b-chat-hf
commands:
  - pip install vllm
  - python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000
port: 8000

resources:
  gpu: 24GB

model:
  format: openai
  type: chat
  name: NousResearch/Llama-2-7b-chat-hf
```

Once such a service is up, the model can be accessed at
`https://gateway.<gateway domain>` via the OpenAI-compatible interface,
using your `dstack` user token.

## Feedback

If you have any questions, run into bugs, or need help,
drop us a message on our [Discord server](https://discord.gg/u8SmfwPpMd) or file it as a
[GitHub issue](https://github.com/dstackai/dstack/issues/new/choose).
14 changes: 10 additions & 4 deletions docs/docs/concepts/dev-environments.md
@@ -16,9 +16,15 @@ both acceptable).
```yaml
type: dev-environment

python: "3.11" # (Optional) If not specified, your local version is used
# Use either `python` or `image` to configure environment
python: "3.11"
# image: ghcr.io/huggingface/text-generation-inference:latest

ide: vscode

# (Optional) Configure `gpu`, `memory`, `disk`, etc
resources:
  gpu: 80GB
```
</div>
@@ -36,7 +42,7 @@ configuration file path, and any other options (e.g., for requesting hardware re
<div class="termy">

```shell
$ dstack run . -f .dstack.yml --gpu A100
$ dstack run . -f .dstack.yml
BACKEND REGION RESOURCES SPOT PRICE
tensordock unitedkingdom 10xCPU, 80GB, 1xA100 (80GB) no $1.595
@@ -55,8 +61,8 @@ To open in VS Code Desktop, use this link:
</div>

!!! info "Run options"
The `dstack run` command allows you to use `--gpu` to request GPUs (e.g. `--gpu A100` or `--gpu 80GB` or `--gpu A100:4`, etc.),
and many other options (incl. spot instances, disk size, max price, max duration, retry policy, etc.).
The `dstack run` command allows you to specify the spot policy (e.g. `--spot-auto`, `--spot`, or `--on-demand`),
the max duration of the run (e.g. `--max-duration 1h`), and many other options.
For more details, refer to the [Reference](../reference/cli/index.md#dstack-run).

Once the dev environment is provisioned, click the link to open the environment in your desktop IDE.
43 changes: 32 additions & 11 deletions docs/docs/concepts/services.md
@@ -1,6 +1,6 @@
# Services

Services make it easy to deploy models and apps as public endpoints, allowing you to use any
Services make it easy to deploy models and apps as public endpoints, while giving you the flexibility to use any
framework.

??? info "Prerequisites"
@@ -53,6 +53,10 @@ env:
port: 80
commands:
- text-generation-launcher --port 80 --trust-remote-code

# (Optional) Configure `gpu`, `memory`, `disk`, etc
resources:
  gpu: 80GB
```
</div>
@@ -84,6 +88,11 @@ port: 80
commands:
- text-generation-launcher --port 80 --trust-remote-code
# (Optional) Configure `gpu`, `memory`, `disk`, etc
resources:
  gpu: 80GB

# (Optional) Enable the OpenAI-compatible endpoint
model:
type: chat
name: mistralai/Mistral-7B-Instruct-v0.1
@@ -95,7 +104,10 @@ model:
With such a configuration, once the service is up, you'll be able to access the model at
`https://gateway.<gateway domain>` via the OpenAI-compatible interface.

#### Chat template
The `format` property currently supports only `tgi` (Text Generation Inference)
and `openai` (Text Generation Inference or vLLM running in OpenAI-compatible mode).

##### Chat template

By default, `dstack` loads the [chat template](https://huggingface.co/docs/transformers/main/en/chat_templating)
from the model's repository. If it is not present there, manual configuration is required.
@@ -110,6 +122,11 @@ port: 80
commands:
- text-generation-launcher --port 80 --trust-remote-code --quantize gptq
# (Optional) Configure `gpu`, `memory`, `disk`, etc
resources:
  gpu: 80GB

# (Optional) Enable the OpenAI-compatible endpoint
model:
type: chat
name: TheBloke/Llama-2-13B-chat-GPTQ
@@ -123,8 +140,7 @@ model:
1. Doesn't work if your `chat_template` uses `bos_token`. As a workaround, replace `bos_token` inside `chat_template` with the token content itself.
2. Doesn't work if `eos_token` is defined in the model repository as a dictionary. As a workaround, set `eos_token` manually, as shown in the example above (see Chat template).
3. Only works if you're using Text Generation Inference. Support for vLLM and other serving frameworks is coming later.


If you encounter any other issues, please make sure to file a [GitHub issue](https://github.com/dstackai/dstack/issues/new/choose).

## Run the configuration
@@ -135,7 +151,7 @@ configuration file path, and any other options (e.g., for requesting hardware re
<div class="termy">

```shell
$ dstack run . -f serve.dstack.yml --gpu A100
$ dstack run . -f serve.dstack.yml
BACKEND REGION RESOURCES SPOT PRICE
tensordock unitedkingdom 10xCPU, 80GB, 1xA100 (80GB) no $1.595
@@ -153,22 +169,27 @@ Service is published at https://yellow-cat-1.example.com
</div>

!!! info "Run options"
The `dstack run` command allows you to use `--gpu` to request GPUs (e.g. `--gpu A100` or `--gpu 80GB` or `--gpu A100:4`, etc.),
and many other options (incl. spot instances, disk size, max price, max duration, retry policy, etc.).
The `dstack run` command allows you to specify the spot policy (e.g. `--spot-auto`, `--spot`, or `--on-demand`),
the max duration of the run (e.g. `--max-duration 1h`), and many other options.
For more details, refer to the [Reference](../reference/cli/index.md#dstack-run).

### Service endpoint

Once the service is up, you'll be able to
access it at `https://<run name>.<gateway domain>`.
Once the service is up, you'll be able to access it at `https://<run name>.<gateway domain>`.

#### Authentication

By default, the service endpoint requires the `Authorization` header set to `"Bearer <dstack token>"`.
Authentication can be disabled by setting `auth` to `false` in the service configuration file.

<div class="termy">

```shell
$ curl https://yellow-cat-1.example.com/generate \
-X POST \
-d '{"inputs":"<s>[INST] What is your favourite condiment?[/INST]"}' \
-H 'Content-Type: application/json'
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <dstack token>'
```

</div>
@@ -184,7 +205,7 @@ from openai import OpenAI
client = OpenAI(
base_url="https://gateway.example.com",
api_key="none"
api_key="<dstack token>"
)
completion = client.chat.completions.create(
