- [Docs] 0.15.0 release
peterschmidt85 committed Feb 8, 2024
1 parent ccfb07f commit 9e0f417
Showing 28 changed files with 426 additions and 530 deletions.
3 changes: 1 addition & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -29,12 +29,11 @@ Supported providers: AWS, GCP, Azure, Lambda, TensorDock, Vast.ai, and DataCrunc

## Latest news ✨

- [2024/02] [dstack 0.15.0: Resources, authentication, and more](https://dstack.ai/blog/2024/02/08/resources-authentication-and-more/) (Release)
- [2024/01] [dstack 0.14.0: OpenAI-compatible endpoints preview](https://dstack.ai/blog/2024/01/19/openai-endpoints-preview/) (Release)
- [2023/12] [dstack 0.13.0: Disk size, CUDA 12.1, Mixtral, and more](https://dstack.ai/blog/2023/12/22/disk-size-cuda-12-1-mixtral-and-more/) (Release)
- [2023/11] [dstack 0.12.3: Vast.ai integration](https://dstack.ai/blog/2023/11/21/vastai/) (Release)
- [2023/10] [dstack 0.12.2: TensorDock integration](https://dstack.ai/blog/2023/10/31/tensordock/) (Release)
- [2023/09] [RAG with Llama Index and Weaviate](https://dstack.ai/examples/llama-index/) (Example)
- [2023/08] [Fine-tuning with QLoRA](https://dstack.ai/examples/qlora/) (Example)

## Installation

17 changes: 2 additions & 15 deletions docs/assets/stylesheets/extra.css
@@ -1039,19 +1039,6 @@ html .md-footer-meta.md-typeset a:is(:focus,:hover) {
display: none;
}

.md-tabs__item:nth-child(5) a {
border-image: linear-gradient(45deg, #0048ff, #ce00ff) 10;
border-width: 1.5px;
border-style: solid;
background: -webkit-linear-gradient(45deg, #0048ff, #ce00ff);
-webkit-background-clip: text;
-webkit-text-fill-color: transparent;
padding: 7px 25px;
height: 40px;
margin-top: 16px;
font-size: 17.5px;
}

.md-tabs__item:nth-child(6) {
padding-right: 0.5rem;
margin-left: auto;
@@ -1069,7 +1056,7 @@ html .md-footer-meta.md-typeset a:is(:focus,:hover) {
visibility: visible;
}

.md-tabs__item:nth-child(4) .md-tabs__link:after {
.md-tabs__item:nth-child(5) .md-tabs__link:after {
content: url('data:image/svg+xml,<svg width="16" height="16" viewBox="1 1 27 27" xmlns="http://www.w3.org/2000/svg" fill="rgba(0,0,0,0.87)" stroke="rgba(0,0,0,0.87)" stroke-width="0.75" stroke-linecap="round" stroke-linejoin="round"><path d="M23.5 23.5h-15v-15h4.791V6H6v20h20v-7.969h-2.5z"/><path d="M17.979 6l3.016 3.018-6.829 6.829 1.988 1.987 6.83-6.828L26 14.02V6z"/></svg>');
line-height: 14px;
padding-left: 3px;
@@ -1141,7 +1128,7 @@ html .md-footer-meta.md-typeset a:is(:focus,:hover) {

@media screen and (min-width: 76.25em) {
.md-search .md-search__inner {
padding-top: 0.55rem;
padding-top: 0.58rem;
margin-right: 0.8rem;
}

11 changes: 11 additions & 0 deletions docs/assets/stylesheets/landing.css
@@ -234,6 +234,17 @@
-webkit-text-fill-color: transparent;
}

.md-header__buttons {
padding-top: 11px;
}

.md-header__buttons .md-button-secondary {
border-width: 2px;
font-weight: 700 !important;
font-size: 0.8rem;
text-transform: uppercase;
}

.md-header__buttons .md-button-secondary:hover,
.tx-container .md-button-secondary:hover {
background: -webkit-linear-gradient(45deg, #0048ff, #ce00ff);
1 change: 0 additions & 1 deletion docs/blog/posts/openai-endpoints-preview.md
@@ -1,5 +1,4 @@
---
title: "dstack 0.14.0: OpenAI-compatible endpoints preview"
date: 2024-01-19
description: "Making it easier to deploy custom LLMs as OpenAI-compatible endpoints."
slug: "openai-endpoints-preview"
150 changes: 150 additions & 0 deletions docs/blog/posts/resources-authentication-and-more.md
@@ -0,0 +1,150 @@
---
date: 2024-02-08
description: "Resource configuration, authentication in services, model mapping for vLLM, and other improvements."
slug: "resources-authentication-and-more"
categories:
- Releases
---

# dstack 0.15.0: Resources, authentication, and more

__Resource configuration in YAML, authentication in services, and other improvements.__

The latest update brings many improvements, enabling the configuration of resources in YAML files, requiring
authentication in services, supporting OpenAI-compatible endpoints for vLLM, and more.

<!-- more -->

## Resource configuration

Previously, if you wanted to request hardware resources, you had to either use the corresponding arguments with
`dstack run` (e.g. `--gpu GPU_SPEC`) or use `.dstack/profiles.yml`.

With `0.15.0`, it is now possible to configure resources in the YAML configuration file:

<div editor-title=".dstack.yml">

```yaml
type: dev-environment

python: 3.11
ide: vscode

# (Optional) Configure `gpu`, `memory`, `disk`, etc
resources:
  gpu: 24GB
```
</div>
Supported properties include: `gpu`, `cpu`, `memory`, `disk`, and `shm_size`.

When specifying memory size, you can set either an exact size (e.g. `24GB`) or a
range (e.g. `24GB..`, `24GB..80GB`, or `..80GB`).
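To make the range semantics concrete, here's a tiny parser sketch (illustrative only, not `dstack`'s actual implementation; it assumes sizes are whole gigabytes with a `GB` suffix):

```python
def parse_memory_range(spec: str):
    """Parse '24GB', '24GB..', '24GB..80GB', or '..80GB' into (min_gb, max_gb)."""
    def to_gb(part: str):
        # An empty side of the range means "no bound"
        return int(part.removesuffix("GB")) if part else None

    if ".." in spec:
        lo, hi = spec.split("..")
        return to_gb(lo), to_gb(hi)
    # An exact size acts as both the minimum and the maximum
    exact = to_gb(spec)
    return exact, exact

print(parse_memory_range("24GB"))        # (24, 24)
print(parse_memory_range("24GB..80GB"))  # (24, 80)
print(parse_memory_range("..80GB"))      # (None, 80)
```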

The `gpu` property allows specifying not only memory size but also GPU names
and their quantity. Examples: `A100` (one A100), `A10G,A100` (either an A10G or an A100),
`A100:80GB` (one 80GB A100), `A100:2` (two A100s), `24GB..40GB:2` (two GPUs with 24GB to 40GB of memory), etc.
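For instance (illustrative values), a string form combining a memory range and a count slots directly into `resources`:

<div editor-title=".dstack.yml">

```yaml
type: dev-environment

python: 3.11
ide: vscode

resources:
  # Two GPUs, each with 24GB to 40GB of memory
  gpu: 24GB..40GB:2
```

</div>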

It's also possible to configure `gpu` as an object:

<div editor-title=".dstack.yml">

```yaml
type: dev-environment

python: 3.11
ide: vscode

# Require 2 GPUs of at least 40GB with CUDA compute capability of 7.5
resources:
  gpu:
    count: 2
    memory: 40GB..
    compute_capability: 7.5
```

</div>

For more details on the `resources` schema, refer to the [Reference](../../docs/reference/dstack.yml.md).

## Authentication in services

Previously, when deploying a service, the public endpoint didn't support authentication,
meaning anyone with access to the gateway could call it.

With `0.15.0`, by default, service endpoints require the `Authorization` header set to `"Bearer <dstack token>"`.

<div class="termy">

```shell
$ curl https://yellow-cat-1.example.com/generate \
    -X POST \
    -d '{"inputs":"<s>[INST] What is your favourite condiment?[/INST]"}' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer <dstack token>'
```

</div>

Authentication can be disabled by setting `auth` to `false` in the service configuration file.
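For example, a minimal service configuration opting out of the default authentication might look like this (a sketch; the served app is a placeholder):

<div editor-title="service.dstack.yml">

```yaml
type: service

python: "3.11"

commands:
  - python -m http.server 8000
port: 8000

# Expose the endpoint without requiring a `dstack` token
auth: false
```

</div>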

#### OpenAI interface

If the service has [model mapping](../../docs/concepts/services.md#model-mapping) configured,
the OpenAI-compatible endpoint requires authentication as well.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com",
    api_key="<dstack token>"
)

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[
        {"role": "user", "content": "Compose a poem that explains the concept of recursion in programming."}
    ]
)

print(completion.choices[0].message)
```

## Model mapping for vLLM

Last but not least, we've added one more format for [model mapping](../../docs/concepts/services.md#model-mapping): `openai`.

For example, if you run vLLM in OpenAI-compatible mode, you can now configure model mapping for it.

```yaml
type: service

python: "3.11"

env:
  - MODEL=NousResearch/Llama-2-7b-chat-hf
commands:
  - pip install vllm
  - python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000
port: 8000

resources:
  gpu: 24GB

model:
  format: openai
  type: chat
  name: NousResearch/Llama-2-7b-chat-hf
```

Once such a service is up, the model can be accessed at
`https://gateway.<gateway domain>` via the OpenAI-compatible interface,
using your `dstack` user token.

## Feedback

If you have any questions, run into bugs, or need help,
drop us a message on our [Discord server](https://discord.gg/u8SmfwPpMd) or file it as a
[GitHub issue](https://github.com/dstackai/dstack/issues/new/choose).
14 changes: 10 additions & 4 deletions docs/docs/concepts/dev-environments.md
@@ -16,9 +16,15 @@ both acceptable).
```yaml
type: dev-environment

python: "3.11" # (Optional) If not specified, your local version is used
# Use either `python` or `image` to configure environment
python: "3.11"
# image: ghcr.io/huggingface/text-generation-inference:latest

ide: vscode

# (Optional) Configure `gpu`, `memory`, `disk`, etc
resources:
  gpu: 80GB
```
</div>
@@ -36,7 +42,7 @@ configuration file path, and any other options (e.g., for requesting hardware re
<div class="termy">

```shell
$ dstack run . -f .dstack.yml --gpu A100
$ dstack run . -f .dstack.yml
BACKEND REGION RESOURCES SPOT PRICE
tensordock unitedkingdom 10xCPU, 80GB, 1xA100 (80GB) no $1.595
@@ -55,8 +61,8 @@ To open in VS Code Desktop, use this link:
</div>

!!! info "Run options"
The `dstack run` command allows you to use `--gpu` to request GPUs (e.g. `--gpu A100` or `--gpu 80GB` or `--gpu A100:4`, etc.),
and many other options (incl. spot instances, disk size, max price, max duration, retry policy, etc.).
The `dstack run` command allows you to specify the spot policy (e.g. `--spot-auto`, `--spot`, or `--on-demand`),
the max duration of the run (e.g. `--max-duration 1h`), and many other options.
For more details, refer to the [Reference](../reference/cli/index.md#dstack-run).

Once the dev environment is provisioned, click the link to open the environment in your desktop IDE.
43 changes: 32 additions & 11 deletions docs/docs/concepts/services.md
@@ -1,6 +1,6 @@
# Services

Services make it easy to deploy models and apps as public endpoints, allowing you to use any
Services make it easy to deploy models and apps as public endpoints, while giving you the flexibility to use any
framework.

??? info "Prerequisites"
@@ -53,6 +53,10 @@ env:
port: 80
commands:
- text-generation-launcher --port 80 --trust-remote-code

# (Optional) Configure `gpu`, `memory`, `disk`, etc
resources:
  gpu: 80GB
```
</div>
@@ -84,6 +88,11 @@ port: 80
commands:
- text-generation-launcher --port 80 --trust-remote-code
# (Optional) Configure `gpu`, `memory`, `disk`, etc
resources:
  gpu: 80GB

# (Optional) Enable the OpenAI-compatible endpoint
model:
type: chat
name: mistralai/Mistral-7B-Instruct-v0.1
@@ -95,7 +104,10 @@ model:
With such a configuration, once the service is up, you'll be able to access the model at
`https://gateway.<gateway domain>` via the OpenAI-compatible interface.

#### Chat template
The `format` property currently supports only `tgi` (Text Generation Inference)
and `openai` (Text Generation Inference or vLLM running in OpenAI-compatible mode).

##### Chat template

By default, `dstack` loads the [chat template](https://huggingface.co/docs/transformers/main/en/chat_templating)
from the model's repository. If it is not present there, manual configuration is required.
@@ -110,6 +122,11 @@ port: 80
commands:
- text-generation-launcher --port 80 --trust-remote-code --quantize gptq
# (Optional) Configure `gpu`, `memory`, `disk`, etc
resources:
  gpu: 80GB

# (Optional) Enable the OpenAI-compatible endpoint
model:
type: chat
name: TheBloke/Llama-2-13B-chat-GPTQ
@@ -123,8 +140,7 @@ model:
1. Doesn't work if your `chat_template` uses `bos_token`. As a workaround, replace `bos_token` inside `chat_template` with the token content itself.
2. Doesn't work if `eos_token` is defined in the model repository as a dictionary. As a workaround, set `eos_token` manually, as shown in the example above (see Chat template).
3. Only works if you're using Text Generation Inference. Support for vLLM and other serving frameworks is coming later.


If you encounter any other issues, please make sure to file a [GitHub issue](https://github.com/dstackai/dstack/issues/new/choose).

## Run the configuration
@@ -135,7 +151,7 @@ configuration file path, and any other options (e.g., for requesting hardware re
<div class="termy">

```shell
$ dstack run . -f serve.dstack.yml --gpu A100
$ dstack run . -f serve.dstack.yml
BACKEND REGION RESOURCES SPOT PRICE
tensordock unitedkingdom 10xCPU, 80GB, 1xA100 (80GB) no $1.595
@@ -153,22 +169,27 @@ Service is published at https://yellow-cat-1.example.com
</div>

!!! info "Run options"
The `dstack run` command allows you to use `--gpu` to request GPUs (e.g. `--gpu A100` or `--gpu 80GB` or `--gpu A100:4`, etc.),
and many other options (incl. spot instances, disk size, max price, max duration, retry policy, etc.).
The `dstack run` command allows you to specify the spot policy (e.g. `--spot-auto`, `--spot`, or `--on-demand`),
the max duration of the run (e.g. `--max-duration 1h`), and many other options.
For more details, refer to the [Reference](../reference/cli/index.md#dstack-run).

### Service endpoint

Once the service is up, you'll be able to
access it at `https://<run name>.<gateway domain>`.
Once the service is up, you'll be able to access it at `https://<run name>.<gateway domain>`.

#### Authentication

By default, the service endpoint requires the `Authorization` header set to `"Bearer <dstack token>"`.
Authentication can be disabled by setting `auth` to `false` in the service configuration file.

<div class="termy">

```shell
$ curl https://yellow-cat-1.example.com/generate \
-X POST \
-d '{"inputs":"<s>[INST] What is your favourite condiment?[/INST]"}' \
-H 'Content-Type: application/json'
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <dstack token>'
```

</div>
@@ -184,7 +205,7 @@ from openai import OpenAI
client = OpenAI(
base_url="https://gateway.example.com",
api_key="none"
api_key="<dstack token>"
)
completion = client.chat.completions.create(
