0.15.0
Resources
It is now possible to configure resources in the YAML configuration file:

```yaml
type: dev-environment
python: 3.11
ide: vscode
# (Optional) Configure `gpu`, `memory`, `disk`, etc.
resources:
  gpu: 24GB
```
Supported properties include `gpu`, `cpu`, `memory`, `disk`, and `shm_size`.
If you specify memory size, you can set either an explicit size (e.g. `24GB`) or a range (e.g. `24GB..`, `24GB..80GB`, or `..80GB`).
The `gpu` property allows specifying not only memory size but also GPU names and their quantity. Examples: `A100` (one A100), `A10G,A100` (either an A10G or an A100), `A100:80GB` (one A100 with 80GB), `A100:2` (two A100s), `24GB..40GB:2` (two GPUs with between 24GB and 40GB of memory), etc.
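Putting these properties together, a `resources` section might look like the following. The `gpu` value comes from the examples above; applying range syntax to `memory` and `disk` is an illustrative assumption:

```yaml
resources:
  gpu: 24GB..40GB:2  # two GPUs with 24-40GB of memory each
  memory: 64GB..     # illustrative: at least 64GB of RAM
  disk: 100GB..      # illustrative: at least 100GB of disk
  shm_size: 16GB
```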
Authorization in services
Service endpoints now require the `Authorization` header with `"Bearer <dstack token>"`. This also applies to the OpenAI-compatible endpoints.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com",
    api_key="<dstack token>",
)

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[
        {
            "role": "user",
            "content": "Compose a poem that explains the concept of recursion in programming.",
        }
    ],
)

print(completion.choices[0].message)
```
Authentication can be disabled by setting `auth` to `false` in the service configuration file.
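For example, a service configuration with authentication disabled might look like this minimal sketch (the `commands` entry is a placeholder):

```yaml
type: service
port: 8000
auth: false  # disable the Authorization header requirement
commands:
  - python app.py  # placeholder command
```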
OpenAI format in model mapping
Model mapping (required to enable the OpenAI-compatible interface) now supports `format: openai`.
For example, if you run vLLM in OpenAI mode, you can configure model mapping for it.
```yaml
type: service
python: "3.11"
env:
  - MODEL=NousResearch/Llama-2-7b-chat-hf
commands:
  - pip install vllm
  - python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000
port: 8000
resources:
  gpu: 24GB
model:
  format: openai
  type: chat
  name: NousResearch/Llama-2-7b-chat-hf
```
What's changed
- Configuration resources & ranges by @Egor-S in #844
- Range.str always returns a string by @Egor-S in #845
- Add infinity example by @deep-diver in #847
- error in documentation: use --url instead of --server by @promsoft in #852
- Support authorization on the gateway by @Egor-S in #851
- Implement Kubernetes backend by @r4victor in #853
- Add gpu support for kubernetes by @r4victor in #856
- Resources parse and store by @Egor-S in #857
- Use python3.11 in generate-json-schema by @r4victor in #859
- Implement OpenAI to OpenAI adapter for gateway by @Egor-S in #860
New contributors
- @deep-diver made their first contribution in #847
- @promsoft made their first contribution in #852
Full Changelog: 0.14.0...0.15.0