
Releases: dstackai/dstack

0.18.1

29 Apr 15:47

On-prem servers

Now you can add your own servers as pool instances:

dstack pool add-ssh -i ~/.ssh/id_rsa [email protected]

Note

The server should be pre-installed with CUDA 12.1 and NVIDIA Docker.

Configuration

All .dstack/profiles.yml properties can now be specified via run configurations:

type: dev-environment
ide: vscode

spot_policy: auto
backends: ["aws"]
regions: ["eu-west-1", "eu-west-2"]
instance_types: ["p3.8xlarge", "p3.16xlarge"]
max_price: 2.0
max_duration: 1d

New examples 🔥🔥

Thanks to a contribution from @deep-diver, we got two new examples.

Other

  • Configuring VPCs using their IDs (via vpc_ids in server/config.yml); see the sketch after this list
  • Support for global profiles (via ~/.dstack/profiles.yml)
  • Updated the default environment variables (DSTACK_RUN_NAME, DSTACK_GPUS_NUM, DSTACK_NODES_NUM, DSTACK_NODE_RANK, and DSTACK_MASTER_NODE_IP)
  • It’s now possible to use NVIDIA A10 GPU on Azure
  • More granular permissions for Azure
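
For reference, a minimal sketch of the vpc_ids setting in ~/.dstack/server/config.yml, assuming it maps AWS regions to VPC IDs (the IDs below are placeholders):

projects:
- name: main
  backends:
  - type: aws
    creds:
      type: default
    vpc_ids:
      us-east-1: vpc-0123456789abcdef0
      eu-west-1: vpc-0fedcba9876543210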


Full changelog: 0.18.0...0.18.1

0.18.0

10 Apr 15:46

RunPod

The update adds the long-awaited integration with RunPod, a distributed GPU cloud that offers GPUs at affordable prices.

To use RunPod, specify your RunPod API key in ~/.dstack/server/config.yml:

projects:
- name: main
  backends:
  - type: runpod
    creds:
      type: api_key
      api_key: US9XTPDIV8AR42MMINY8TCKRB8S4E7LNRQ6CAUQ9

Once the server is restarted, go ahead and run workloads.

Clusters

Another major change with the update is the ability to run multi-node tasks over an interconnected cluster of instances.

type: task

nodes: 2

commands:
  - git clone https://github.com/r4victor/pytorch-distributed-resnet.git
  - cd pytorch-distributed-resnet
  - mkdir -p data
  - cd data
  - wget -c --quiet https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
  - tar -xvzf cifar-10-python.tar.gz
  - cd ..
  - pip3 install -r requirements.txt torch
  - mkdir -p saved_models
  - torchrun --nproc_per_node=$DSTACK_GPUS_PER_NODE
      --node_rank=$DSTACK_NODE_RANK
      --nnodes=$DSTACK_NODES_NUM
      --master_addr=$DSTACK_MASTER_NODE_IP
      --master_port=8008 resnet_ddp.py
      --num_epochs 20

resources:
  gpu: 1

Currently supported providers for this feature include AWS, GCP, and Azure.

Other

  • The commands property is no longer required for tasks and services if you use an image with a default entrypoint configured (see the sketch after this list).
  • The permissions required for using dstack with GCP are more granular.
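
As a rough sketch of the first point, a task can now rely entirely on the image's entrypoint (the image name below is hypothetical):

type: task

# Hypothetical image whose Dockerfile already defines an ENTRYPOINT/CMD;
# no commands property is needed in this case.
image: myorg/batch-job:latest

resources:
  gpu: 1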


Full changelog: 0.17.0...0.18.0

0.17.0

03 Apr 10:20

Service auto-scaling

Previously, dstack always deployed services as a single replica. While this is suitable for development, in production a service must scale automatically based on load.

That's why in 0.17.0, we extended dstack with the capability to configure replicas (the number of replicas) as well as scaling (the auto-scaling policy).
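
As a rough sketch, assuming replicas accepts a range and scaling takes a metric and a target (the image and values below are illustrative):

type: service

image: myorg/llm-server:latest  # hypothetical image
port: 8000

replicas: 1..4
scaling:
  metric: rps
  target: 10

With a configuration along these lines, dstack would keep between one and four replicas running, scaling up or down as the per-replica request rate crosses the target.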

Regions and instance types

The update brings support for specifying regions and instance types (in dstack run and .dstack/profiles.yml).
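
For example, a minimal .dstack/profiles.yml sketch (the profile name and values are illustrative):

profiles:
  - name: gpu-eu
    regions: ["eu-west-1", "eu-west-2"]
    instance_types: ["p3.8xlarge", "p3.16xlarge"]
    default: true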

Environment variables

Firstly, it's now possible to configure an environment variable in the configuration without hardcoding its value. Secondly, dstack run now inherits environment variables from the current process.
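
As a sketch (the variable names are illustrative), a variable listed without a value is taken from the environment in which dstack run is invoked:

type: task

env:
  - HF_TOKEN               # no value here: inherited from the caller's environment
  - MODEL_NAME=mistral-7b  # explicit value, as before

commands:
  - python train.py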

For more details on these new features, check the changelog.


Full changelog: 0.16.5...0.17.0

0.16.5

26 Mar 07:59

Bug-fixes

  • Docker pull-related issues #1025

Full changelog: 0.16.4...0.16.5

0.16.4

18 Mar 13:25

CUDO Compute

The 0.16.4 update introduces the cudo backend, which allows running workloads with CUDO Compute, a cloud GPU marketplace.

To configure the cudo backend, you simply need to specify your CUDO Compute project ID and API key:

projects:
- name: main
  backends:
  - type: cudo
    project_id: my-cudo-project
    creds:
      type: api_key
      api_key: 7487240a466624b48de22865589

Once it's done, you can restart the dstack server and use the dstack CLI or API to run workloads.

Note

Limitations

  • The dstack gateway feature is not yet compatible with cudo, but it is expected to be supported in version 0.17.0,
    planned for release within a week.
  • The cudo backend cannot yet be used with dstack Sky, but it will also be enabled within a week.

Full changelog: 0.16.3...0.16.4

0.16.3

13 Mar 15:19

Bug-fixes

  • [Bug] The shm_size property in resources doesn't take effect #1006
  • [Bug]: It's not possible to configure projects other than main via ~/.dstack/server/config.yml #991
  • [Bug] Spot instances don't work on GCP if the username has upper case letters #975

Full changelog: 0.16.2...0.16.3

0.16.1

05 Mar 12:22

Improvements to dstack pool

  • Change default idle duration for dstack pool add to 72h #964
  • Set the default spot policy in dstack pool add to on-demand #962
  • Add pool support for lambda, azure, and tensordock #923
  • Allow to pass idle duration and spot policy in dstack pool add #918
  • dstack run does not respect pool-related profiles.yml parameters #949

Bug-fixes

  • Runs submitted via Python API have no termination policy #955
  • The vastai backend doesn't show any offers since 0.16.0 #958
  • Handle permission error when adding Include to ~/.ssh/config #937
  • The SSH tunnel fails because of a messy ~/.ssh/config #933
  • The PATH is overridden when logging via SSH #930
  • The SSH tunnel fails with Too many authentication failures #927

We've also updated our guide on how to add new backends.


Full Changelog: 0.16.0...0.16.1

0.16.0

26 Feb 15:35

Pools

The 0.16.0 release, in addition to many bug fixes, introduces pools, a new feature that enables more efficient management of instance lifecycles and reuse of instances across runs.

dstack run

Previously, when running a dev environment, task, or service, dstack provisioned an instance in a configured
backend, and upon completion of the run, deleted the instance.

Now, the dstack run command first tries to reuse an instance from a pool. If no ready instance meets the
requirements, dstack automatically provisions a new one and adds it to the pool.

Once the workload finishes, the instance is marked as idle.
If the instance remains idle for the configured duration, dstack tears it down.

dstack pool

The dstack pool command allows for managing instances within pools.

To manually add an instance to a pool, use dstack pool add:

dstack pool add --gpu 80GB --idle-duration 1d

The dstack pool add command allows specifying resource requirements, along with the spot policy, idle duration, max
price, retry policy, and other policies.

If no idle duration is configured, by default, dstack sets it to 72h.
To override it, use the --idle-duration DURATION argument.

To learn more about pools, refer to the official documentation. To learn more about 0.16.0, refer to the changelog.


Full changelog: 0.15.1...0.16.0

0.15.2rc2

19 Feb 17:03
Pre-release

Bug-fixes

  • Exclude private subnets when provisioning in AWS #908
  • Ollama doesn't detect the GPU (requires --gpus=all instead of --runtime=nvidia) #910

Full changelog: 0.15.1...0.15.2rc2

0.15.1

15 Feb 09:48

Kubernetes

With the latest update, it's now possible to configure a Kubernetes backend. In this case, when you run a workload, dstack provisions infrastructure within your Kubernetes cluster. This works with both self-managed and managed clusters.
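
A minimal sketch of a kubernetes backend entry in ~/.dstack/server/config.yml (the kubeconfig path and networking values are placeholders; see the docs for the exact schema):

projects:
- name: main
  backends:
  - type: kubernetes
    kubeconfig:
      filename: ~/.kube/config
    networking:
      ssh_host: localhost   # address for reaching the cluster from outside
      ssh_port: 32000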

Specifying a custom VPC for AWS

If you're using dstack with AWS, it's now possible to configure a vpc_name via ~/.dstack/server/config.yml.
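
For example (the VPC name is a placeholder):

projects:
- name: main
  backends:
  - type: aws
    vpc_name: my-custom-vpc
    creds:
      type: default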

Learn more about the new features in detail on the changelog page.


Full Changelog: 0.15.0...0.15.1