Merged
72 changes: 35 additions & 37 deletions contributing/BACKENDS.md
@@ -84,69 +84,67 @@ See the Appendix at the end of this document and make sure the provider meets th

#### 2.2. Set up the development environment

Follow [DEVELOPMENT.md](DEVELOPMENT.md)`.
Follow [DEVELOPMENT.md](DEVELOPMENT.md).

#### 2.3. Add dependencies to setup.py

Add any dependencies required by your cloud provider to `setup.py`. Create a separate section with the provider's name for these dependencies, and ensure that you update the `all` section to include them as well.
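For illustration, the layout could look roughly like the following sketch (the `examplexyz` provider name and the `examplexyz-sdk` dependency are made up; check the real `setup.py` for the actual structure):

```python
# Hypothetical extras for an "examplexyz" provider; the package name and
# version pin are placeholders, not real dstack dependencies.
EXAMPLEXYZ_DEPS = ["examplexyz-sdk>=1.0,<2"]

extras_require = {
    "examplexyz": EXAMPLEXYZ_DEPS,
    # ... other provider sections ...
}
# Keep the `all` section in sync so it pulls in the new provider's deps too:
extras_require["all"] = sorted({d for deps in list(extras_require.values()) for d in deps})
```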

#### 2.4. Implement the provider backend
#### 2.4. Add a new backend type

##### 2.4.1. Define the backend type
Add a new enumeration member for your provider to `BackendType` ([`src/dstack/_internal/core/models/backends/base.py`](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/models/backends/base.py)).

Add a new enumeration member for your provider to `BackendType` (`src/dstack/_internal/core/models/backends/base.py`).
Use the name of the provider.
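As a minimal sketch of the idea (the real `BackendType` enum lists every supported provider; `EXAMPLEXYZ` is a hypothetical member used here for illustration):

```python
from enum import Enum

# Trimmed-down sketch of BackendType from
# src/dstack/_internal/core/models/backends/base.py (not the full enum).
class BackendType(str, Enum):
    AWS = "aws"
    GCP = "gcp"
    EXAMPLEXYZ = "examplexyz"  # hypothetical new provider, named after it
```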
#### 2.5. Create backend files and classes

##### 2.4.2. Create the backend directory
`dstack` provides a helper script to generate all the necessary files and classes for a new backend.
To add a new backend named `ExampleXYZ`, you should run:

Create a new directory under `src/dstack/_internal/core/backends` with the name of the backend type.

##### 2.4.3. Create the backend class

Under the backend directory you've created, create the `backend.py` file and define the
backend class there (should extend `dstack._internal.core.backends.base.Backend`).

Refer to examples:
[datacrunch](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/datacrunch/backend.py),
[aws](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/aws/backend.py),
[gcp](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/gcp/backend.py),
[azure](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/azure/backend.py), etc.
```shell
python scripts/add_backend.py -n ExampleXYZ
```

##### 2.4.4. Create the backend compute class
It will create an `examplexyz` backend directory under `src/dstack/_internal/core/backends` with the following files:

Under the backend directory you've created, create the `compute.py` file and define the
backend compute class that extends the `dstack._internal.core.backends.base.compute.Compute` class.
It can also extend and implement `ComputeWith*` classes to support additional features such as fleets, volumes, gateways, placement groups, etc. For example, it should extend `ComputeWithCreateInstanceSupport` to support fleets.
* `backend.py` with the `Backend` class implementation. You typically don't need to modify it.
* `compute.py` with the `Compute` class implementation. This is the core of the backend that you need to implement.
* `configurator.py` with the `Configurator` class implementation. It deals with validating and storing the backend config. You need to adjust it to add custom backend config validation.
* `models.py` with all the backend config models used by `Backend`, `Compute`, `Configurator` and other parts of `dstack`.

Refer to examples:
[datacrunch](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/datacrunch/compute.py),
[aws](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/aws/compute.py),
[gcp](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/gcp/compute.py),
[azure](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/azure/compute.py), etc.
#### 2.6. Adjust and register the backend config models

##### 2.4.5. Create and register the backend config models

Under the backend directory, create the `models.py` file and define the backend config model classes there.
Every backend must define at least two models:
Go to `models.py`. It'll contain two config models required for all backends:

* `*BackendConfig` that contains all backend parameters available for user configuration except for creds.
* `*BackendConfigWithCreds` that contains all backend parameters available for user configuration, plus creds.

These models are used in server/config.yaml, the API, and for backend configuration.
Adjust the generated config models by adding any additional config parameters.
Typically, you only need to modify the `*BackendConfig` model, since the other models extend it.
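As a rough sketch of the shape (using plain pydantic here for illustration; the generated `models.py` uses dstack's own base model, and `project_id` is an invented extra parameter):

```python
from typing import List, Optional

from pydantic import BaseModel

# Illustrative only; the generated models.py already defines these classes
# on dstack's base model.
class ExampleXYZBackendConfig(BaseModel):
    type: str = "examplexyz"
    regions: Optional[List[str]] = None
    project_id: Optional[str] = None  # hypothetical provider-specific parameter

class ExampleXYZCreds(BaseModel):
    api_key: str

class ExampleXYZBackendConfigWithCreds(ExampleXYZBackendConfig):
    creds: ExampleXYZCreds
```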

The models should be added to `AnyBackendConfig*` unions in [`src/dstack/_internal/core/backends/models.py`](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/models.py).
Then add these models to `AnyBackendConfig*` unions in [`src/dstack/_internal/core/backends/models.py`](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/models.py).

It's not required but recommended to also define `*BackendStoredConfig` that extends `*BackendConfig` to be able to store extra parameters in the DB. By the same logic, it's recommended to define `*Config` that extends `*BackendStoredConfig` with creds and use it as the main `Backend` and `Compute` config instead of using `*BackendConfigWithCreds` directly.
The script also generates `*BackendStoredConfig`, which extends `*BackendConfig` so that extra parameters can be stored in the DB. By the same logic, it generates `*Config`, which extends `*BackendStoredConfig` with creds, and uses it as the main `Backend` and `Compute` config instead of `*BackendConfigWithCreds` directly.

Refer to examples:
[datacrunch](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/datacrunch/models.py),
[aws](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/aws/models.py),
[gcp](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/gcp/models.py),
[azure](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/azure/models.py), etc.

##### 2.4.6. Create and register the configurator class
#### 2.7. Implement the backend compute class

Go to `compute.py` and implement the `Compute` methods.
Optionally, extend and implement `ComputeWith*` classes to support additional features such as fleets, volumes, gateways, and placement groups. For example, extend `ComputeWithCreateInstanceSupport` to support fleets.

Refer to examples:
[datacrunch](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/datacrunch/compute.py),
[aws](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/aws/compute.py),
[gcp](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/gcp/compute.py),
[azure](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/azure/compute.py), etc.

#### 2.8. Implement and register the configurator class

Under the backend directory, create the `configurator.py` file and and define the backend configurator class (must extend `dstack._internal.core.backends.base.configurator.Configurator`).
Go to `configurator.py` and implement custom `Configurator` logic. At minimum, you should implement creds validation.
You may also need to validate other config parameters if there are any.
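One way to structure the check, as a sketch (the `ping` callable is hypothetical; a real configurator would call the provider's API client and raise `raise_invalid_credentials_error` on failure):

```python
from typing import Callable

def creds_are_valid(api_key: str, ping: Callable[[str], int]) -> bool:
    """Treat a 2xx response from a cheap authenticated endpoint as valid creds.

    `ping` is a stand-in for a provider API call returning an HTTP status code.
    """
    if not api_key:
        return False
    return 200 <= ping(api_key) < 300
```

In `_validate_creds()` you would then raise `raise_invalid_credentials_error(fields=[["creds", "api_key"]])` when the check fails, as the generated template already sketches.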

Refer to examples: [datacrunch](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/datacrunch/configurator.py),
[aws](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/aws/configurator.py),
@@ -155,7 +153,7 @@ Refer to examples: [datacrunch](https://github.com/dstackai/dstack/blob/master/s

Register configurator by appending it to `_CONFIGURATOR_CLASSES` in [`src/dstack/_internal/core/backends/configurators.py`](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/core/backends/configurators.py).

##### 2.4.7. (Optional) Override provisioning timeout
#### 2.9. (Optional) Override provisioning timeout

If instances in the backend take more than 10 minutes to start, override the default provisioning timeout in
[`src/dstack/_internal/server/background/tasks/common.py`](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/server/background/tasks/common.py).
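The pattern is roughly the following (a hedged sketch of the idea, not the actual contents of `common.py`):

```python
from datetime import timedelta

DEFAULT_PROVISIONING_TIMEOUT = timedelta(minutes=10)

def provisioning_timeout(backend_type: str) -> timedelta:
    # "examplexyz" is a hypothetical backend whose instances start slowly.
    if backend_type == "examplexyz":
        return timedelta(minutes=20)
    return DEFAULT_PROVISIONING_TIMEOUT
```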
46 changes: 46 additions & 0 deletions scripts/add_backend.py
@@ -0,0 +1,46 @@
import argparse
from pathlib import Path

import jinja2


def main():
    parser = argparse.ArgumentParser(
        description="This script generates boilerplate code for a new backend"
    )
    parser.add_argument(
        "-n",
        "--name",
        help=(
            "The backend name in CamelCase, e.g. AWS, Runpod, VastAI."
            " It'll be used for naming backend classes, models, etc."
        ),
        required=True,
    )
    args = parser.parse_args()
    generate_backend_code(args.name)


def generate_backend_code(backend_name: str):
    template_dir_path = Path(__file__).parent.parent.joinpath(
        "src/dstack/_internal/core/backends/template"
    )
    env = jinja2.Environment(
        loader=jinja2.FileSystemLoader(
            searchpath=template_dir_path,
        ),
        keep_trailing_newline=True,
    )
    backend_dir_path = Path(__file__).parent.parent.joinpath(
        f"src/dstack/_internal/core/backends/{backend_name.lower()}"
    )
    backend_dir_path.mkdir(exist_ok=True)
    for filename in ["backend.py", "compute.py", "configurator.py", "models.py"]:
        template = env.get_template(f"{filename}.jinja")
        with open(backend_dir_path.joinpath(filename), "w+") as f:
            f.write(template.render({"backend_name": backend_name}))
    backend_dir_path.joinpath("__init__.py").write_text("")


if __name__ == "__main__":
    main()
5 changes: 5 additions & 0 deletions src/dstack/_internal/core/backends/base/compute.py
@@ -60,6 +60,11 @@ def __init__(self):
    def get_offers(
        self, requirements: Optional[Requirements] = None
    ) -> List[InstanceOfferWithAvailability]:
        """
        Returns offers with availability matching `requirements`.
        If the provider is added to gpuhunt, typically gets offers using `base.offers.get_catalog_offers()`
        and extends them with availability info.
        """
        pass

    @abstractmethod
Empty file.
16 changes: 16 additions & 0 deletions src/dstack/_internal/core/backends/template/backend.py.jinja
@@ -0,0 +1,16 @@
from dstack._internal.core.backends.base.backend import Backend
from dstack._internal.core.backends.{{ backend_name|lower }}.compute import {{ backend_name }}Compute
from dstack._internal.core.backends.{{ backend_name|lower }}.models import {{ backend_name }}Config
from dstack._internal.core.models.backends.base import BackendType


class {{ backend_name }}Backend(Backend):
    TYPE = BackendType.{{ backend_name|upper }}
    COMPUTE_CLASS = {{ backend_name }}Compute

    def __init__(self, config: {{ backend_name }}Config):
        self.config = config
        self._compute = {{ backend_name }}Compute(self.config)

    def compute(self) -> {{ backend_name }}Compute:
        return self._compute
87 changes: 87 additions & 0 deletions src/dstack/_internal/core/backends/template/compute.py.jinja
@@ -0,0 +1,87 @@
from typing import List, Optional

from dstack._internal.core.backends.base.backend import Compute
from dstack._internal.core.backends.base.compute import (
    ComputeWithCreateInstanceSupport,
    ComputeWithGatewaySupport,
    ComputeWithMultinodeSupport,
    ComputeWithPlacementGroupSupport,
    ComputeWithPrivateGatewaySupport,
    ComputeWithReservationSupport,
    ComputeWithVolumeSupport,
)
from dstack._internal.core.backends.base.offers import get_catalog_offers
from dstack._internal.core.backends.{{ backend_name|lower }}.models import {{ backend_name }}Config
from dstack._internal.core.models.backends.base import BackendType
from dstack._internal.core.models.instances import (
    InstanceAvailability,
    InstanceConfiguration,
    InstanceOfferWithAvailability,
)
from dstack._internal.core.models.runs import Job, JobProvisioningData, Requirements, Run
from dstack._internal.core.models.volumes import Volume
from dstack._internal.utils.logging import get_logger

logger = get_logger(__name__)


class {{ backend_name }}Compute(
    # TODO: Choose ComputeWith* classes to extend and implement
    # ComputeWithCreateInstanceSupport,
    # ComputeWithMultinodeSupport,
    # ComputeWithReservationSupport,
    # ComputeWithPlacementGroupSupport,
    # ComputeWithGatewaySupport,
    # ComputeWithPrivateGatewaySupport,
    # ComputeWithVolumeSupport,
    Compute,
):
    def __init__(self, config: {{ backend_name }}Config):
        super().__init__()
        self.config = config

    def get_offers(
        self, requirements: Optional[Requirements] = None
    ) -> List[InstanceOfferWithAvailability]:
        # If the provider is added to gpuhunt, you'd typically get offers
        # using `get_catalog_offers()` and extend them with availability info.
Comment on lines +46 to +47 (Collaborator): (nit) Consider adding a sample call to `get_catalog_offers`, as contributors can forget to pass important arguments, such as `locations` or `configurable_disk_size`:

```python
offers = get_catalog_offers(
    backend=BackendType.{{ backend_name|upper }},
    locations=self.config.regions or None,
    requirements=requirements,
    configurable_disk_size=...,  # TODO: set in case of boot volume size limits
)
```

        offers = get_catalog_offers(
            backend=BackendType.{{ backend_name|upper }},
            locations=self.config.regions or None,
            requirements=requirements,
            # configurable_disk_size=..., TODO: set in case of boot volume size limits
        )
        # TODO: Add availability info to offers
        return [
            InstanceOfferWithAvailability(
                **offer.dict(),
                availability=InstanceAvailability.UNKNOWN,
            )
            for offer in offers
        ]

    def create_instance(
        self,
        instance_offer: InstanceOfferWithAvailability,
        instance_config: InstanceConfiguration,
    ) -> JobProvisioningData:
        # TODO: Implement if backend supports creating instances (VM-based).
        # Delete if backend can only run jobs (container-based).
        raise NotImplementedError()

    def run_job(
        self,
        run: Run,
        job: Job,
        instance_offer: InstanceOfferWithAvailability,
        project_ssh_public_key: str,
        project_ssh_private_key: str,
        volumes: List[Volume],
    ) -> JobProvisioningData:
        # TODO: Implement if create_instance() is not implemented. Delete otherwise.
        raise NotImplementedError()

    def terminate_instance(
        self, instance_id: str, region: str, backend_data: Optional[str] = None
    ):
        raise NotImplementedError()
70 changes: 70 additions & 0 deletions src/dstack/_internal/core/backends/template/configurator.py.jinja
@@ -0,0 +1,70 @@
import json

from dstack._internal.core.backends.base.configurator import (
    BackendRecord,
    Configurator,
    raise_invalid_credentials_error,
)
from dstack._internal.core.backends.{{ backend_name|lower }}.backend import {{ backend_name }}Backend
from dstack._internal.core.backends.{{ backend_name|lower }}.models import (
    Any{{ backend_name }}BackendConfig,
    Any{{ backend_name }}Creds,
    {{ backend_name }}BackendConfig,
    {{ backend_name }}BackendConfigWithCreds,
    {{ backend_name }}Config,
    {{ backend_name }}Creds,
    {{ backend_name }}StoredConfig,
)
from dstack._internal.core.models.backends.base import (
    BackendType,
)

# TODO: Add all supported regions and default regions
REGIONS = []
Comment on lines +22 to +23 (Collaborator): Hardcoded regions are not needed for most backends, so I wouldn't include this in the template. I guess they used to be needed for interactive setup, but now they are only needed for backends with custom-built VM images that are not available in all regions. For other backends, hardcoded regions are rather harmful, as they prevent users from using newly added regions.

Reply (Collaborator, author): Regions don't need to be hardcoded, but they need to be validated, and hardcoding them seems to be the best option for most GPU clouds (they have a few regions, don't add new regions often, and are unlikely to have an API to get all the regions).

Reply (Collaborator, author): Anyway, feel free to adjust the template on regions.


class {{ backend_name }}Configurator(Configurator):
    TYPE = BackendType.{{ backend_name|upper }}
    BACKEND_CLASS = {{ backend_name }}Backend

    def validate_config(
        self, config: {{ backend_name }}BackendConfigWithCreds, default_creds_enabled: bool
    ):
        self._validate_creds(config.creds)
        # TODO: Validate additional config parameters if any

    def create_backend(
        self, project_name: str, config: {{ backend_name }}BackendConfigWithCreds
    ) -> BackendRecord:
        if config.regions is None:
            config.regions = REGIONS
        return BackendRecord(
            config={{ backend_name }}StoredConfig(
                **{{ backend_name }}BackendConfig.__response__.parse_obj(config).dict()
            ).json(),
            auth={{ backend_name }}Creds.parse_obj(config.creds).json(),
        )

    def get_backend_config(
        self, record: BackendRecord, include_creds: bool
    ) -> Any{{ backend_name }}BackendConfig:
        config = self._get_config(record)
        if include_creds:
            return {{ backend_name }}BackendConfigWithCreds.__response__.parse_obj(config)
        return {{ backend_name }}BackendConfig.__response__.parse_obj(config)

    def get_backend(self, record: BackendRecord) -> {{ backend_name }}Backend:
        config = self._get_config(record)
        return {{ backend_name }}Backend(config=config)

    def _get_config(self, record: BackendRecord) -> {{ backend_name }}Config:
        return {{ backend_name }}Config.__response__(
            **json.loads(record.config),
            creds={{ backend_name }}Creds.parse_raw(record.auth),
        )

    def _validate_creds(self, creds: Any{{ backend_name }}Creds):
        # TODO: Implement API key or other creds validation
        # if valid:
        #     return
        raise_invalid_credentials_error(fields=[["creds", "api_key"]])