The guide below explains the steps required to extend dstack with support for a new cloud provider.
The gpuhunt project is a utility that dstack uses to collect information
about cloud providers, their supported machine configurations, pricing, etc. This information is later used by dstack
for provisioning machines.
Thus, in order to support a new cloud provider with dstack, you first need to add the cloud provider to gpuhunt.
To add a new cloud provider to gpuhunt, follow these steps:
```shell
git clone https://github.com/dstackai/gpuhunt.git
```

Create the provider class file under `src/gpuhunt/providers`.
Ensure your class:

- Extends the `AbstractProvider` base class.
- Has the `NAME` property, which is used as the unique identifier for your provider.
- Implements the `get` method, which is responsible for fetching the available machine configurations from the cloud provider.
Refer to examples: datacrunch.py, aws.py, gcp.py, azure.py, lambdalabs.py, tensordock.py, vastai.py.
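Putting those requirements together, a new provider might look like the sketch below. `ExampleCloudProvider` and its `NAME` are hypothetical, and the `RawCatalogItem`/`AbstractProvider` definitions are simplified stand-ins; a real provider imports the actual classes from gpuhunt and calls the cloud's API inside `get`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RawCatalogItem:  # simplified stand-in for gpuhunt's catalog item
    instance_name: str
    location: str
    price: float
    cpu: int
    memory: float
    gpu_count: int
    gpu_name: Optional[str]
    spot: bool


class AbstractProvider(ABC):  # simplified stand-in for gpuhunt's base class
    NAME: str

    @abstractmethod
    def get(self) -> List[RawCatalogItem]:
        ...


class ExampleCloudProvider(AbstractProvider):
    # NAME is the unique identifier of the provider
    NAME = "examplecloud"

    def get(self) -> List[RawCatalogItem]:
        # In a real provider, call the cloud API here and convert each
        # machine configuration into a catalog item.
        return [
            RawCatalogItem(
                instance_name="gpu-1x-a100",
                location="us-east-1",
                price=1.85,
                cpu=12,
                memory=120.0,
                gpu_count=1,
                gpu_name="A100",
                spot=False,
            )
        ]
```

Check the real `AbstractProvider` in gpuhunt for the exact `get` signature and catalog item fields.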
Update the src/gpuhunt/_internal/catalog.py file by adding the provider name
to either OFFLINE_PROVIDERS or ONLINE_PROVIDERS depending on the type of the provider.
How do I decide which type my provider is?
- `OFFLINE_PROVIDERS`: Use this type if your provider offers static machine configurations that can be collected and published on a daily basis. Examples: `aws`, `gcp`, `azure`, etc. These providers offer many machine configurations, but the configurations are not updated frequently.
- `ONLINE_PROVIDERS`: Use this type if your provider offers dynamic machine configurations that are only available at the very moment you fetch them (e.g., GPU marketplaces). Examples: `tensordock`, `vastai`, etc.
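The registration itself is a one-line change. A sketch, assuming a hypothetical offline provider named `examplecloud` (the list contents below are illustrative, not the actual file):

```python
# Sketch of src/gpuhunt/_internal/catalog.py; list contents are illustrative
OFFLINE_PROVIDERS = ["aws", "azure", "datacrunch", "gcp", "lambdalabs", "examplecloud"]
ONLINE_PROVIDERS = ["tensordock", "vastai"]
```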
If the provider is registered via OFFLINE_PROVIDERS, you can add data quality tests
under src/integrity_tests/.
Refer to examples: test_datacrunch.py, test_gcp.py.
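A minimal data quality test might look like the sketch below. The inline catalog snippet and its column names are assumptions; real integrity tests run against the published catalog files for the provider.

```python
import csv
import io

# Hypothetical catalog snippet; real integrity tests read the published
# catalog CSV for the provider (column names here are assumptions).
CATALOG_CSV = """instance_name,location,price,cpu,memory,gpu_count,gpu_name
gpu-1x-a100,us-east-1,1.85,12,120,1,A100
cpu-4x,us-east-1,0.12,4,16,0,
"""


def test_all_prices_are_positive():
    rows = list(csv.DictReader(io.StringIO(CATALOG_CSV)))
    assert rows, "catalog must not be empty"
    assert all(float(row["price"]) > 0 for row in rows)


def test_gpu_rows_have_gpu_name():
    for row in csv.DictReader(io.StringIO(CATALOG_CSV)):
        if int(row["gpu_count"]) > 0:
            assert row["gpu_name"]
```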
Anything unclear? Ask questions on the Discord server.
Once the cloud provider is added, submit a pull request.
Once the provider is added to gpuhunt, we can proceed with implementing
the corresponding backend with dstack. Follow the steps below.
```shell
git clone https://github.com/dstackai/dstack.git
```

Follow `DEVELOPMENT.md` to set up the development environment.
Add any dependencies required by your cloud provider to setup.py. Create a separate section with the provider's name for
these dependencies, and ensure that you update the all section to include them as well.
Add a new enumeration member for your provider to BackendType (src/dstack/_internal/core/models/backends/base.py).
Use the name of the provider.
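A sketch of the enum change, assuming a hypothetical provider named `examplecloud` (`BackendType` is reproduced here in simplified form, with only a few of the existing members):

```python
import enum


# Simplified sketch of BackendType from
# src/dstack/_internal/core/models/backends/base.py
class BackendType(str, enum.Enum):
    AWS = "aws"
    AZURE = "azure"
    GCP = "gcp"
    EXAMPLECLOUD = "examplecloud"  # new member for your provider
```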
Create a new directory under src/dstack/_internal/core/backends with the name of the backend type.
Under the backend directory you've created, create the __init__.py file and define the
backend class there (should extend dstack._internal.core.backends.base.Backend).
Refer to examples: datacrunch, aws, gcp, azure, etc.
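The backend class itself is small; a sketch, where `Backend` is a simplified stand-in for `dstack._internal.core.backends.base.Backend` and all `ExampleCloud*` names are hypothetical:

```python
from abc import ABC


class Backend(ABC):  # simplified stand-in for dstack's base Backend
    TYPE: str


class ExampleCloudBackend(Backend):
    TYPE = "examplecloud"  # matches the BackendType member

    def __init__(self, config):
        self.config = config
        # Real backends construct their Compute implementation here.
        self._compute = None

    def compute(self):
        return self._compute
```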
Under the backend directory you've created, create the compute.py file and define the
backend compute class there (should extend dstack._internal.core.backends.base.compute.Compute).
You'll have to implement `get_offers`, `create_instance`, `run_job`, and `terminate_instance`.
The `create_instance` method is required for the pool feature. If you implement `create_instance`, add the provider name to `BACKENDS_WITH_CREATE_INSTANCE_SUPPORT` (`src/dstack/_internal/server/services/runs.py`).
Refer to examples: datacrunch, aws, gcp, azure, etc.
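The compute class could be sketched as follows. The `Compute` base class here is a simplified stand-in for `dstack._internal.core.backends.base.compute.Compute`, and the method signatures are abbreviated; check the real base class for the exact arguments.

```python
from abc import ABC, abstractmethod
from typing import List


class Compute(ABC):  # simplified stand-in; signatures are abbreviated
    @abstractmethod
    def get_offers(self, requirements=None) -> List[dict]: ...

    @abstractmethod
    def run_job(self, run, job, instance_offer, project_ssh_public_key): ...

    @abstractmethod
    def terminate_instance(self, instance_id: str, region: str): ...


class ExampleCloudCompute(Compute):
    def get_offers(self, requirements=None) -> List[dict]:
        # Query the provider's offers (via gpuhunt's catalog in real code)
        # and filter them by the job's requirements.
        return [{"instance_name": "gpu-1x-a100", "price": 1.85}]

    def run_job(self, run, job, instance_offer, project_ssh_public_key):
        # Provision an instance and launch the job on it.
        raise NotImplementedError

    def terminate_instance(self, instance_id: str, region: str):
        # Call the cloud API to terminate the instance.
        pass

    def create_instance(self, instance_offer, instance_config):
        # Optional: needed for the pool feature; if implemented, register
        # the provider in BACKENDS_WITH_CREATE_INSTANCE_SUPPORT.
        raise NotImplementedError
```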
Under the src/dstack/_internal/core/models/backends directory, create the file with the name of the backend, and define the
backend config model classes there.
Refer to examples: datacrunch.py, aws.py, gcp.py, azure.py, etc.
Under the backend directory you've created, create the config.py file and define the
backend config class there (should extend dstack._internal.core.backends.base.config.BackendConfig
and the backend configuration model class defined above).
Refer to examples: datacrunch, aws, gcp, azure, etc.
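Taken together, the config model classes might look like this sketch, using plain dataclasses as stand-ins for dstack's pydantic-based models; all `ExampleCloud*` names are hypothetical. In the real code, the backend config class in `config.py` additionally inherits from `BackendConfig`; that mixin is omitted here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExampleCloudCreds:  # hypothetical credentials model
    api_key: str


@dataclass
class ExampleCloudConfigInfo:  # config as stored/returned by the server
    regions: Optional[List[str]] = None
    type: str = "examplecloud"


@dataclass
class ExampleCloudConfigInfoWithCreds(ExampleCloudConfigInfo):
    # Variant that includes credentials, e.g. for configuring the backend
    creds: Optional[ExampleCloudCreds] = None
```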
Ensure the config model classes are imported
into src/dstack/_internal/core/models/backends/__init__.py.
Create the file with the backend name under [src/dstack/_internal/server/services/backends/configurators](https://github.com/dstackai/dstack/blob/master/src/dstack/_internal/server/services/backends/configurators)
and define the backend configurator class (must extend dstack._internal.server.services.backends.configurators.base.Configurator).
Refer to examples: datacrunch, aws, gcp, azure, etc.
In src/dstack/_internal/server/services/config.py,
define the corresponding server config class (that represents the ~/.dstack/server/config.yml file),
and add it to AnyBackendConfig (in the same file).
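For reference, a hypothetical `examplecloud` entry in `~/.dstack/server/config.yml` could look like this; the field names follow the pattern of the existing backends, but the exact schema is whatever your server config class defines:

```yaml
# Hypothetical "examplecloud" backend entry in ~/.dstack/server/config.yml
projects:
- name: main
  backends:
  - type: examplecloud
    regions: [us-east-1]
    creds:
      api_key: <your API key>
```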
In src/dstack/_internal/server/services/backends/__init__.py,
add the try/except block that imports the backend configurator and appends it to _CONFIGURATOR_CLASSES.
dstack supports two types of backend compute:
- VM-based
- Container-based
VM-based compute is when the cloud provider allows provisioning virtual machines (VMs). This is the most flexible backend compute type.
To support it, dstack expects the following from the cloud provider:
- An API for creating and terminating VMs
- Ubuntu 22.04 LTS
- NVIDIA CUDA driver 535
- Docker with NVIDIA runtime
- OpenSSH server
- Cloud-init script (preferred)
- An external IP and public port for SSH
When dstack provisions a VM, it launches dstack-shim there.
Examples of VM-based backends include: aws, azure, gcp, lambda, datacrunch, tensordock, etc.
Container-based compute is when the cloud provider allows provisioning only containers. This is the most limited backend compute type.
To support it, dstack expects the following from the cloud provider:
- An API for creating and terminating containers
- Docker with NVIDIA runtime
- An external IP and a public port for SSH
- A way to override the container entrypoint (at least ~2KB)
When dstack provisions a container, it launches dstack-runner there.
Examples of container-based backends include: kubernetes, vastai, etc.