You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
If the service defines the [`model`](#model) property, the model can be accessed with
94
-
the global OpenAI-compatible endpoint at `<dstack server URL>/proxy/models/<project name>/`,
95
-
or via `dstack` UI.
93
+
<!-- If [authorization](#authorization) is not disabled, the service endpoint requires the `Authorization` header with `Bearer <dstack token>`. -->
96
94
97
-
If [authorization](#authorization) is not disabled, the service endpoint requires the `Authorization` header with
98
-
`Bearer <dstack token>`.
95
+
## Configuration options
99
96
100
-
??? info "Gateway"
101
-
Running services for development purposes doesn’t require setting up a [gateway](gateways.md).
97
+
<!-- !!! info "No commands"
98
+
If `commands` are not specified, `dstack` runs `image`’s entrypoint (or fails if none is set). -->
102
99
103
-
However, you'll need a gateway in the following cases:
100
+
### Gateway
104
101
105
-
* To use auto-scaling or rate limits
106
-
* To enable a support custom router, e.g. such as the [SGLang Model Gateway](https://docs.sglang.ai/advanced_features/router.html#)
107
-
* To enable HTTPS for the endpoint and map it to your domain
108
-
* If your service requires WebSockets
109
-
* If your service cannot work with a [path prefix](#path-prefix)
102
+
Here are cases where a service may need a gateway:
110
103
111
-
<!-- Note, if you're using [dstack Sky](https://sky.dstack.ai),
112
-
a gateway is already pre-configured for you. -->
104
+
* To use [auto-scaling](#replicas-and-scaling) or [rate limits](#rate-limits)
105
+
* To enable a support custom router, e.g. such as the [SGLang Model Gateway](https://docs.sglang.ai/advanced_features/router.html#)
106
+
* To enable HTTPS for the endpoint and map it to your domain
107
+
* If your service requires WebSockets
108
+
* If your service cannot work with a [path prefix](#path-prefix)
113
109
114
-
If a [gateway](gateways.md) is configured, the service endpoint will be accessible at
115
-
`https://<run name>.<gateway domain>/`.
110
+
<!-- Note, if you're using [dstack Sky](https://sky.dstack.ai),
111
+
a gateway is already pre-configured for you. -->
116
112
117
-
If the service defines the `model` property, the model will be available via the global OpenAI-compatible endpoint
118
-
at `https://gateway.<gateway domain>/`.
113
+
If you want `dstack` to explicitly validate that a gateway is used, you can set the [`gateway`](../reference/dstack.yml/service.md#gateway) property in the service configuration to `true`. In this case, `dstack` will raise an error during `dstack apply` if a default gateway is not created.
119
114
120
-
## Configuration options
115
+
You can also set the `gateway` property to the name of a specific gateway, if required.
116
+
117
+
If you have a [gateway](gateways.md) created, the service endpoint will be accessible at `https://<run name>.<gateway domain>/`:
"content": "Compose a poem that explains the concept of recursion in programming."
131
+
}
132
+
]
133
+
}'
134
+
```
121
135
122
-
!!! info "No commands"
123
-
If `commands` are not specified, `dstack` runs `image`’s entrypoint (or fails if none is set).
136
+
</div>
124
137
125
138
### Replicas and scaling
126
139
@@ -215,12 +228,6 @@ Setting the minimum number of replicas to `0` allows the service to scale down t
215
228
??? info "Disaggregated serving"
216
229
Native support for disaggregated prefill and decode, allowing both worker types to run within a single service, is coming soon.
217
230
218
-
### Model
219
-
220
-
If the service is running a chat model with an OpenAI-compatible interface,
221
-
set the [`model`](#model) property to make the model accessible via `dstack`'s
222
-
global OpenAI-compatible endpoint, and also accessible via `dstack`'s UI.
223
-
224
231
### Authorization
225
232
226
233
By default, the service enables authorization, meaning the service endpoint requires a `dstack` user token.
@@ -359,7 +366,7 @@ set [`strip_prefix`](../reference/dstack.yml/service.md#strip_prefix) to `false`
359
366
If your app cannot be configured to work with a path prefix, you can host it
360
367
on a dedicated domain name by setting up a [gateway](gateways.md).
361
368
362
-
### Rate limits { #rate-limits }
369
+
### Rate limits
363
370
364
371
If you have a [gateway](gateways.md), you can configure rate limits for your service
365
372
using the [`rate_limits`](../reference/dstack.yml/service.md#rate_limits) property.
@@ -408,6 +415,11 @@ Limits apply to the whole service (all replicas) and per client (by IP). Clients
408
415
409
416
</div>
410
417
418
+
### Model
419
+
420
+
If the service runs a model with an OpenAI-compatible interface, you can set the [`model`](#model) property to make the model accessible through `dstack`'s chat UI on the `Models` page.
421
+
In this case, `dstack` will use the service's `/v1/chat/completions` service.
422
+
411
423
### Resources
412
424
413
425
If you specify memory size, you can either specify an explicit size (e.g. `24GB`) or a
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
110
-
is available at `https://gateway.<gateway domain>/`.
108
+
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://serve-distill-deepseek.<gateway domain>/`.
If you'd like to use a custom routing policy, e.g. by leveraging the [SGLang Model Gateway](https://docs.sglang.ai/advanced_features/router.html#), create a gateway with `router` set to `sglang`. Check out [gateways](https://dstack.ai/docs/concepts/gateways#router) for more details.
109
108
110
-
> If a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured (e.g. to enable auto-scaling or HTTPs, rate-limits, etc), the OpenAI-compatible endpoint is available at `https://gateway.<gateway domain>/`.
109
+
> If a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured (e.g. to enable auto-scaling or HTTPs, rate-limits, etc), the service endpoint will be available at `https://deepseek-r1.<gateway domain>/`.
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
114
-
is available at `https://gateway.<gateway domain>/`.
112
+
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://llama4-scout.<gateway domain>/`.
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
363
-
is available at `https://gateway.<gateway domain>/`.
361
+
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://serve-distill.<gateway domain>/`.
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
110
-
is available at `https://gateway.<gateway domain>/`.
108
+
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://llama31.<gateway domain>/`.
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
294
-
is available at `https://gateway.<gateway domain>/`.
292
+
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://deepseek-r1.<gateway domain>/`.
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
211
-
is available at `https://gateway.<gateway domain>/`.
209
+
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://llama31.<gateway domain>/`.
0 commit comments