Commit 17cbde0

[Docs] Remove the mention of the gateway endpoint #3514
1 parent 9c81898 commit 17cbde0

9 files changed

Lines changed: 71 additions & 73 deletions

docs/blog/posts/dstack-sky.md

Lines changed: 2 additions & 3 deletions
@@ -121,15 +121,14 @@ model: mixtral
 ```
 </div>

-If it has a `model` mapping, the model will be accessible
-at `https://gateway.<project name>.sky.dstack.ai` via the OpenAI compatible interface.
+The service endpoint will be accessible at `https://<run name>.<project name>.sky.dstack.ai` via the OpenAI compatible interface.

 ```python
 from openai import OpenAI


 client = OpenAI(
-    base_url="https://gateway.<project name>.sky.dstack.ai",
+    base_url="https://<run name>.<project name>.sky.dstack.ai/v1",
     api_key="<dstack token>"
 )

docs/docs/concepts/services.md

Lines changed: 42 additions & 30 deletions
@@ -68,7 +68,7 @@ Model meta-llama/Meta-Llama-3.1-8B-Instruct is published at:

 `dstack apply` automatically provisions instances and runs the service.

-If a [gateway](gateways.md) is not configured, the service’s endpoint will be accessible at
+If you do not have a [gateway](gateways.md) created, the service endpoint will be accessible at
 `<dstack server URL>/proxy/services/<project name>/<run name>/`.

 <div class="termy">
@@ -90,37 +90,50 @@ $ curl http://localhost:3000/proxy/services/main/llama31/v1/chat/completions \

 </div>

-If the service defines the [`model`](#model) property, the model can be accessed with
-the global OpenAI-compatible endpoint at `<dstack server URL>/proxy/models/<project name>/`,
-or via `dstack` UI.
+<!-- If [authorization](#authorization) is not disabled, the service endpoint requires the `Authorization` header with `Bearer <dstack token>`. -->

-If [authorization](#authorization) is not disabled, the service endpoint requires the `Authorization` header with
-`Bearer <dstack token>`.
+## Configuration options

-??? info "Gateway"
-    Running services for development purposes doesn’t require setting up a [gateway](gateways.md).
+<!-- !!! info "No commands"
+    If `commands` are not specified, `dstack` runs `image`’s entrypoint (or fails if none is set). -->

-    However, you'll need a gateway in the following cases:
+### Gateway

-    * To use auto-scaling or rate limits
-    * To enable a support custom router, e.g. such as the [SGLang Model Gateway](https://docs.sglang.ai/advanced_features/router.html#)
-    * To enable HTTPS for the endpoint and map it to your domain
-    * If your service requires WebSockets
-    * If your service cannot work with a [path prefix](#path-prefix)
+Here are cases where a service may need a gateway:

-    <!-- Note, if you're using [dstack Sky](https://sky.dstack.ai),
-    a gateway is already pre-configured for you. -->
+* To use [auto-scaling](#replicas-and-scaling) or [rate limits](#rate-limits)
+* To enable a support custom router, e.g. such as the [SGLang Model Gateway](https://docs.sglang.ai/advanced_features/router.html#)
+* To enable HTTPS for the endpoint and map it to your domain
+* If your service requires WebSockets
+* If your service cannot work with a [path prefix](#path-prefix)

-If a [gateway](gateways.md) is configured, the service endpoint will be accessible at
-`https://<run name>.<gateway domain>/`.
+<!-- Note, if you're using [dstack Sky](https://sky.dstack.ai),
+a gateway is already pre-configured for you. -->

-If the service defines the `model` property, the model will be available via the global OpenAI-compatible endpoint
-at `https://gateway.<gateway domain>/`.
+If you want `dstack` to explicitly validate that a gateway is used, you can set the [`gateway`](../reference/dstack.yml/service.md#gateway) property in the service configuration to `true`. In this case, `dstack` will raise an error during `dstack apply` if a default gateway is not created.

-## Configuration options
+You can also set the `gateway` property to the name of a specific gateway, if required.
+
+If you have a [gateway](gateways.md) created, the service endpoint will be accessible at `https://<run name>.<gateway domain>/`:
+
+<div class="termy">
+
+```shell
+$ curl https://llama31.example.com/v1/chat/completions \
+    -H 'Content-Type: application/json' \
+    -H 'Authorization: Bearer &lt;dstack token&gt;' \
+    -d '{
+        "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
+        "messages": [
+            {
+                "role": "user",
+                "content": "Compose a poem that explains the concept of recursion in programming."
+            }
+        ]
+    }'
+```

-!!! info "No commands"
-    If `commands` are not specified, `dstack` runs `image`’s entrypoint (or fails if none is set).
+</div>

 ### Replicas and scaling

@@ -215,12 +228,6 @@ Setting the minimum number of replicas to `0` allows the service to scale down t
 ??? info "Disaggregated serving"
     Native support for disaggregated prefill and decode, allowing both worker types to run within a single service, is coming soon.

-### Model
-
-If the service is running a chat model with an OpenAI-compatible interface,
-set the [`model`](#model) property to make the model accessible via `dstack`'s
-global OpenAI-compatible endpoint, and also accessible via `dstack`'s UI.
-
 ### Authorization

 By default, the service enables authorization, meaning the service endpoint requires a `dstack` user token.
@@ -359,7 +366,7 @@ set [`strip_prefix`](../reference/dstack.yml/service.md#strip_prefix) to `false`
 If your app cannot be configured to work with a path prefix, you can host it
 on a dedicated domain name by setting up a [gateway](gateways.md).

-### Rate limits { #rate-limits }
+### Rate limits

 If you have a [gateway](gateways.md), you can configure rate limits for your service
 using the [`rate_limits`](../reference/dstack.yml/service.md#rate_limits) property.
@@ -408,6 +415,11 @@ Limits apply to the whole service (all replicas) and per client (by IP). Clients

 </div>

+### Model
+
+If the service runs a model with an OpenAI-compatible interface, you can set the [`model`](#model) property to make the model accessible through `dstack`'s chat UI on the `Models` page.
+In this case, `dstack` will use the service's `/v1/chat/completions` service.
+
 ### Resources

 If you specify memory size, you can either specify an explicit size (e.g. `24GB`) or a
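
The two endpoint patterns that the updated `services.md` describes (with and without a gateway) can be summarised in a small helper. This is only an illustrative sketch: `service_base_url` and its parameters are hypothetical names, not part of `dstack`, and the sample values are taken from the `curl` examples in this diff.

```python
from typing import Optional


def service_base_url(
    run_name: str,
    project_name: str,
    server_url: str,
    gateway_domain: Optional[str] = None,
) -> str:
    """Return a service base URL following the two patterns in the docs.

    Hypothetical helper for illustration; dstack does not ship this function.
    """
    if gateway_domain:
        # With a gateway: https://<run name>.<gateway domain>/
        return f"https://{run_name}.{gateway_domain}/"
    # Without a gateway: <dstack server URL>/proxy/services/<project name>/<run name>/
    return f"{server_url.rstrip('/')}/proxy/services/{project_name}/{run_name}/"


# Values borrowed from the curl examples in the diff.
print(service_base_url("llama31", "main", "http://localhost:3000"))
print(service_base_url("llama31", "main", "http://localhost:3000", gateway_domain="example.com"))
```

Appending `v1/chat/completions` to either result gives the request URLs used in the documentation's `curl` examples.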

examples/inference/nim/README.md

Lines changed: 3 additions & 5 deletions
@@ -78,13 +78,12 @@ Provisioning...
 ```
 </div>

-If no gateway is created, the model will be available via the OpenAI-compatible endpoint
-at `<dstack server URL>/proxy/models/<project name>/`.
+If no gateway is created, the service endpoint will be available at `<dstack server URL>/proxy/services/<project name>/<run name>/`.

 <div class="termy">

 ```shell
-$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
+$ curl http://127.0.0.1:3000/proxy/services/main/serve-distill-deepseek/v1/chat/completions \
     -X POST \
     -H 'Authorization: Bearer &lt;dstack token&gt;' \
     -H 'Content-Type: application/json' \
@@ -106,8 +105,7 @@ $ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \

 </div>

-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
-is available at `https://gateway.<gateway domain>/`.
+When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://serve-distill-deepseek.<gateway domain>/`.

 ## Source code

examples/inference/sglang/README.md

Lines changed: 6 additions & 7 deletions
@@ -12,7 +12,7 @@ Here's an example of a service that deploys DeepSeek-R1-Distill-Llama 8B and 70B

 ```yaml
 type: service
-name: deepseek-r1-nvidia
+name: deepseek-r1

 image: lmsysorg/sglang:latest
 env:
@@ -38,7 +38,7 @@ Here's an example of a service that deploys DeepSeek-R1-Distill-Llama 8B and 70B

 ```yaml
 type: service
-name: deepseek-r1-amd
+name: deepseek-r1

 image: lmsysorg/sglang:v0.4.1.post4-rocm620
 env:
@@ -69,20 +69,19 @@ $ dstack apply -f examples/llms/deepseek/sglang/amd/.dstack.yml
 # BACKEND REGION RESOURCES SPOT PRICE
 1 runpod EU-RO-1 24xCPU, 283GB, 1xMI300X (192GB) no $2.49

-Submit the run deepseek-r1-amd? [y/n]: y
+Submit the run deepseek-r1? [y/n]: y

 Provisioning...
 ---> 100%
 ```
 </div>

-Once the service is up, the model will be available via the OpenAI-compatible endpoint
-at `<dstack server URL>/proxy/models/<project name>/`.
+If no gateway is created, the service endpoint will be available at `<dstack server URL>/proxy/services/<project name>/<run name>/`.

 <div class="termy">

 ```shell
-curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
+curl http://127.0.0.1:3000/proxy/services/main/deepseek-r1/v1/chat/completions \
     -X POST \
     -H 'Authorization: Bearer &lt;dstack token&gt;' \
     -H 'Content-Type: application/json' \
@@ -107,7 +106,7 @@ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
 !!! info "SGLang Model Gateway"
     If you'd like to use a custom routing policy, e.g. by leveraging the [SGLang Model Gateway](https://docs.sglang.ai/advanced_features/router.html#), create a gateway with `router` set to `sglang`. Check out [gateways](https://dstack.ai/docs/concepts/gateways#router) for more details.

-> If a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured (e.g. to enable auto-scaling or HTTPs, rate-limits, etc), the OpenAI-compatible endpoint is available at `https://gateway.<gateway domain>/`.
+> If a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured (e.g. to enable auto-scaling or HTTPs, rate-limits, etc), the service endpoint will be available at `https://deepseek-r1.<gateway domain>/`.

 ## Source code

examples/inference/tgi/README.md

Lines changed: 3 additions & 5 deletions
@@ -82,13 +82,12 @@ Provisioning...
 ```
 </div>

-If no gateway is created, the model will be available via the OpenAI-compatible endpoint
-at `<dstack server URL>/proxy/models/<project name>/`.
+If no gateway is created, the service endpoint will be available at `<dstack server URL>/proxy/services/<project name>/<run name>/`.

 <div class="termy">

 ```shell
-$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
+$ curl http://127.0.0.1:3000/proxy/services/main/llama4-scout/v1/chat/completions \
     -X POST \
     -H 'Authorization: Bearer &lt;dstack token&gt;' \
     -H 'Content-Type: application/json' \
@@ -110,8 +109,7 @@ $ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \

 </div>

-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
-is available at `https://gateway.<gateway domain>/`.
+When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://llama4-scout.<gateway domain>/`.

 ## Source code

examples/inference/trtllm/README.md

Lines changed: 3 additions & 5 deletions
@@ -330,13 +330,12 @@ Provisioning...

 ## Access the endpoint

-If no gateway is created, the model will be available via the OpenAI-compatible endpoint
-at `<dstack server URL>/proxy/models/<project name>/`.
+If no gateway is created, the service endpoint will be available at `<dstack server URL>/proxy/services/<project name>/<run name>/`.

 <div class="termy">

 ```shell
-$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
+$ curl http://127.0.0.1:3000/proxy/services/main/serve-distill/v1/chat/completions \
     -X POST \
     -H 'Authorization: Bearer &lt;dstack token&gt;' \
     -H 'Content-Type: application/json' \
@@ -359,8 +358,7 @@ $ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \

 </div>

-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
-is available at `https://gateway.<gateway domain>/`.
+When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://serve-distill.<gateway domain>/`.

 ## Source code

examples/inference/vllm/README.md

Lines changed: 3 additions & 5 deletions
@@ -78,13 +78,12 @@ Provisioning...
 ```
 </div>

-If no gateway is created, the model will be available via the OpenAI-compatible endpoint
-at `<dstack server URL>/proxy/models/<project name>/`.
+If no gateway is created, the service endpoint will be available at `<dstack server URL>/proxy/services/<project name>/<run name>/`.

 <div class="termy">

 ```shell
-$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
+$ curl http://127.0.0.1:3000/proxy/services/main/llama31/v1/chat/completions \
     -X POST \
     -H 'Authorization: Bearer &lt;dstack token&gt;' \
     -H 'Content-Type: application/json' \
@@ -106,8 +105,7 @@ $ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \

 </div>

-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
-is available at `https://gateway.<gateway domain>/`.
+When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://llama31.<gateway domain>/`.

 ## Source code
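
For readers who prefer Python over `curl`, the request in the vLLM example above could be assembled roughly as follows. This is a sketch using only the standard library: `build_chat_request` is a hypothetical helper, the model name is borrowed from the `services.md` example in this diff, and the prompt is a placeholder. The request is built but not sent, so the snippet runs without a live service.

```python
import json
import urllib.request


def build_chat_request(base_url: str, token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to <base_url>/v1/chat/completions.

    Hypothetical helper for illustration; not part of dstack.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The service endpoint expects a dstack user token unless authorization is disabled.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_chat_request(
    "http://127.0.0.1:3000/proxy/services/main/llama31",
    token="<dstack token>",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # assumed; use the model your service runs
    prompt="Hello!",  # placeholder prompt
)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen(request)` would send it. With a gateway, the base URL would instead be `https://llama31.<gateway domain>`.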

examples/llms/deepseek/README.md

Lines changed: 6 additions & 8 deletions
@@ -179,7 +179,7 @@ Both SGLang and vLLM also support `Deepseek-V2-Lite`.

 ```yaml
 type: service
-name: deepseek-r1-nvidia
+name: deepseek-r1

 image: lmsysorg/sglang:latest
 env:
@@ -203,7 +203,7 @@ Both SGLang and vLLM also support `Deepseek-V2-Lite`.

 ```yaml
 type: service
-name: deepseek-r1-nvidia
+name: deepseek-r1

 image: vllm/vllm-openai:latest
 env:
@@ -255,20 +255,19 @@ $ dstack apply -f examples/llms/deepseek/sglang/amd/.dstack.yml
 # BACKEND REGION RESOURCES SPOT PRICE
 1 runpod EU-RO-1 24xCPU, 283GB, 1xMI300X (192GB) no $2.49

-Submit the run deepseek-r1-amd? [y/n]: y
+Submit the run deepseek-r1? [y/n]: y

 Provisioning...
 ---> 100%
 ```
 </div>

-Once the service is up, the model will be available via the OpenAI-compatible endpoint
-at `<dstack server URL>/proxy/models/<project name>/`.
+If no gateway is created, the service endpoint will be available at `<dstack server URL>/proxy/services/<project name>/<run name>/`.

 <div class="termy">

 ```shell
-curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
+curl http://127.0.0.1:3000/proxy/services/main/deepseek-r1/v1/chat/completions \
     -X POST \
     -H 'Authorization: Bearer &lt;dstack token&gt;' \
     -H 'Content-Type: application/json' \
@@ -290,8 +289,7 @@ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
 ```
 </div>

-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
-is available at `https://gateway.<gateway domain>/`.
+When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://deepseek-r1.<gateway domain>/`.

 ## Fine-tuning

examples/llms/llama31/README.md

Lines changed: 3 additions & 5 deletions
@@ -179,13 +179,12 @@ Provisioning...

 </div>

-Once the service is up, the model will be available via the OpenAI-compatible endpoint
-at `<dstack server URL>/proxy/models/<project name>/`.
+If no gateway is created, the service endpoint will be available at `<dstack server URL>/proxy/services/<project name>/<run name>/`.

 <div class="termy">

 ```shell
-$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
+$ curl http://127.0.0.1:3000/proxy/services/main/llama31/v1/chat/completions \
     -X POST \
     -H 'Authorization: Bearer &lt;dstack token&gt;' \
     -H 'Content-Type: application/json' \
@@ -207,8 +206,7 @@ $ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \

 </div>

-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
-is available at `https://gateway.<gateway domain>/`.
+When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://llama31.<gateway domain>/`.

 [//]: # (TODO: How to prompting and tool calling)
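
Every example in this commit makes the same change: a `/proxy/models/<project name>/...` URL becomes `/proxy/services/<project name>/<run name>/v1/...`. Client code that still uses the old form could be rewritten along these lines. This is a hedged sketch: `migrate_models_url` is a hypothetical helper, and it assumes the service exposes its OpenAI-compatible API under `/v1`, as the updated `curl` examples do.

```python
from urllib.parse import urlsplit, urlunsplit


def migrate_models_url(old_url: str, run_name: str) -> str:
    """Rewrite an old per-project models URL to the per-service proxy URL.

    Hypothetical helper; assumes the service serves its API under /v1.
    """
    parts = urlsplit(old_url)
    segments = parts.path.strip("/").split("/")
    # Expect: proxy / models / <project name> / <rest of the API path...>
    if len(segments) < 3 or segments[:2] != ["proxy", "models"]:
        raise ValueError(f"not a /proxy/models/<project name>/ URL: {old_url}")
    project_name, rest = segments[2], segments[3:]
    new_path = "/" + "/".join(["proxy", "services", project_name, run_name, "v1", *rest])
    return urlunsplit(parts._replace(path=new_path))


print(migrate_models_url("http://127.0.0.1:3000/proxy/models/main/chat/completions", "llama31"))
```

The run name has to be supplied by the caller because the old URL did not contain it; the service was previously selected by the `model` field in the request body.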
