docs/docs/concepts/gateways.md
+45 -4 (45 additions & 4 deletions)
@@ -1,10 +1,9 @@
 # Gateways

-Gateways manage the ingress traffic of running [services](services.md),
-provide an HTTPS endpoint mapped to your domain, handle auto-scaling and rate limits.
+Gateways manage ingress traffic for running [services](services.md), handle auto-scaling and rate limits, enable HTTPS, and allow you to configure a custom domain. They also support custom routers, such as the [SGLang Model Gateway :material-arrow-top-right-thin:{ .external }](https://docs.sglang.ai/advanced_features/router.html#){:target="_blank"}.

-> If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
-> the gateway is already set up for you.
+<!-- > If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
+> the gateway is already set up for you. -->

 ## Apply a configuration

@@ -57,6 +56,48 @@ You can create gateways with the `aws`, `azure`, `gcp`, or `kubernetes` backends
 Gateways in the `kubernetes` backend require an external load balancer. Managed Kubernetes solutions usually include a load balancer.
 For self-hosted Kubernetes, you must provide a load balancer yourself.

+### Router
+
+By default, the gateway uses its own load balancer to route traffic between replicas. However, you can delegate this responsibility to a specific router by setting the `router` property. Currently, the only supported external router is `sglang`.
+
+#### SGLang
+
+The `sglang` router delegates routing logic to the [SGLang Model Gateway :material-arrow-top-right-thin:{ .external }](https://docs.sglang.ai/advanced_features/router.html#){:target="_blank"}.
+
+To enable it, set the `type` field under `router` to `sglang`:
+
+<div editor-title="gateway.dstack.yml">
+
+```yaml
+type: gateway
+name: sglang-gateway
+
+backend: aws
+region: eu-west-1
+
+domain: example.com
+
+router:
+  type: sglang
+  policy: cache_aware
+```
+
+</div>
+
+!!! info "Policy"
+
+    The `router` property allows you to configure the routing `policy`:
+
+    * `cache_aware`: Default policy; combines cache locality with load balancing, falling back to shortest queue.
+    * `power_of_two`: Samples two workers and picks the lighter one.
+    * `random`: Uniform random selection.
+    * `round_robin`: Cycles through workers in order.
+
+
+> Currently, services using this type of gateway must run standard SGLang workers. See the [example](../../examples/inference/sglang/index.md).
+>
+> Support for prefill/decode disaggregation and auto-scaling based on inter-token latency is coming soon.
+
 ### Public IP

 If you don't need a public IP for the gateway, set `public_ip` to `false` (the default value is `true`) to make the gateway private.
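
The simpler routing policies listed in the Router section can be illustrated with a short Python sketch. This is not SGLang's implementation: the `Worker` class, its `load` field, and the function names are hypothetical and exist only for illustration. `cache_aware` is left out because its behavior depends on router internals that are not described here.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical stand-in for a service replica; `load` is its in-flight request count."""
    name: str
    load: int = 0


def round_robin(workers):
    """Return a picker function that cycles through workers in order."""
    cycle = itertools.cycle(workers)
    return lambda: next(cycle)


def pick_random(workers, rng=random):
    """Uniform random selection."""
    return rng.choice(workers)


def power_of_two(workers, rng=random):
    """Sample two distinct workers and return the one with the lower load."""
    if len(workers) < 2:
        return workers[0]
    first, second = rng.sample(workers, 2)
    return first if first.load <= second.load else second


workers = [Worker("a", load=5), Worker("b", load=1), Worker("c", load=3)]

pick = round_robin(workers)
order = [pick().name for _ in range(4)]  # wraps around after the last worker
print(order)
print(power_of_two(workers).name)  # never the most loaded worker when there are 2+ workers
```

With `power_of_two`, the most loaded worker loses every pairwise comparison, so load is spread without scanning every replica on each request. That is the usual reason to prefer it over plain `random`.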
docs/docs/concepts/services.md
+3 -2 (3 additions & 2 deletions)
@@ -100,12 +100,13 @@ If [authorization](#authorization) is not disabled, the service endpoint require
 However, you'll need a gateway in the following cases:

 * To use auto-scaling or rate limits
+* To use a custom router, such as the [SGLang Model Gateway :material-arrow-top-right-thin:{ .external }](https://docs.sglang.ai/advanced_features/router.html#){:target="_blank"}
 * To enable HTTPS for the endpoint and map it to your domain
 * If your service requires WebSockets
 * If your service cannot work with a [path prefix](#path-prefix)

-Note, if you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
-a gateway is already pre-configured for you.
+<!-- Note, if you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
+a gateway is already pre-configured for you. -->

 If a [gateway](gateways.md) is configured, the service endpoint will be accessible at
examples/inference/sglang/README.md
+21 -33 (21 additions & 33 deletions)
@@ -2,32 +2,21 @@

 This example shows how to deploy DeepSeek-R1-Distill-Llama 8B and 70B using [SGLang :material-arrow-top-right-thin:{ .external }](https://github.com/sgl-project/sglang){:target="_blank"} and `dstack`.

-??? info "Prerequisites"
-    Once `dstack` is [installed](https://dstack.ai/docs/installation), clone the repo with examples.
-
-    <div class="termy">
-
-    ```shell
-    $ git clone https://github.com/dstackai/dstack
-    $ cd dstack
-    ```
-
-    </div>
+## Apply a configuration

-## Deployment
 Here's an example of a service that deploys DeepSeek-R1-Distill-Llama 8B and 70B using SGLang.
-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
-is available at `https://gateway.<gateway domain>/`.
+!!! info "SGLang Model Gateway"
+    If you'd like to use a custom routing policy, e.g. by leveraging the [SGLang Model Gateway :material-arrow-top-right-thin:{ .external }](https://docs.sglang.ai/advanced_features/router.html#){:target="_blank"}, create a gateway with `router` set to `sglang`. Check out [gateways](https://dstack.ai/docs/concepts/gateways#router) for more details.
+
+> If a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured (e.g. to enable auto-scaling, HTTPS, or rate limits), the OpenAI-compatible endpoint is available at `https://gateway.<gateway domain>/`.

 ## Source code

@@ -128,5 +116,5 @@ The source-code of this example can be found in