diff --git a/.env.example b/.env.example index 0e32aa4..d5c3312 100644 --- a/.env.example +++ b/.env.example @@ -73,13 +73,20 @@ SIGNALING_PORT=3000 # SFU Configuration # ============================================================================= SFU_NODE_ID=sfu-1 +# How signaling reaches the SFU control-plane (host:port). The SFU runs with +# host networking, so in production set this to the SFU box's reachable address +# (e.g. sfu1.:4000), not the bridge service name. SFU_NODES=localhost:4000 +# The SFU host's PUBLIC IPv4 — wired into the SFU as MEDIASOUP_ANNOUNCED_IP. +# Clients connect here for WebRTC media; must be a real public IP in production. SFU_ANNOUNCED_IP=127.0.0.1 SFU_PORT=4000 -# Mediasoup RTC Ports -RTC_MIN_PORT=10000 -RTC_MAX_PORT=10100 +# Mediasoup RTC media port range (UDP primary + TCP fallback). The SFU service +# in docker-compose pins MEDIASOUP_RTC_MIN_PORT/MAX_PORT to this window — open it +# in the firewall on the SFU host. +RTC_MIN_PORT=40000 +RTC_MAX_PORT=49999 # ============================================================================= # TURN Server Configuration diff --git a/DEPLOYMENT_GUIDE.md b/DEPLOYMENT_GUIDE.md index c21208a..1bfbd83 100644 --- a/DEPLOYMENT_GUIDE.md +++ b/DEPLOYMENT_GUIDE.md @@ -524,12 +524,12 @@ SIGNALING_PORT=3000 # ============================================================================= SFU_NODE_ID=sfu-1 SFU_NODES=sfu-1.tradingroom.io:4000 -SFU_ANNOUNCED_IP=xx.xx.xx.xx # Your SFU server IP +SFU_ANNOUNCED_IP=xx.xx.xx.xx # Your SFU server's PUBLIC IPv4 (clients connect here for media) SFU_PORT=4000 -# Mediasoup RTC Ports -RTC_MIN_PORT=10000 -RTC_MAX_PORT=10100 +# Mediasoup RTC media port range (UDP primary + TCP fallback) — open on the SFU host. +RTC_MIN_PORT=40000 +RTC_MAX_PORT=49999 # ============================================================================= # TURN Server Configuration @@ -1074,9 +1074,9 @@ We'll use Let's Encrypt with Certbot for free SSL certificates. 
PORT=4000 NODE_ID=sfu-1 REDIS_URL=redis://yy.yy.yy.yy:6379 - RTC_MIN_PORT=10000 - RTC_MAX_PORT=10100 - ANNOUNCED_IP=$(curl -s ifconfig.me) + MEDIASOUP_RTC_MIN_PORT=40000 + MEDIASOUP_RTC_MAX_PORT=49999 + MEDIASOUP_ANNOUNCED_IP=$(curl -s ifconfig.me) MEDIASOUP_LOG_LEVEL=warn EOF ``` diff --git a/SUBSCRIPTIONS_AND_TIERS.md b/SUBSCRIPTIONS_AND_TIERS.md new file mode 100644 index 0000000..994295d --- /dev/null +++ b/SUBSCRIPTIONS_AND_TIERS.md @@ -0,0 +1,166 @@ +# Subscriptions, Services & Tiers — End-to-End Setup + +Everything you need to sign up for (external services) and configure (the in-app +plan catalog) to run **Trading Room** end-to-end: Rust API (`backend-rs`), +SvelteKit frontend (`frontend-svelte`), Node signaling + mediasoup SFU, +recorder, Postgres, Redis, and TURN. + +There are **two kinds of "subscriptions"** here: + +1. **External service subscriptions** — third-party accounts the platform depends on (§1). +2. **In-app subscription tiers** — the plans *your customers* buy, defined in Stripe + the `plans` table (§2). + +--- + +## 1. External service subscriptions (accounts you must create) + +| # | Service | Required? | What it powers | Recommended tier | Approx. cost | +|---|---------|-----------|----------------|------------------|--------------| +| 1 | **Neon** (Postgres) | ✅ Required | Primary database (all app data) | Free to start → **Launch** in prod | $0 → **$19/mo** | +| 2 | **Stripe** | ✅ Required (billing) | Checkout, Billing Portal, subscription webhooks | Standard (no monthly fee) | **2.9% + $0.30** per txn | +| 3 | **Cloudflare R2** | ✅ Required (file uploads) | Room file storage via presigned URLs | Free tier → usage | $0 (10 GB free) → **~$5–10/mo** | +| 4 | **TURN/STUN server** | ✅ Required (WebRTC) | NAT traversal so calls connect across networks | Self-host **coturn** (incl. 
in compose) **or** managed | $0 (self-host) or usage | +| 5 | **Compute / hosting** | ✅ Required | Runs the API, frontend, signaling, **SFU**, TURN | **Hetzner** (or any VPS/cloud) — needs public IP + open UDP | **~$52/mo** (2 servers) | +| 6 | **Redis** | ✅ Required | Signaling/SFU coordination & cache | Self-host (incl. in compose) **or** Upstash/Redis Cloud | $0 (self-host) → ~$10/mo | +| 7 | **Domain + TLS** | ✅ Required | Public hostnames + HTTPS/WSS | Any registrar; TLS via Let's Encrypt/Cloudflare (free) | **$10–15/yr** | +| 8 | **Cloudflare** (DNS/CDN/WAF) | ⭐ Recommended | DNS, CDN, WAF, TLS | Free plan is sufficient | $0 | +| 9 | **Transactional email** | ⭐ Recommended | Email verification / password reset / receipts | Resend / Postmark / SES / Mailgun | $0–$15/mo | +| 10 | **GitHub** (Actions + GHCR) | ⭐ Recommended | CI (tests/clippy/svelte-check) + image registry | Free for the included usage | $0 | +| 11 | **Observability** (OTLP) | ⚪ Optional | Distributed tracing | Self-host **Jaeger** (incl. in compose) or Grafana Cloud/Honeycomb | $0 → usage | +| 12 | **Codecov** | ⚪ Optional | CI coverage upload (`CODECOV_TOKEN`) | Free for the included usage | $0 | +| 13 | **Slack** | ⚪ Optional | Deploy notifications (incoming webhook) | Free | $0 | + +> **Minimum to run end-to-end:** Neon + Stripe + Cloudflare R2 + a TURN server + +> one VPS (with Redis) + a domain. Estimated base: **~$60–85/month** plus Stripe +> fees. +> +> **Realtime media plane** (self-hosted SFU + TURN on Hetzner — server specs, +> ports, low-latency tuning, the `MEDIASOUP_ANNOUNCED_IP`/port fixes): see +> **[`docs/MEDIA_INFRASTRUCTURE.md`](docs/MEDIA_INFRASTRUCTURE.md)**. 
+ +### 1.1 What each service maps to (env vars) + +| Service | Env vars (where) | +|---------|------------------| +| Neon Postgres | `DATABASE_URL` (api-rs, signaling) | +| Redis | `REDIS_URL` (signaling, sfu) | +| Stripe | `STRIPE_SECRET`, `STRIPE_WEBHOOK_SECRET`, `STRIPE_KEY` (publishable, frontend); price IDs → `plans` table | +| Cloudflare R2 | `R2_ENDPOINT`, `R2_BUCKET`, `R2_ACCESS_KEY_ID`, `R2_SECRET_ACCESS_KEY`, `R2_REGION` (api-rs) | +| TURN | `TURN_SERVER_URL`, `TURN_SERVER_USERNAME`, `TURN_SERVER_CREDENTIAL` | +| Signaling control-plane | `SIGNALING_URL`, `SIGNALING_SECRET`, `JWT_SECRET`, `SIGNALING_WS_URL`/`PUBLIC_SIGNALING_URL` (frontend) | +| SFU | `SFU_NODES`, `SFU_ANNOUNCED_IP`, `SFU_NODE_ID` | +| Frontend (BFF) | `API_URL` (server-side → api-rs), `PUBLIC_SIGNALING_URL` | +| App base URL | `APP_URL`, `CORS_ORIGINS` | +| Observability | `OTEL_EXPORTER_OTLP_ENDPOINT` (optional, e.g. `http://jaeger:4318`) | + +--- + +## 2. In-app subscription tiers (your product's plans) + +These are the plans customers subscribe to. Each must exist in **two places**, +kept in sync: + +1. **Stripe** — one **Product** per paid tier, each with a **monthly** and a + **yearly** recurring **Price**. +2. **Postgres `plans` table** — one row per tier, holding the limits + the + Stripe price IDs (`stripe_price_id_monthly`, `stripe_price_id_yearly`). + +### 2.1 Recommended tier catalog + +`-1` means **unlimited**. Annual price ≈ 10× monthly (2 months free). 
+ +| Tier | Monthly | Yearly | Workspaces | Rooms | Hosts/room | Viewers/room | Storage | Recording | Analytics | SSO | API | Audit logs | SLA | +|------|--------:|-------:|-----------:|------:|-----------:|-------------:|--------:|:---------:|-----------|:---:|:---:|:----------:|----:| +| **Free** | $0 | $0 | 1 | 1 | 1 | 10 | 1 GB | ✗ | basic | ✗ | ✗ | ✗ | — | +| **Starter** | $49 | $490 | 1 | 3 | 1 | 50 | 5 GB | ✗ | basic | ✗ | ✗ | ✗ | 99.5% | +| **Professional** | $149 | $1,490 | 3 | 10 | 3 | 200 | 25 GB | ✓ | advanced | ✗ | ✓ | ✗ | 99.9% | +| **Business** | $449 | $4,490 | 10 | 50 | 10 | 1,000 | 100 GB | ✓ | full | ✓ | ✓ | ✓ | 99.95% | +| **Enterprise** | Custom | Custom | ∞ | ∞ | ∞ | ∞ | ∞ | ✓ | full | ✓ | ✓ | ✓ | 99.99% | + +> The **Free** tier is the default for every newly-registered organization (no +> Stripe needed). **Enterprise** is sold via sales (no public Stripe price — set +> `is_active = false` or handle manually). + +### 2.2 Stripe setup (per paid tier) + +For **Starter, Professional, Business** (do it in **Test mode** first, then repeat in **Live**): + +1. **Dashboard → Products → Add product** — name it (e.g. `Professional Plan`). +2. Add a **recurring monthly** price and a **recurring yearly** price. +3. Copy both **Price IDs** (`price_…`) into the matching `plans` row. +4. **Dashboard → Developers → Webhooks → Add endpoint:** + - URL: `https://api./v1/webhooks/stripe` + - Events: `checkout.session.completed`, `customer.subscription.updated`, `customer.subscription.deleted` + - Copy the **Signing secret** (`whsec_…`) → `STRIPE_WEBHOOK_SECRET`. +5. Put the **publishable** key in the frontend (`STRIPE_KEY` / `pk_…`) and the + **secret** key in the API (`STRIPE_SECRET` / `sk_…`). + +> The API auto-creates a Stripe **Customer** per organization on first checkout, +> sends `metadata.organization_id` / `metadata.plan_id` through Checkout, and the +> webhook upserts the `subscriptions` row keyed on `stripe_subscription_id`. 
+ +### 2.3 Seed the `plans` table + +Prices are stored as **integer cents**. Replace the `price_…` placeholders with +the IDs from §2.2 (leave Free as `NULL`). + +```sql +-- backend-rs schema: crates/api/migrations/0005_billing.sql +INSERT INTO plans + (name, display_name, price_monthly_cents, price_yearly_cents, + stripe_price_id_monthly, stripe_price_id_yearly, + max_workspaces, max_rooms, max_hosts_per_room, max_viewers_per_room, max_storage_gb, + features, is_active) +VALUES + ('free', 'Free', 0, 0, NULL, NULL, + 1, 1, 1, 10, 1, + '{"recording": false, "analytics": "basic", "sso": false, "api_access": false, "audit_logs": false}', true), + + ('starter', 'Starter', 4900, 49000, + 'price_STARTER_MONTHLY', 'price_STARTER_YEARLY', + 1, 3, 1, 50, 5, + '{"recording": false, "analytics": "basic", "sso": false, "api_access": false, "audit_logs": false, "sla": "99.5"}', true), + + ('professional', 'Professional', 14900, 149000, + 'price_PRO_MONTHLY', 'price_PRO_YEARLY', + 3, 10, 3, 200, 25, + '{"recording": true, "analytics": "advanced", "sso": false, "api_access": true, "audit_logs": false, "sla": "99.9"}', true), + + ('business', 'Business', 44900, 449000, + 'price_BUSINESS_MONTHLY', 'price_BUSINESS_YEARLY', + 10, 50, 10, 1000, 100, + '{"recording": true, "analytics": "full", "custom_branding": true, "sso": true, "api_access": true, "audit_logs": true, "sla": "99.95"}', true), + + ('enterprise', 'Enterprise', 0, 0, NULL, NULL, + -1, -1, -1, -1, -1, + '{"recording": true, "analytics": "full", "custom_branding": true, "sso": true, "api_access": true, "audit_logs": true, "sla": "99.99", "dedicated_support": true}', false); +``` + +--- + +## 3. End-to-end provisioning checklist + +- [ ] **Neon** project created → `DATABASE_URL` set; run `sqlx migrate run --source crates/api/migrations` (or `make migrate`). +- [ ] **Redis** reachable → `REDIS_URL` set (compose `redis` service or managed). +- [ ] **Cloudflare R2** bucket + API token → `R2_*` set. 
+- [ ] **Stripe** products/prices created → IDs seeded into `plans`; webhook endpoint added → `STRIPE_SECRET` + `STRIPE_WEBHOOK_SECRET` + `STRIPE_KEY` set. +- [ ] **TURN** server running (coturn or managed) → `TURN_SERVER_*` set; UDP ports open. +- [ ] **SFU** host has a public IP → `SFU_ANNOUNCED_IP` (→ `MEDIASOUP_ANNOUNCED_IP`) + RTC UDP/TCP port range (40000–49999) open; SFU runs with host networking (see `docs/MEDIA_INFRASTRUCTURE.md`). +- [ ] **Domain + TLS** → `app.` (frontend), `api.` (Rust API), `signaling.` (WSS); `APP_URL`, `CORS_ORIGINS`, `PUBLIC_SIGNALING_URL` set. +- [ ] **Secrets** generated → strong `JWT_SECRET` and `SIGNALING_SECRET` (shared between the API and `signaling-rs`/signaling). +- [ ] *(Recommended)* Transactional email provider configured (only needed once email flows are enabled). +- [ ] *(Optional)* `OTEL_EXPORTER_OTLP_ENDPOINT` → Jaeger/Grafana for traces. + +### Dev shortcut + +Everything except Neon/Stripe/R2/TURN can run locally via +`make docker-up` (Postgres can also be local). Use Stripe **test** keys, the +Stripe CLI (`stripe listen --forward-to localhost:8080/v1/webhooks/stripe`) to +exercise webhooks, and `coturn` from the compose file for TURN. + +--- + +_Tier limits/features above mirror `docs/DATABASE_SCHEMA.md`; the `plans` columns +match `backend-rs/crates/api/migrations/0005_billing.sql`. Adjust prices/limits to +your market — the app reads them from the DB, so changes need no code edits (only +matching Stripe prices)._ diff --git a/docs/MEDIA_INFRASTRUCTURE.md b/docs/MEDIA_INFRASTRUCTURE.md new file mode 100644 index 0000000..1d73ae5 --- /dev/null +++ b/docs/MEDIA_INFRASTRUCTURE.md @@ -0,0 +1,247 @@ +# Media Infrastructure — Self-Hosted Realtime (Path A: Hetzner + coturn) + +How to host the **live screenshare + voice** plane yourself for **true realtime, +no buffering** — sub‑~200 ms glass‑to‑glass. 
This is the LiveKit‑equivalent +layer; the repo already ships the SFU (`sfu/`, mediasoup), the signaling control +plane (`signaling-rs`/`signaling`), TURN config (`infrastructure/docker/coturn`), +and the egress recorder (`recorder/`). You are **not** paying a per‑minute WebRTC +SaaS — only flat infrastructure. + +> **Why this is realtime:** media flows **browser → SFU → browsers over WebRTC +> (SRTP/UDP)** — the same path LiveKit/Zoom use. There is **no HLS/DASH/RTMP** on +> the live path (those add 2–30 s of buffering). HLS/recording is **egress only** +> and never sits between participants. + +--- + +## 1. Providers to sign up for (Path A) + +### Realtime / latency-critical (must self-host on low-latency infra) +| Service | Purpose | Sign up | Recommended | +|---|---|---|---| +| **Hetzner Cloud** — SFU host | mediasoup SFU (the media engine) | hetzner.com/cloud | **CCX** dedicated‑CPU (see §3) | +| **Hetzner Cloud** — TURN host | coturn NAT‑traversal fallback | same | CX22, or co‑locate on the SFU box | +| **Hetzner Cloud** — App host | Rust API + SvelteKit + signaling + Redis | same | CX32 / CPX31 | +| **Cloudflare** (free) | DNS, TLS origin certs, WAF for the web tier (**not** the media path) | cloudflare.com | Free plan | +| **Domain registrar** | hostnames | any | ~$10–15/yr | + +> Pick the Hetzner **location closest to your audience** — latency is dominated by +> round‑trip to the SFU: `nbg1`/`fsn1`/`hel1` (EU), `ash` (US‑East), `hil` +> (US‑West), `sin` (Singapore). One region per audience; go multi‑region only for +> a global audience (§9). + +### Non‑realtime (off the media path — can be anywhere) +These never touch live media, so latency is irrelevant. See +`SUBSCRIPTIONS_AND_TIERS.md` for full setup. +- **Neon** (Postgres), **Stripe** (billing), **Cloudflare R2** (file storage). +- **Optional managed TURN** instead of coturn: **Cloudflare Calls TURN** (cheapest/free‑ish) or **Metered.ca**. + +--- + +## 2. 
Latency budget (target) + +| Hop | Target | +|---|---| +| Capture + encode (browser) | 5–20 ms | +| Uplink RTT to SFU | 10–40 ms (region‑local) | +| SFU forward (mediasoup C++) | **0.1–1 ms** | +| Downlink to viewers | 10–40 ms | +| Jitter buffer + decode + paint | 30–80 ms | +| **Total glass‑to‑glass** | **~80–180 ms** (LAN/region), <250 ms cross‑region | + +Everything below exists to protect this budget. + +--- + +## 3. Server specs + +| Role | Hetzner type | Specs | Notes | +|---|---|---|---| +| **SFU** | **CCX23/CCX33** (dedicated vCPU) | 4–8 dedicated vCPU, 16–32 GB | **Dedicated CPU is non‑negotiable** — shared vCPU causes jitter spikes during contention. ~200–500 video consumers/core. | +| **TURN** | CX22 (or on the SFU box) | 2 vCPU, 4 GB | Mostly bandwidth; only ~10–20 % of sessions relay. | +| **App** | CX32 / CPX31 | 4 vCPU, 8 GB | Rust API + SvelteKit + signaling + Redis (not latency‑critical). | + +mediasoup runs **one worker per vCPU**; the container needs `--cap-add=SYS_NICE` +(already set in compose) so workers can raise scheduling priority. + +--- + +## 4. Network, ports & firewall + +Open on the **SFU host**: +| Port(s) | Proto | Purpose | +|---|---|---| +| `40000–49999` | **UDP** (primary) + TCP (fallback) | mediasoup RTC media (ICE); must match `MEDIASOUP_RTC_MIN/MAX_PORT` (§5) | +| `4000` | TCP | SFU control-plane HTTP (internal/private only) | + +Open on the **TURN host** (from `coturn/turnserver.conf`): +| Port(s) | Proto | Purpose | +|---|---|---| +| `3478` | UDP + TCP | STUN/TURN | +| `5349` | UDP + TCP | TURN over TLS (`turns:`) for restrictive networks | +| `49152–49200` | UDP | TURN relay range (widen for scale, see §7) | + +Open on the **App host**: `80`, `443` (HTTP/WSS). + +> **Critical for realtime:** the SFU must advertise its **public IP** in ICE +> candidates via `MEDIASOUP_ANNOUNCED_IP`.
If it advertises `0.0.0.0`/a private +> IP, remote peers can't reach it directly and **fall back to TURN relay — adding +> a hop and latency**, or fail outright. + +--- + +## 5. ✅ Config corrections (applied) + +Two mismatches in the original compose/SFU config would have **broken direct +connectivity and forced TURN relay** (extra latency) or failed calls. **Both are +now fixed** in `infrastructure/docker/docker-compose.yml`: + +1. **Env‑name mismatch (fixed).** `sfu/src/config/*.ts` reads + `MEDIASOUP_ANNOUNCED_IP`, `MEDIASOUP_RTC_MIN_PORT`, `MEDIASOUP_RTC_MAX_PORT`, + but the compose previously set `ANNOUNCED_IP`, `RTC_MIN_PORT`, `RTC_MAX_PORT` + — so the SFU silently ignored them and fell back to defaults (no announced IP, + ports 10000–59999). The compose vars are now renamed to the `MEDIASOUP_*` + names. `sfu/.env.example` and `.env.example` were corrected too. +2. **Port range (fixed).** The range is now a single ~10k‑port window + (`40000–49999`) for both the worker and the firewall. + +**Applied fix (best latency): the SFU runs with host networking** on its +dedicated box — no Docker UDP NAT, no per‑port publish explosion, lowest latency: + +```yaml +# infrastructure/docker/docker-compose.yml — sfu service (as shipped) +sfu: + network_mode: host # bind host ports directly (no Docker NAT) + environment: + - MEDIASOUP_ANNOUNCED_IP=${SFU_ANNOUNCED_IP} # the host's PUBLIC IPv4 + - MEDIASOUP_RTC_MIN_PORT=40000 + - MEDIASOUP_RTC_MAX_PORT=49999 + # host networking → reach Redis via its host-published port: + - REDIS_URL=redis://:${REDIS_PASSWORD}@localhost:6379 +``` + +> **Note on host networking:** the SFU no longer joins the Docker bridge, so +> `SFU_NODES` (used by signaling to reach the SFU control-plane) must point at the +> SFU host's reachable address (e.g. `sfu1.:4000`), not the bridge service +> name. On a single dev box, the host-published `4000` works. 
If you ever revert +> to bridge networking, publish the **exact same** UDP+TCP range you set in +> `MEDIASOUP_RTC_MIN/MAX_PORT`. + +--- + +## 6. Realtime media tuning + +**Transport** +- WebRTC SRTP over **UDP first**; TCP and TURN are fallbacks only. +- Keep TURN as **UDP relay** (`turns:`/TCP only for locked‑down networks) — TCP + relay adds head‑of‑line blocking and latency. +- A **~10k‑port UDP range** (40000–49999) supports far more concurrent ICE + connections than the previous 101‑port window (10000–10100); it is the range now configured (§5). + +**Codecs** (already configured in `sfu/src/config/mediasoup.ts`) +- **Voice:** Opus with **inband FEC** (packet‑loss resilience), **DTX** + (silence suppression → less bandwidth/jitter), **20 ms ptime**. Wire + `useinbandfec=1; usedtx=1` and `maxaveragebitrate≈32–64 kbps` on the producer. +- **Camera video:** VP9 **SVC** (preferred, 30–50 % savings) or VP8 **simulcast**; + **H.264** included for Safari/iOS; **AV1** for capable clients. +- **Screenshare:** publish with `track.contentHint = 'detail'` (text/slides) or + `'motion'` (video); use a **higher bitrate + lower frame rate** (e.g. + 1.5–4 Mbps, 5–15 fps for slides, 30 fps for motion) and **don't downscale** + shared resolution. The live page already passes screenshare simulcast layers. + +**Per‑viewer adaptation** +- Simulcast/SVC lets the SFU forward the **right layer per viewer** so one slow + viewer never stalls others. Wire mediasoup `AudioLevelObserver` for active‑ + speaker (already recommended in the tech audit) and set consumer + `preferredLayers` from the UI tile size. + +**Low playout delay** +- Minimize the receiver jitter buffer for interactivity (set + `playoutDelayHint`/`jitterBufferTarget` low on consumers); the jitter buffer is + the single biggest tunable in the budget (§2).
+ +**SFU host OS/NIC tuning** (`/etc/sysctl.d/99-webrtc.conf`) +``` +net.core.rmem_max = 16777216 +net.core.wmem_max = 16777216 +net.core.netdev_max_backlog = 250000 +net.ipv4.udp_mem = 262144 524288 1048576 +net.core.default_qdisc = fq +net.ipv4.tcp_congestion_control = bbr +fs.file-max = 1000000 +``` +`bbr` only affects TCP, i.e. the WSS/API tier. Plus `ulimit -n 1000000` (compose already sets `nofile 65536`; raise for scale), +and pin mediasoup workers to cores on the dedicated box. + +**Bandwidth math (capacity planning)** +- SFU **egress per room** ≈ Σ(consumer bitrates). Example: 1 host @ 2.5 Mbps to + 100 viewers ≈ **250 Mbps** down. A CCX box on a 1–10 Gbps NIC handles this + comfortably; Hetzner egress (~€1/TB, ~20 TB incl.) makes it cheap. Budget + ≈ host_bitrate × viewers; add ~30 % headroom. + +--- + +## 7. TURN (coturn) specifics + +From `infrastructure/docker/coturn/turnserver.conf`: +- **`use-auth-secret` + `static-auth-secret`** → time‑limited TURN REST + credentials (the app mints short‑lived `username:credential` pairs; never ship + static creds to browsers). Set `TURN_AUTH_SECRET`. +- **`external-ip=${TURN_EXTERNAL_IP}`** → the TURN host's public IP (same idea as + `MEDIASOUP_ANNOUNCED_IP`). +- **`min-port`/`max-port`** (49152–49200) → widen for concurrency. +- **TLS** (`cert`/`pkey`, port 5349) → provision certs (Let's Encrypt) so + `turns:` works behind corporate firewalls. +- Hand clients the ICE server list (`TURN_SERVER_URL/USERNAME/CREDENTIAL`) at + join time alongside the signaling token. + +> Don't want to run coturn? Point those env vars at **Cloudflare Calls TURN** or +> **Metered.ca** instead — TURN is the only piece comfortably outsourced without +> hurting the median (P2P‑to‑SFU) path. + +--- + +## 8.
DNS records + +| Record | Points to | Purpose | +|---|---|---| +| `app.` | App host | SvelteKit frontend (443) | +| `api.` | App host | Rust API (443) | +| `signaling.` | App/SFU host | signaling **WSS** | +| `turn.` | TURN host | STUN/TURN (matches `realm`/`server-name`) | +| `sfu1.` | SFU host | (optional) per‑node addressing for multi‑node | + +> Put the **web/API/WSS** tiers behind Cloudflare (TLS/WAF). Leave **media UDP +> (SFU/TURN) NOT proxied** (DNS‑only / grey‑cloud) — Cloudflare's proxy doesn't +> carry WebRTC UDP, and proxying would add latency. + +--- + +## 9. Scaling (keep it realtime as you grow) + +- **Vertical first:** bigger CCX = more consumers/core. Cheapest path to thousands + of viewers per room. +- **Horizontal:** add SFU nodes; mediasoup **PipeTransport** bridges routers + across nodes (the audit's Phase‑1 plan); `signaling-rs` allocates a room to a + node and load‑balances. Set `SFU_NODES` / per‑node `SFU_NODE_ID`. +- **Multi‑region:** one SFU cluster per region; route each participant to the + nearest node; pipe between regions only when a room spans them. + +--- + +## 10. Pre‑flight checklist + +- [ ] CCX (dedicated CPU) SFU host provisioned in the audience's region. +- [ ] `MEDIASOUP_ANNOUNCED_IP` = SFU public IPv4; **env names corrected** (§5). +- [ ] SFU UDP+TCP RTC range opened **and** matches `MEDIASOUP_RTC_MIN/MAX_PORT` (host networking recommended). +- [ ] coturn up with `external-ip`, `static-auth-secret`, TLS certs; 3478/5349 + relay range open. +- [ ] sysctl UDP buffers + BBR applied; `nofile` raised; workers = vCPUs. +- [ ] DNS: app/api/signaling/turn records; media UDP **not** Cloudflare‑proxied. +- [ ] TLS on app/api/signaling (Let's Encrypt) and TURN. +- [ ] Validate: open `chrome://webrtc-internals`, confirm **`candidate-pair` is `host`/`srflx` (not `relay`)**, RTT < ~50 ms region‑local, and no growing jitter. 
+ +--- + +_See `SUBSCRIPTIONS_AND_TIERS.md` for the non‑realtime services (Neon, Stripe, R2) +and the in‑app plan catalog. The realtime path described here is self‑hosted — +no per‑minute media vendor required._ diff --git a/infrastructure/docker/docker-compose.yml b/infrastructure/docker/docker-compose.yml index 98bbe10..b2ed18c 100644 --- a/infrastructure/docker/docker-compose.yml +++ b/infrastructure/docker/docker-compose.yml @@ -83,24 +83,31 @@ services: dockerfile: ../infrastructure/docker/Dockerfile.sfu container_name: tradingroom-sfu restart: unless-stopped - ports: - - "4000:4000" - - "10000-10100:10000-10100/udp" - - "10000-10100:10000-10100/tcp" + # Host networking: the SFU binds its RTC UDP/TCP ports directly on the host + # for the lowest-latency media path (no Docker NAT, no per-port userland + # proxy) and so the WebRTC announced-IP candidates are reachable. The SFU is + # meant to run on its own dedicated box in production — see + # docs/MEDIA_INFRASTRUCTURE.md. Control-plane HTTP is on host port 4000 and + # the RTC media range below (40000-49999 UDP+TCP) must be open in the firewall. + network_mode: host stop_grace_period: 30s environment: - NODE_ENV=${NODE_ENV:-development} - PORT=4000 - NODE_ID=${SFU_NODE_ID:-sfu-1} - - REDIS_URL=redis://:${REDIS_PASSWORD:-tradingroom_redis}@redis:6379 - - RTC_MIN_PORT=10000 - - RTC_MAX_PORT=10100 - - ANNOUNCED_IP=${SFU_ANNOUNCED_IP} + # Under host networking the bridge `redis` DNS name is unavailable; reach + # Redis via its host-published port. In production point this at your + # Redis host/endpoint. + - REDIS_URL=redis://:${REDIS_PASSWORD:-tradingroom_redis}@localhost:6379 + # These names MUST match what sfu/src/config reads (the MEDIASOUP_* prefix); + # the old RTC_MIN_PORT/RTC_MAX_PORT/ANNOUNCED_IP were silently ignored. + - MEDIASOUP_RTC_MIN_PORT=40000 + - MEDIASOUP_RTC_MAX_PORT=49999 + # The host's PUBLIC IPv4 — clients connect here for media. Must be set. 
+ - MEDIASOUP_ANNOUNCED_IP=${SFU_ANNOUNCED_IP} - MEDIASOUP_LOG_LEVEL=warn depends_on: - redis - networks: - - tradingroom-network # Required for Mediasoup cap_add: - SYS_NICE diff --git a/sfu/.env.example b/sfu/.env.example index cce0b74..4aeaef5 100644 --- a/sfu/.env.example +++ b/sfu/.env.example @@ -3,10 +3,16 @@ PORT=4000 HOST=0.0.0.0 REDIS_URL=redis://127.0.0.1:6379 -ANNOUNCED_IP=127.0.0.1 -RTC_MIN_PORT=10000 -RTC_MAX_PORT=10100 +# The host's PUBLIC IPv4 — clients connect here for WebRTC media (ICE candidates). +MEDIASOUP_ANNOUNCED_IP=127.0.0.1 -NUM_WORKERS=auto -LOG_LEVEL=debug +# RTC media port range (UDP primary + TCP fallback). Must be open in the firewall +# and, under bridge networking, published 1:1. A ~10k window supports far more +# concurrent ICE connections than a narrow range. +MEDIASOUP_RTC_MIN_PORT=40000 +MEDIASOUP_RTC_MAX_PORT=49999 + +# Leave blank to auto-detect (one worker per CPU core). +MEDIASOUP_NUM_WORKERS= +MEDIASOUP_LOG_LEVEL=debug