4 changes: 4 additions & 0 deletions .gitignore
@@ -90,7 +90,11 @@ tmp
/.devenv*
/.devenv-root
/.stack/state
/apps/docs/package.json.backup
/apps/web/package.json.backup
/devshell
/packages/gen/infra/package.json.backup
/packages/infra/package.json.backup
/result
apps/stackpanel-go/.stack/keys/
build
51 changes: 41 additions & 10 deletions .stack/config.apps.nix
@@ -14,15 +14,50 @@ let
# message and in the studio Variables UI.
envs = {
shared = {
- # These vars are declared so the per-app `Env` interface and the studio
- # Variables UI know about them, but they're not yet wired to a SOPS group
- # or process.env source. Mark `required = false` until a source is added,
- # otherwise `loadDeployEnv` fails with a missing-required error at the
- # top of every `alchemy.run.ts`.
# Secrets — sourced from `.stack/secrets/vars/shared.sops.yaml` so the
# codegen embeds real ciphertext into each app's per-env runtime payload
# (`packages/gen/env/data/<env>/<app>.sops.json`). Without `sops:` here
# the codegen rendered `"KEY": ""` and at runtime the consumers (better-
# auth, polar) either crashed (`BETTER_AUTH_SECRET` → "you are using
# the default secret" 500 on every tRPC call) or silently no-op'd.
#
# Same secrets are *also* declared in the `deploy` root env scope
# (`.stack/config.nix:envs.deploy`) so deploy-time tooling (alchemy
# bindings, `apps/api/scripts/push-secrets.sh`) can read them — that
# remains; the two scopes serve different consumers.
BETTER_AUTH_SECRET = {
- required = false;
required = true;
sops = "/shared/better-auth-secret";
description = "Better Auth signing secret. Generate with `openssl rand -hex 32`.";
};
POLAR_ACCESS_TOKEN = {
required = false;
sops = "/shared/polar-access-token";
description = "Polar.sh API access token used for billing. When unset, polarClient is null and billing endpoints no-op.";
};
POLAR_WEBHOOK_SECRET = {
required = false;
sops = "/shared/polar-webhook-secret";
description = "Polar.sh webhook signing secret. When unset, the polar webhooks plugin is not mounted.";
};
POLAR_PRO_PRODUCT_ID_PRODUCTION = {
required = false;
sops = "/shared/polar-pro-product-id-production";
description = "Polar product ID for the Pro plan in production. Falls back to the sandbox product when unset.";
};
POLAR_FREE_PRODUCT_ID_PRODUCTION = {
required = false;
sops = "/shared/polar-free-product-id-production";
description = "Polar product ID for the Free plan in production. Falls back to the sandbox product when unset.";
};

# Per-environment URL/CORS config — not secrets, so no SOPS source.
# Left as `required = false` because the consuming code handles missing
# values gracefully (better-auth derives BETTER_AUTH_URL from the
# request host; CORS_ORIGIN/POLAR_SUCCESS_URL fall back to upstream
# defaults). Wire per-env literals via
# `stackpanel.envs."apps/<app>/<env>".KEY = { value = "..."; };`
# in `.stack/config.nix` if you need explicit values.
BETTER_AUTH_URL = {
required = false;
description = "Public URL the auth server is reachable at (used for OAuth redirects).";
@@ -31,10 +66,6 @@ let
required = false;
description = "Comma-separated allowed origins for the API.";
};
- POLAR_ACCESS_TOKEN = {
- required = false;
- description = "Polar.sh API access token used for billing.";
- };
POLAR_SUCCESS_URL = {
required = false;
description = "Redirect URL Polar sends customers to after a successful checkout.";
25 changes: 25 additions & 0 deletions apps/web/alchemy.run.ts
@@ -38,12 +38,37 @@ const program = Effect.gen(function* () {
roleName: `${PROJECT}-${SERVICE}-owner`,
});

// The Worker decrypts its own runtime secrets at boot via
// `await loadAppEnv("web", APP_ENV, { inject: true })` in
// `apps/web/src/server.ts`, against the per-stage SOPS payload
// embedded in `@gen/env`. We forward only the two values the loader
// needs, plus one deploy-time resource output:
//
// - `SOPS_AGE_KEY`: the AGE key material that unlocks every entry in
// the embedded SOPS payload. Lives in the deploy scope (CI gets it
// from `secrets.SECRETS_AGE_KEY_DEV`; dev gets it from
// `.stack/keys/local.txt` via `loadDeployEnv` above) and is read
// here from `process.env`.
// - `APP_ENV`: the resolved SOPS namespace (`prod` | `staging` |
// `dev`) so the Worker knows which `web/<env>.ts` payload to load.
// - `DATABASE_URL`: a runtime-bound resource output from the Neon
// project. Not a SOPS secret; it doesn't belong in `@gen/env`
// because the connection URI is generated per-deploy by alchemy.
//
// Adding a new app secret only requires editing
// `.stack/config.apps.nix:envs.shared` (and re-running `stackpanel
// codegen build`); it lands in the embedded payload automatically and
// becomes available to the Worker via `process.env` after the
// `loadAppEnv` call. See
// `docs/adr/0001-runtime-secrets-via-gen-env-loader.md`.
const website = yield* Cloudflare.Vite("TanstackStart", {
compatibility: {
flags: ["nodejs_compat"],
},
env: {
DATABASE_URL: db.connectionUri,
SOPS_AGE_KEY: process.env.SOPS_AGE_KEY ?? "",
APP_ENV: appEnv,
},
});
let url: Output.Output<string | undefined> = website.url;
57 changes: 50 additions & 7 deletions apps/web/src/server.ts
@@ -1,18 +1,61 @@
// src/server.ts
//
// Cloudflare Worker SSR entrypoint. The top-level `await loadAppEnv(...)`
// below is load-bearing for production: it decrypts the `@gen/env`
// embedded SOPS payload and injects every secret (BETTER_AUTH_SECRET,
// POLAR_*, …) into `process.env` BEFORE any request handler — and, more
// importantly, before `@stackpanel/auth` constructs its `betterAuth`
// instance — runs.
//
// Without this, `@stackpanel/auth` sees an empty `process.env.BETTER_AUTH_SECRET`
// and better-auth's `validateSecret` blows up with HTTP 500 "you are using
// the default secret" on every tRPC call (waitlist included).
//
// Top-level await is supported in Cloudflare Workers' module workers and
// is allowed at the entry module: the platform suspends Worker boot until
// the awaited value resolves, then exposes the default-exported handler.
//
// `APP_ENV` is set by `apps/web/alchemy.run.ts` at deploy time
// (`prod` | `staging` | `dev`). It picks which entry of
// `packages/gen/env/src/runtime/generated-payloads/web/{dev,staging,prod}.ts`
// to decrypt. `SOPS_AGE_KEY` is the only secret the Worker needs at
// deploy time; it's the AGE key material that unlocks every SOPS payload
// for this stage. See `docs/adr/0001-runtime-secrets-via-gen-env-loader.md`.
//
// In `vite dev` and Vitest, `process.env` is already populated from the
// devshell and `SOPS_AGE_KEY` is unset, so the load below is skipped; the
// loader also no-ops when the embedded payload is missing. When the key
// is set (production CI sets it), a decrypt failure is logged rather than
// thrown: Worker boot continues and the error shows up in the logs.
import { loadAppEnv } from "@gen/env/runtime/edge";
import {
- createStartHandler,
- defaultStreamHandler,
- defineHandlerCallback,
createStartHandler,
defaultStreamHandler,
defineHandlerCallback,
} from "@tanstack/react-start/server";
import { createServerEntry } from "@tanstack/react-start/server-entry";

const appEnv = process.env.APP_ENV ?? process.env.STAGE ?? "dev";

if (process.env.SOPS_AGE_KEY) {
try {
await loadAppEnv("web", appEnv, { inject: true });
} catch (err) {
// Surface as a console error but don't crash the Worker boot — the
// downstream `auth.api.getSession(...)` call will still throw a
// well-shaped better-auth error if the payload was actually needed.
// This covers the case where someone deploys with a stale
// SOPS_AGE_KEY that can't decrypt the embedded payload.
console.error("[server.ts] loadAppEnv failed:", err);
}
}

const customHandler = defineHandlerCallback((ctx) => {
// add custom logic here
- return defaultStreamHandler(ctx);
return defaultStreamHandler(ctx);
});

const fetch = createStartHandler(customHandler);

export default createServerEntry({
- fetch,
- });
fetch,
});
186 changes: 186 additions & 0 deletions docs/adr/0001-runtime-secrets-via-gen-env-loader.md
@@ -0,0 +1,186 @@
# 0001 — Runtime secrets are decrypted via `@gen/env`, not forwarded as Worker env vars

- **Status**: Accepted
- **Date**: 2026-05-01

## Context

The waitlist join endpoint on `stackpanel.com` was crashing in
production with HTTP 500:

```
You are using the default secret. Please change it.
```

The crash originated inside `better-auth`'s `validateSecret` and
surfaced on every tRPC call (waitlist included), because
`createTRPCContext` eagerly reads `opts.auth.api.getSession(...)`.
Investigation (see commit `8a7897c6`) found that `BETTER_AUTH_SECRET`
and the four Polar secrets (`POLAR_ACCESS_TOKEN`,
`POLAR_WEBHOOK_SECRET`, `POLAR_PRO_PRODUCT_ID_PRODUCTION`,
`POLAR_FREE_PRODUCT_ID_PRODUCTION`) were declared in
`.stack/config.apps.nix:envs.shared` with `required = false` and **no
SOPS source**. As a result, `stackpanel codegen build` rendered
`"BETTER_AUTH_SECRET": ""` into every per-stage payload at
`packages/gen/env/data/<env>/web.sops.json`. Even after we wired the
SOPS sources, the payloads remained dead code in the web Worker
because nobody was decrypting them at runtime.

Two paths were available to fix this:

1. **Forward secrets via `Cloudflare.Vite({ env: { ... } })`** — read
the values from `process.env` (populated at deploy time by
`loadDeployEnv` reading the deploy scope) and shovel each one into
the Cloudflare Worker's environment as a Worker secret. This is
what commit `21c00841` did and what we are now reverting.
2. **Decrypt the embedded SOPS payload at Worker boot** via the
existing `@gen/env/runtime` loader — give the Worker only the AGE
key material and let it decrypt the rest.

Approach (1) duplicated secret material (Cloudflare's secret store
*and* the embedded SOPS payload), required every new secret to be
added in two places (`.stack/config.apps.nix` *and*
`apps/web/alchemy.run.ts`), and bypassed the `@gen/env` codegen
pipeline, which was designed to be the single source of truth for
secrets. It also made each new secret a deploy-script edit rather
than a config-only change.

Approach (2) was already 90% built: the per-app SOPS payload is
embedded in `packages/gen/env/src/runtime/generated-payloads/web/{dev,staging,prod}.ts`,
and `nix/stackpanel/lib/codegen/templates/env/loader.ts` is an
edge-safe loader (no FileSystem/ChildProcess dependency) that reads
ciphertext + `process.env.SOPS_AGE_KEY` and produces a decrypted
payload it can inject into `process.env`. It just wasn't wired into
the web Worker's boot path.

## Decision

Workers receive only `SOPS_AGE_KEY` (and a non-secret `APP_ENV`
discriminator) at deploy time. All other application secrets are
decrypted **inside the Worker** on boot via:

```ts
// apps/web/src/server.ts
import { loadAppEnv } from "@gen/env/runtime/edge";

const appEnv = process.env.APP_ENV ?? process.env.STAGE ?? "dev";

if (process.env.SOPS_AGE_KEY) {
await loadAppEnv("web", appEnv, { inject: true });
}
```
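
The fallback chain above accepts any string as the env name. A stricter
variant is sketched below as an illustration only; it is not what
`server.ts` does. `AppEnv`, `APP_ENVS`, and `resolveAppEnv` are
hypothetical names, and throwing on an unknown value is an assumption
about what a caller might want.

```typescript
// Hypothetical helper: narrows the raw env value to the three SOPS
// namespaces named in this ADR (`prod` | `staging` | `dev`).
type AppEnv = "prod" | "staging" | "dev";

const APP_ENVS: readonly AppEnv[] = ["prod", "staging", "dev"];

function resolveAppEnv(env: Record<string, string | undefined>): AppEnv {
  // Same precedence as the snippet above: APP_ENV, then STAGE, then "dev".
  const raw = env.APP_ENV ?? env.STAGE ?? "dev";
  const match = APP_ENVS.find((candidate) => candidate === raw);
  if (match === undefined) {
    // Assumption: fail loudly instead of passing an unknown name to the loader.
    throw new Error(
      `Unknown APP_ENV "${raw}"; expected one of ${APP_ENVS.join(", ")}`,
    );
  }
  return match;
}
```

Passing the env object as a parameter instead of reading `process.env`
directly keeps the helper usable in tests without touching global state.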

The `@gen/env` package gains a new `./runtime/edge` export that maps
to `loader.ts` (the edge-safe loader). The existing `./runtime`
export — backed by `node-loader.ts` — keeps its FileSystem +
ChildProcessSpawner dependencies for use from `apps/*/alchemy.run.ts`
and other Node/Bun entrypoints.

Two changes complement the wiring:

1. **`@stackpanel/auth` is now lazy.** The `betterAuth({...})` call is
moved into a `buildAuth()` function called by a `Proxy`-backed
`auth` export. The first property access on `auth` builds and
caches the instance. This guarantees that if the import chain
`routeTree.gen.ts → routes/api/trpc.$.ts → @stackpanel/auth`
resolves before the SSR entrypoint's top-level `await loadAppEnv`
fires (which can happen depending on bundler module ordering),
`betterAuth` is *not* called yet — and by the time the request
handler actually touches `auth.api`, the env load is complete.

2. **The web Worker env in `apps/web/alchemy.run.ts` shrinks.** It
keeps `DATABASE_URL` (a runtime-bound resource output from the
Neon project, not a SOPS payload entry), and adds `SOPS_AGE_KEY`
and `APP_ENV`. The five forwarded secrets from commit `21c00841`
are removed.
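
The lazy `auth` export described in item 1 can be pictured with a small
`Proxy`-backed singleton. This is a minimal sketch under stated
assumptions, not the code in `packages/auth/src/index.ts`: `AuthLike`,
`lazy`, `buildCount`, and the local `env` object are hypothetical
stand-ins, and `buildAuth` here returns a plain object instead of
calling `betterAuth({...})`.

```typescript
// Hypothetical stand-in for the object `betterAuth({...})` would return.
interface AuthLike {
  secret: string;
  api: { getSession(): string };
}

// Stand-in for `process.env`; starts empty, as it would before the loader runs.
const env: Record<string, string | undefined> = {};

let buildCount = 0;

// Reads the secret only when called, not when the module is imported.
function buildAuth(): AuthLike {
  buildCount += 1;
  const secret = env.BETTER_AUTH_SECRET ?? "";
  return { secret, api: { getSession: () => `session:${secret}` } };
}

// Returns a Proxy that builds the real instance on first property access
// and reuses the cached instance afterwards.
function lazy<T extends object>(build: () => T): T {
  let instance: T | undefined;
  const resolve = (): T => {
    if (instance === undefined) {
      instance = build();
    }
    return instance;
  };
  return new Proxy({} as T, {
    get: (_target, prop) => Reflect.get(resolve(), prop),
    has: (_target, prop) => Reflect.has(resolve(), prop),
  });
}

// Importing modules see `auth` immediately, but nothing is built yet.
const auth = lazy(buildAuth);

// Simulates the env load finishing after the import chain has resolved.
env.BETTER_AUTH_SECRET = "s3cret";
```

Because the build is deferred to the first property access, the order in
which the bundler evaluates modules no longer decides which value of
`BETTER_AUTH_SECRET` the instance sees.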

Adding a new application secret going forward requires only:

1. A `sops:` entry in `.stack/config.apps.nix:envs.shared` (or the
relevant scope) — i.e., one Nix file edit.
2. A re-run of `stackpanel codegen build` to refresh the embedded
payload.

The new variable is automatically available on `process.env` inside
the Worker after the loader runs. No changes to `apps/web/alchemy.run.ts`,
no Cloudflare secret to provision, no per-environment dual-write.

## Consequences

**Pros**

- **Single source of truth.** Secrets are declared in Nix and embedded
in the codegen payload. Adding a secret is a one-place change.
- **No dual-write.** No more "remember to also add this to
`alchemy.run.ts`" trap.
- **Encrypted at rest until first request.** The Worker bundle ships
with SOPS ciphertext, not cleartext secrets; the AGE key is the only
cleartext-equivalent material in the Worker's secret store.
- **Smaller Cloudflare secret-store surface.** Only `SOPS_AGE_KEY` (+
`DATABASE_URL`, which is a per-deploy resource, not a SOPS secret)
needs to be a Worker secret. Previously every new secret added a new
Worker secret entry per stage.
- **Mirrors the Fly-deployed `apps/api`.** The api app already loads
its env via `loadAppEnv` at boot (in `apps/api/src/index.ts`'s
upstream chain); the web Worker now follows the same pattern.

**Cons**

- **Cold-start cost.** The first request to a new Worker isolate pays
the SOPS decrypt cost (one AES-256-GCM decrypt per encrypted field,
plus one age X25519 unwrap of the SOPS data key, ~tens of milliseconds
for the current ~5-secret payload). Subsequent requests on the same
isolate hit the in-memory cache in `loader.ts`. At our scale this is
invisible. If we ever embed kilobyte-class secrets in the payload,
revisit.
- **`SOPS_AGE_KEY` rotation now happens via the deploy scope only.**
The CI workflow's `SECRETS_AGE_KEY_DEV` GitHub secret is the rotation
target; rotating it requires a redeploy because the Worker reads it
from the env binding set by `apps/web/alchemy.run.ts`, not from a
Cloudflare secret store rotation. Trade-off accepted: rotations are
rare and the deploy-scope rotation path is well-trodden (see
`.github/workflows/secrets-codegen-check.yml`).
- **Every consumer of `@stackpanel/auth` now goes through a Proxy.**
The Proxy is transparent for the property accesses better-auth and
our consumers actually do (`auth.api.getSession`, `auth.handler`,
etc.) but it's a small layer to keep in mind when debugging.
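
The per-isolate cache mentioned under cold-start cost can be thought of
as promise memoisation keyed by app and env. The sketch below is an
assumption about the general shape, not a copy of `loader.ts`:
`memoizeLoader`, `fakeDecrypt`, and `decryptCalls` are hypothetical
names, and dropping failed entries so a later call can retry is a design
choice of this sketch.

```typescript
type Payload = Record<string, string>;
type Decrypt = (app: string, env: string) => Promise<Payload>;

// Wraps a decrypt function so each (app, env) pair is decrypted at most
// once per isolate; concurrent callers share the same in-flight promise.
function memoizeLoader(decrypt: Decrypt): Decrypt {
  const cache = new Map<string, Promise<Payload>>();
  return (app, env) => {
    const key = `${app}/${env}`;
    let pending = cache.get(key);
    if (pending === undefined) {
      pending = decrypt(app, env).catch((err) => {
        // Do not cache failures: remove the entry so a later call retries.
        cache.delete(key);
        throw err;
      });
      cache.set(key, pending);
    }
    return pending;
  };
}

// Hypothetical stand-in for the real SOPS decrypt; counts invocations.
let decryptCalls = 0;
const fakeDecrypt: Decrypt = async (app, env) => {
  decryptCalls += 1;
  return { APP: app, ENV: env };
};

const load = memoizeLoader(fakeDecrypt);
```

Caching the promise rather than the resolved value means two requests
that arrive during the first decrypt do not trigger a second one.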

**Follow-ups / runbook**

- The `@gen/env` codegen drift gate (`.github/workflows/secrets-codegen-check.yml`)
remains the canary for "someone edited a SOPS file but forgot to
re-run codegen". This ADR doesn't change that workflow.
- Document `APP_ENV` as a load-bearing Worker env in
`.stack/data/apps.web.env.nix` once the codegen surfaces non-secret
defaults the same way it surfaces secrets.

## Alternatives considered

- **Forward secrets via `Cloudflare.Vite({ env: { ... } })` (commit
`21c00841`)** — rejected: dual-write, duplicates secret material,
bypasses `@gen/env` codegen.
- **Call `loadAppEnv(...)` inside each tRPC handler** — rejected:
redundant decrypt cost on every request and no benefit over a single
module-level decrypt cached for the isolate's lifetime.
- **Use Cloudflare KV / Secrets Store directly** — rejected: would
require a separate sync pipeline alongside SOPS, and Cloudflare's
per-secret API has its own rate-limit ceiling that we'd hit on every
deploy that touches a payload.
- **Make `@stackpanel/auth` synchronous via a Layer/Effect injection
pattern** — rejected as scope-creep. The Proxy-backed lazy singleton
is a 30-line change with no consumer-side migration; an Effect-shape
refactor is a separate, larger change.

## References

- Parent commit `8a7897c6` — wired `BETTER_AUTH_SECRET` and Polar
secrets through `.stack/config.apps.nix` so the codegen embeds real
ciphertext into each per-stage payload.
- Reverted commit `21c00841` — the rejected env-shovel approach.
- Edge-safe loader: `nix/stackpanel/lib/codegen/templates/env/loader.ts`.
- Codegen export wiring: `nix/stackpanel/lib/codegen/env-package.nix`
(`./runtime/edge` export).
- Web Worker entrypoint: `apps/web/src/server.ts`.
- Web deploy script: `apps/web/alchemy.run.ts`.
- Lazy auth: `packages/auth/src/index.ts`.
- bd issue: `stackpanel-3tj`.