Skip to content

feat: WOR-114 Phase 2 + manual.md gaps + mTLS/OCSP#42

Open
rickcrawford wants to merge 3 commits intomainfrom
feat/wor-114-mtls-ocsp
Open

feat: WOR-114 Phase 2 + manual.md gaps + mTLS/OCSP#42
rickcrawford wants to merge 3 commits intomainfrom
feat/wor-114-mtls-ocsp

Conversation

@rickcrawford
Copy link
Copy Markdown
Contributor

Two commits between v1.0.1 and HEAD on the feat/wor-114-mtls-ocsp branch.
The v1.0.0/v1.0.1 release work shipped via direct push; this PR puts the
post-1.0.1 work behind a review gate before tagging v1.0.2.

Summary

Commit 1: 36aabc3 WOR-114 Phase 2 + manual.md gap-fix bundle

WOR-114 Phase 2: CEL features[...] namespace

  • New FeatureFlagsView and populate_features_namespace in
    sbproxy-extension/src/cel/context.rs
  • features.debug / features.trace / features[\"no-cache\"] /
    features.any_set exposed to CEL
  • Free-form k=v pairs from x-sb-flags extras land as strings
  • Wired into the rate-limit-key evaluator (server.rs::rate_limit_key_from_cel)
    and the main ExpressionPolicy::evaluate_with_views
  • 4 new unit tests in cel::context::tests
  • manual.md §10 rewritten to match the surface

manual.md gap items

  • SB_WORKER_THREADS env var: positive integer overrides
    available_parallelism() for the Pingora worker pool
  • /live, /livez, /ready, /health endpoints added as aliases
    for /livez / /readyz / /healthz. New handle_livez returns
    {\"alive\":true} always
  • §6 Health checks rewritten to match shipping behavior
  • §13 env-var table picks up SB_WORKER_THREADS and SB_DISABLE_SB_FLAGS

Commit 2: bbc0b97 mTLS into ACME path + activate OCSP stapling

mTLS on the ACME path

  • The manual-cert path already wired mTLS via build_mtls_tls_settings.
    The ACME path silently dropped it; an operator who configured mtls:
    alongside acme: got plain TLS. Fixed.

OCSP stapling

  • OcspStapler previously existed but was unwired
  • start_refresh_task now does an immediate fetch + 12h refresh loop
    with an on_update callback
  • CertResolver::update_fallback_ocsp(bytes) swaps the fallback cert
    with one that has ocsp = Some(bytes) (the only mechanism rustls
    0.23 staples through)
  • TlsState::start_ocsp_refresh_task() invoked from
    crates/sbproxy-core/src/server.rs next to the ACME renewal task
  • 2 new unit tests in cert_resolver::tests

Scope: manual fallback cert only; per-hostname ACME OCSP is a follow-up.

Test plan

  • cargo fmt --all -- --check green
  • cargo build --workspace green
  • cargo clippy --workspace --all-targets -- -D warnings green
  • RUSTDOCFLAGS=\"-D warnings\" cargo doc --workspace --no-deps --document-private-items green
  • cargo test --workspace --release --tests (115 buckets ok)
  • e2e: sb_flags_e2e (4 tests) covers debug headers + no-cache header + no-cache query
  • e2e: existing auth_jwt (9 tests) still pass after the jsonwebtoken 10 + aws_lc_rs swap
  • curl install verified live: curl -fsSL https://download.sbproxy.dev | sh installs sbproxy 1.0.1

Out of scope for this PR

Already filed as Linear tickets:

  • WOR-125 two-level log config
  • WOR-126 slow_request_threshold
  • WOR-127 access-log file rotation
  • WOR-128 full /health schema with version+uptime+component_checks

Linear

Closes WOR-114 (Phase 2 of the x-sb-flags
implementation; Phase 1 closed in v1.0.0).

Three independent threads, all in one commit because they share the
manual.md edit and the test infrastructure:

WOR-114 Phase 2: CEL features namespace
---------------------------------------
* New `FeatureFlagsView` and `populate_features_namespace` in
  sbproxy-extension/src/cel/context.rs.
* `features.debug` / `features.trace` / `features["no-cache"]` /
  `features.any_set` exposed to CEL expressions, plus extra k=v
  pairs from the `x-sb-flags` header / query-param prefix.
* Wired through the `RequestContext.flags` populated in Phase 1:
  rate-limit-key CEL evaluator (server.rs::rate_limit_key_from_cel)
  and `ExpressionPolicy::evaluate_with_views` (modules/policy/mod.rs).
* Extra k=v entries cannot shadow built-in bools (the apply rules
  in `sb_flags::apply` already gate this; the CEL populate code
  belt-and-braces the same rule).
* 4 new unit tests in cel::context::tests covering all three built-ins,
  bracket access for hyphenated keys, extras, and the shadow-prevention
  rule.
* manual.md §10 rewritten: one CEL example block tied to the actual
  policy schema, accessor table, kill-switch note. Lua / workspace-flag
  Phase-2 sections marked planned.

manual.md gap items
-------------------
* `SB_WORKER_THREADS` env var: positive integer overrides the
  auto-detected Pingora worker thread count. Documented in §3.
* `/live`, `/livez`, `/ready`, `/health` endpoints added to the admin
  surface as aliases for `/livez` / `/readyz` / `/healthz`. New
  `handle_livez` returns `{"alive":true}` always; the K8s liveness
  probe never trips on transient readiness failures.
* §6 Health checks rewritten to match what actually ships (3 endpoints
  with bare aliases, no per-endpoint URL fork). Body shapes match the
  current envelope rather than the Go-era schema.
* §13 env-var table picks up SB_WORKER_THREADS and SB_DISABLE_SB_FLAGS.

Out of scope: full /health schema with version+uptime+component_checks
(the current /healthz body is intentionally trivial; expanding it is a
separate ticket since it touches the registry serialisation).

CI gates: cargo fmt + build + clippy + rustdoc + tests all green
locally; e2e/sb_flags_e2e (4) + observe/health (1 new + existing) +
admin (39 incl. 2 new) + cel context (4 new) all pass.
Two YC-readiness items, both in the TLS layer.

mTLS on the ACME path
---------------------

The manual-cert path already wired mTLS via build_mtls_tls_settings
(server.rs:108). The ACME path silently dropped it: an operator who
configured `mtls:` alongside `acme:` got plain TLS until they noticed
clients reaching the upstream without the expected cert headers. Fix
the ACME branch to mirror the manual-cert branch: build the settings
from MtlsListenerConfig, wire the ClientCertVerifier, and fall back
to plain TLS only when mTLS setup itself fails. Same require/optional
semantics either way.

OCSP stapling
-------------

OcspStapler exists with full fetch + 12h refresh logic but was never
called from anywhere. Wire it:

* OcspStapler::start_refresh_task now does an immediate fetch before
  the 12h loop and takes an `on_update` callback so the resolver can
  rebuild its CertifiedKey with the fresh bytes. (rustls 0.23
  staples whatever lives in `CertifiedKey.ocsp`.)

* CertResolver::update_fallback_ocsp swaps the fallback cert with a
  copy that has `ocsp = Some(bytes)`. Hot resolve() path is untouched;
  only the writer pays the Vec<CertificateDer> + Arc<dyn SigningKey>
  clone, once per 12h refresh.

* TlsState gains `ocsp_stapler` + `manual_cert_pem` fields, populated
  in init() when a manual TLS cert was loaded. start_ocsp_refresh_task
  spawns the refresh, called from server.rs alongside the existing
  ACME renewal task.

Scope: manual fallback cert only. ACME-managed certs typically come
with OCSP responses inline via must-staple; wiring per-hostname OCSP
for ACME certs is a separate ticket.

CI gates: fmt + build + clippy + doc green; 92 sbproxy-tls tests pass
(13 cert_resolver incl. 2 new OCSP plumbing tests, 5 ocsp, plus the
existing acme/mtls suites).
Captures everything between v1.0.1 and the current branch HEAD so
the next version cut can lift it verbatim:

* WOR-114 Phase 2 CEL features namespace
* SB_WORKER_THREADS env var
* /live, /livez, /ready, /health endpoint aliases
* OCSP stapling for the manual fallback cert
* mTLS on the ACME path (was silently dropped)
* manual.md §6, §10, §3, §13 rewrites

No code change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant