
feat(egress): contained-mode egress-jail topology + tinyproxy sidecar (PR 2/3) #155

Merged: jraicr merged 3 commits into dev from feat/phase2-egress-2 on Jun 5, 2026

jraicr commented Jun 5, 2026


Part of #149 · v0.4.0 · dual-mode-containment Phase 2 (egress jail) · PR 2/3 (stacked-to-dev)

PR 1/3 (#154, on dev) shipped the enabling machinery (sidecar image, baseline allowlist, per-session filter generation) with docker-compose.contain.yml deliberately unchanged. This PR flips the contained-mode network topology so that machinery is actually consumed: the agent loses its direct external route and egresses only through a deny-by-default tinyproxy sidecar (Architecture A — zero new Linux capabilities).

dev stays runnable after this merge: PR 1/3 already provides the effective filter file, the drydock-egress image build, and the DRYDOCK_SIDECAR_NAME export, so contained drydock run brings up the topology end-to-end. dood mode is untouched.

Topology (the flip)

  • docker-compose.contain.yml rewritten in place (no new overlay — the "exactly one of dood/contain" gate and its render test stay intact):
    • drydock_internal — fixed-name internal: true bridge (no gateway/NAT; external DNS SERVFAILs). The agent is solely attached here.
    • drydock_egress — fixed-name standard NAT bridge; the sidecar's external leg.
    • Both networks carry an explicit name: so all sessions share one pair (drydock tears down with docker rm -f, never compose down / network rm, so un-named networks would leak and exhaust Docker's ~31-bridge pool).
  • Agent (drydock service): HTTPS_PROXY/HTTP_PROXY/NO_PROXY (sidecar by name + loopback + metadata link-local), depends_on: drydock-egress: { condition: service_healthy } (fail-closed — the agent will not start before the proxy is healthy), and telemetry off by default (DISABLE_TELEMETRY, DISABLE_ERROR_REPORTING, CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC, all user-overridable) to collapse the required allowlist.
  • Sidecar (drydock-egress service): per-session container_name=${DRYDOCK_SIDECAR_NAME} (mirrors the agent's disc-parameterized name), image: drydock-egress:latest, dual-homed to both bridges, cap_drop: [ALL] with no cap_add, no-new-privileges, read_only rootfs with tmpfs: [/run, /tmp] (the PidFile lives in /tmp), a /dev/tcp port-8888 healthcheck, and the per-session effective filter RO-mounted at /etc/tinyproxy/filter.
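The `/dev/tcp` port-8888 healthcheck mentioned above relies on a bash redirection feature rather than on any extra tooling in the sidecar image. A minimal sketch of such a probe follows; the helper name `port_open` is hypothetical and the exact healthcheck command in the overlay may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if a TCP connect to host:port works.
# /dev/tcp/<host>/<port> is handled by bash itself; it is not a real device file.
port_open() {
  local host=$1 port=$2
  # The subshell keeps descriptor 3 from leaking into the caller;
  # it is closed automatically when the subshell exits.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

if port_open 127.0.0.1 8888; then
  echo "proxy port is accepting connections"
else
  echo "proxy port is not reachable"
fi
```

A probe like this only shows that something is listening on the port. It does not show that the filter file was loaded.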

Lifecycle, build, reporting

| Area | Change |
| --- | --- |
| ensure_image | Requires drydock-egress:latest only when the project resolves to contained mode. dood must not require or build it: a dood-only offline host with the agent image present would otherwise be blocked by the sidecar's apt build failing without network. |
| collision-retry | The discriminator collision regex is widened to (-shell\|-egress)? so a disc whose sidecar still lingers after a run --rm reap-gap is not reused. The gc/seed liveness regexes are deliberately not widened (a live sidecar must not protect an orphaned dir). |
| gc reap | gc_orphan_session_dirs reaps an orphaned -egress sidecar in the orphan branch (the run --rm paths remove only the oneoff agent and exec away). |
| teardown | _run_claude_lifecycle and cmd_stop tear down the agent and its sidecar. cmd_stop guards the sidecar rm separately so a dood / already-reaped session never fails the stop. |
| banner / doctor | The creation banner and drydock doctor now describe the egress allowlist (with a live domain count) and the contain pin note is updated; a new doctor EGRESS section reports the baseline + user allowlists + sidecar image. Honest wording: the allowlist is named without over-claiming. |
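The asymmetry between the widened collision regex and the unwidened liveness regex can be sketched as below. The `drydock-<disc>` naming scheme and both function names are assumptions made for illustration; only the `(-shell|-egress)?` suffix group comes from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical naming: agent "drydock-<disc>", siblings "-shell" and "-egress".
# <disc> is assumed to contain no regex metacharacters.

# Collision check (widened): any container of the session blocks reuse of the disc.
disc_in_use() {
  local disc=$1 name
  shift
  local re="^drydock-${disc}(-shell|-egress)?\$"
  for name in "$@"; do
    [[ $name =~ $re ]] && return 0
  done
  return 1
}

# Liveness check (not widened): a lingering sidecar alone does not count as live.
disc_is_live() {
  local disc=$1 name
  shift
  local re="^drydock-${disc}(-shell)?\$"
  for name in "$@"; do
    [[ $name =~ $re ]] && return 0
  done
  return 1
}

disc_in_use a1b2 drydock-a1b2-egress && echo "a1b2 is taken; pick another disc"
disc_is_live a1b2 drydock-a1b2-egress || echo "a1b2 is not live; its dir may be reaped"
```

With only a lingering sidecar present, the disc is neither reusable nor protected from gc, which matches the two behaviours described in the table.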

Residual gaps (named, not hidden)

L4 only (payload not inspected); ECH / domain-fronting; IP-literal CONNECT denied by deny-by-default; a compromised sidecar means open egress; a tool ignoring HTTPS_PROXY loses egress (breakage, not bypass); concurrent contained sessions share the one internal bridge and can reach a peer's sidecar (single-user trust bound). These are enumerated in full in the doctrine flip (PR 3/3).

Invariants

  • INV-8 touched-but-preserved: zero new Linux capabilities anywhere. The sidecar runs cap_drop: [ALL] with zero cap_add; the agent's cap_add is the unchanged Phase 1 hardening set. A render test scoped to the sidecar service block asserts the absence of cap_add (non-vacuous — cap_add is present on the agent in that same render).
  • dood unaffected: dood render contains docker.sock + network_mode: host and none of internal: true, the sidecar, HTTPS_PROXY, the filter mount, or the telemetry-off defaults (negative render tests guard this).
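The scoped, non-vacuous render assertion described for INV-8 might look roughly like the following. The render text is a hand-written stand-in, `service_block` is a hypothetical helper, and `CHOWN` is a placeholder because the agent's real cap_add set is not listed in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical, trimmed stand-in for a rendered contain overlay.
# CHOWN is a placeholder; the agent's real cap_add set is not listed in this PR.
render=$(cat <<'YAML'
services:
  drydock:
    cap_add:
      - CHOWN
    cap_drop:
      - ALL
  drydock-egress:
    cap_drop:
      - ALL
    read_only: true
networks:
  drydock_internal:
    internal: true
YAML
)

# Print only the lines nested under one service key (2-space service indent assumed).
service_block() {
  awk -v svc="$1" '
    $0 == "  " svc ":" { inside = 1; next }   # header of the wanted service
    inside && /^ ? ?[^ ]/ { inside = 0 }      # a line indented 0-2 spaces ends the block
    inside { print }
  '
}

egress=$(printf '%s\n' "$render" | service_block drydock-egress)

# Non-vacuity check first: cap_add really is present elsewhere in the render.
grep -q 'cap_add:' <<<"$render" && echo "render contains cap_add (agent)"
# Scoped assertion: the sidecar block itself must not contain it.
if grep -q 'cap_add:' <<<"$egress"; then
  echo "FAIL: sidecar declares cap_add"
else
  echo "ok: sidecar has no cap_add"
fi
```

Checking the whole render for the absence of cap_add would always fail here, and checking an empty block would always pass, so the scoping and the presence check both matter.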

Tests (strict TDD)

  • 1137/1137 green via scripts/test.sh; shellcheck + shfmt -d clean.
  • The first failing test inverts the Phase 1 internal: true absent guard; the two drydock_net render tests are replaced (not left dangling). New coverage: R9.2–R9.13 render assertions, dood negatives, sidecar read_only/tmpfs (A5) scoped to the service block, -egress collision retry, gc sidecar reap (+ a live-agent-still-protects negative), ensure_image contained-builds / dood-skips, two-name teardown (lifecycle + stop), and banner/doctor egress reporting.

Review

Fresh adversarial review (empirical — compose renders, mutation testing of the new assertions, A6 regex replay, teardown exit-code semantics, set -euo pipefail safety, healthcheck validated against the real debian:12-slim) plus an independent spec/design verification: 0 blockers. Two test-sensitivity findings (the A5 read_only/tmpfs assertions could pass vacuously) were fixed before this PR by scoping them to the sidecar block and confirming they fail against a mutated overlay.

Size

~690 lines changed — the majority are tests; production code is ~185 lines (compose ~90, lib/ ~95). Labeled size:l.

Next

  • PR 3/3 — doctrine + docs flip: INV-9 rewrite (drop the "does NOT filter egress" clause; name all residual gaps), docs/security.md, README.md, docs/architecture.md, docs/troubleshooting.md (egress-allowlist how-to), docs/ROADMAP.md → Done, CHANGELOG.md (v0.4.0).
  • Release gates (v0.4.0): empirical baseline CONNECT capture + a runtime double-gate smoke (reach an allowlisted domain through the proxy and block a non-allowlisted one, IP-literal CONNECT denied) on bare Linux.
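The deny-by-default decision that the runtime smoke is meant to confirm can be approximated offline. This is only a simulation with grep, not tinyproxy itself: the filter format (one anchored extended regex per line), the example domains, and the `host_allowed` helper are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical filter: one anchored extended regex per line (example domains only).
filter=$(cat <<'EOF'
^api\.example\.com$
^registry\.example\.org$
EOF
)

# Deny by default: a host passes only if at least one pattern matches it.
host_allowed() {
  grep -qE -f <(printf '%s\n' "$filter") <<<"$1"
}

for host in api.example.com evil.example.net 93.184.216.34; do
  if host_allowed "$host"; then
    echo "allow $host"
  else
    echo "deny  $host"
  fi
done
```

Because an IP literal matches none of the domain patterns, it is denied by the same default rule. That is the behaviour the release gate expects for IP-literal CONNECT.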

Part of #149 — the umbrella issue stays open until Phase 2 plus the v0.4.0 release.

jraicr added 3 commits June 5, 2026 14:32
Flip docker-compose.contain.yml to the Phase 2 dual-bridge topology: the agent
attaches solely to a fixed-name internal:true bridge (drydock_internal, no
gateway/NAT) and a per-session tinyproxy sidecar (drydock-egress, consuming the
slice-1 drydock-egress:latest image + DRYDOCK_EGRESS_FILTER_FILE) is dual-homed
to a second fixed-name NAT bridge (drydock_egress). The agent egresses ONLY via
HTTPS_PROXY/NO_PROXY to the sidecar; telemetry is OFF by default (three vars,
all user-overridable). depends_on service_healthy gates the agent on the proxy.

Sidecar hardening: cap_drop ALL (zero cap_add), no-new-privileges, read_only
with tmpfs /run+/tmp (PidFile lives in /tmp), filter RO-mounted. INV-8 intact —
no new Linux capability anywhere; the agent's cap_add is the unchanged Phase 1
hardening set. dood overlay UNAFFECTED.

Render tests invert the Phase 1 guards (internal:true absent->present,
drydock_net->drydock_internal/drydock_egress) and add R9.2-R9.13: sidecar
present + dual-homed, HTTPS_PROXY/NO_PROXY, depends_on service_healthy, filter
RO mount, cap_drop, no-cap_add (scoped to the sidecar block so the agent's
cap_add cannot make it vacuous), healthcheck, three telemetry vars, fixed
non-namespaced network names, and dood negatives. setup() exports the two :?
guarded vars so contain renders succeed offline.

Wire the sidecar's lifecycle, mode-aware image build, and honest reporting:

- ensure_image (lib/compose.sh): require drydock-egress:latest ONLY when the
  project resolves to contained mode (resolve_run_mode is a pure fn of env +
  sentinels). dood MUST NOT require/build it — a dood-only offline host with the
  agent image present would otherwise be blocked by cmd_build's apt step failing
  without network (dood regression, INV-9 dood-unaffected).
- collision-retry regex (export_compose_env): widen to (-shell|-egress)? so a
  disc whose -egress sidecar lingers after a run --rm reap-gap is not reused
  before gc clears it (A6). The gc liveness regex is deliberately NOT widened —
  a live sidecar must not protect an orphaned dir.
- gc_orphan_session_dirs: reap an orphaned -egress sidecar in the orphan branch
  (the run --rm paths remove only the oneoff agent and exec away).
- two-name teardown: _run_claude_lifecycle TRAP A and cmd_stop also rm -f the
  -egress sidecar. cmd_stop guards the sidecar rm separately (2>/dev/null||true)
  so a dood/already-reaped session does not fail the stop.
- _emit_mode_banner / cmd_doctor / drydock contain note: replace the Phase 1
  'egress open' wording with the egress-jail posture (allowlist + domain count);
  add a doctor EGRESS section (baseline/user allowlists + sidecar image). Honest
  framing — no 'egress-proof' over-claim.
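The separately guarded sidecar removal in cmd_stop can be illustrated under `set -euo pipefail`. Here `remove_container` is a hypothetical stand-in for the real removal command, rigged to fail for the sidecar so the guard is exercised; `stop_session` and the container names are likewise made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the removal command; fails for "-egress" names
# to mimic a sidecar that never existed (dood) or was already reaped.
remove_container() {
  case $1 in
    *-egress)
      echo "no such container: $1" >&2
      return 1
      ;;
    *) echo "removed $1" ;;
  esac
}

stop_session() {
  local agent=$1 sidecar=$2
  remove_container "$agent"
  # Guarded separately: without "|| true", set -e would abort the stop here.
  remove_container "$sidecar" 2>/dev/null || true
}

stop_session drydock-a1b2 drydock-a1b2-egress
echo "stop exit code: $?"
```

Keeping the guard on the sidecar line only means a failure to remove the agent still surfaces as an error.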

Tests: gc sidecar reap (+ live-agent-still-protects negative for A6),
-egress collision retry, ensure_image contained-builds / dood-skips, two-name
teardown (lifecycle + stop), banner egress line + domain count + dood-silent,
doctor EGRESS section. Full suite 1135 green; shellcheck + shfmt clean.

Adversarial review flagged two test-sensitivity gaps in the A5 sidecar-hardening
contract: the only read_only assertion (R9.5) also matched the filter volume's
read_only: true, so dropping the service-level rootfs flag went uncaught; and the
sidecar tmpfs (/run + /tmp) had no assertion at all. Add two render tests scoped
to the drydock-egress service block: service-level read_only (exact 4-space
indent, not the 8-space volume key) and tmpfs /run + /tmp (whole-line matches so
the filter volume's /tmp/... source cannot satisfy them vacuously). Both verified
to fail against a mutated overlay with the flags stripped.
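The exact-indent, whole-line matching described in this commit can be sketched as follows. The block text is a hand-written stand-in for the sidecar service block, the bind source path is invented for the example, and `has_line` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical sidecar service block; the bind source path is made up.
block=$(cat <<'YAML'
    read_only: true
    tmpfs:
      - /run
      - /tmp
    volumes:
      - type: bind
        source: /tmp/example/filter
        target: /etc/tinyproxy/filter
        read_only: true
YAML
)

# Whole-line, fixed-string match: indentation is part of what must match.
has_line() { grep -qxF -- "$2" <<<"$1"; }

# Mutation: strip the service-level flag but keep the volume-level one.
mutated=$(grep -vxF -- '    read_only: true' <<<"$block")

has_line "$block" '    read_only: true' && echo "ok: rootfs read_only present"
has_line "$block" '      - /tmp' && echo "ok: tmpfs /tmp present"
has_line "$mutated" '    read_only: true' || echo "ok: mutation is detected"
```

A plain substring search for `read_only: true` or `/tmp` would still succeed on the mutated block, which is the vacuous pass the review flagged.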
jraicr added the type:feat (Feature work) and size:l (Large: 400+ lines) labels Jun 5, 2026
jraicr merged commit 90e1ecb into dev Jun 5, 2026
4 checks passed
jraicr deleted the feat/phase2-egress-2 branch June 5, 2026 15:14