feat(egress): contained-mode egress-jail topology + tinyproxy sidecar (PR 2/3) #155
Merged
Flip docker-compose.contain.yml to the Phase 2 dual-bridge topology: the agent attaches solely to a fixed-name internal:true bridge (drydock_internal, no gateway/NAT) and a per-session tinyproxy sidecar (drydock-egress, consuming the slice-1 drydock-egress:latest image + DRYDOCK_EGRESS_FILTER_FILE) is dual-homed to a second fixed-name NAT bridge (drydock_egress). The agent egresses ONLY via HTTPS_PROXY/NO_PROXY to the sidecar; telemetry is OFF by default (three vars, all user-overridable). depends_on service_healthy gates the agent on the proxy. Sidecar hardening: cap_drop ALL (zero cap_add), no-new-privileges, read_only with tmpfs /run+/tmp (PidFile lives in /tmp), filter RO-mounted. INV-8 intact — no new Linux capability anywhere; the agent's cap_add is the unchanged Phase 1 hardening set. dood overlay UNAFFECTED. Render tests invert the Phase 1 guards (internal:true absent->present, drydock_net->drydock_internal/drydock_egress) and add R9.2-R9.13: sidecar present + dual-homed, HTTPS_PROXY/NO_PROXY, depends_on service_healthy, filter RO mount, cap_drop, no-cap_add (scoped to the sidecar block so the agent's cap_add cannot make it vacuous), healthcheck, three telemetry vars, fixed non-namespaced network names, and dood negatives. setup() exports the two :? guarded vars so contain renders succeed offline.
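The scoping behind the no-`cap_add` assertion can be illustrated with a small sketch. This is not the project's test code: the trimmed YAML, the `EXAMPLE_CAP` placeholder, and the `service_block` helper are hypothetical, and the awk filter assumes the two-space service indentation that a rendered compose file typically has.

```shell
# Hypothetical, trimmed stand-in for a rendered contain overlay (not the real file).
rendered='services:
  drydock:
    cap_add:
      - EXAMPLE_CAP
    networks:
      - drydock_internal
  drydock-egress:
    cap_drop:
      - ALL
    networks:
      - drydock_internal
      - drydock_egress
networks:
  drydock_internal:
    internal: true'

# service_block NAME YAML: print only the body lines of one service.
# Assumes services sit at a 2-space indent; the block ends at the next line
# indented 2 spaces or less (a sibling service or a top-level key).
service_block() {
  printf '%s\n' "$2" | awk -v svc="$1" '
    $0 == ("  " svc ":") { inside = 1; next }
    inside && /^(  )?[^ ]/ { inside = 0 }
    inside { print }
  '
}

# Non-vacuity: cap_add does appear in this render, on the agent...
if service_block drydock "$rendered" | grep -q 'cap_add:'; then
  echo "agent: cap_add present"
fi

# ...so only a check scoped to the sidecar block shows the sidecar has none.
if service_block drydock-egress "$rendered" | grep -q 'cap_add:'; then
  echo "sidecar: cap_add present (would violate the invariant)"
else
  echo "sidecar: no cap_add"
fi
```

A plain `grep -v cap_add` over the whole render would always fail here because of the agent block, and a plain "absent" check could never pass, which is why the assertion has to be narrowed to one service.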
Wire the sidecar's lifecycle, mode-aware image build, and honest reporting:

- ensure_image (lib/compose.sh): require drydock-egress:latest ONLY when the project resolves to contained mode (resolve_run_mode is a pure fn of env + sentinels). dood MUST NOT require/build it: a dood-only offline host with the agent image present would otherwise be blocked by cmd_build's apt step failing without network (dood regression, INV-9 dood-unaffected).
- collision-retry regex (export_compose_env): widen to (-shell|-egress)? so a disc whose -egress sidecar lingers after a run --rm reap-gap is not reused before gc clears it (A6). The gc liveness regex is deliberately NOT widened: a live sidecar must not protect an orphaned dir.
- gc_orphan_session_dirs: reap an orphaned -egress sidecar in the orphan branch (the run --rm paths remove only the oneoff agent and exec away).
- two-name teardown: _run_claude_lifecycle TRAP A and cmd_stop also rm -f the -egress sidecar. cmd_stop guards the sidecar rm separately (2>/dev/null||true) so a dood/already-reaped session does not fail the stop.
- _emit_mode_banner / cmd_doctor / drydock contain note: replace the Phase 1 'egress open' wording with the egress-jail posture (allowlist + domain count); add a doctor EGRESS section (baseline/user allowlists + sidecar image). Honest framing: no 'egress-proof' over-claim.

Tests: gc sidecar reap (+ live-agent-still-protects negative for A6), -egress collision retry, ensure_image contained-builds / dood-skips, two-name teardown (lifecycle + stop), banner egress line + domain count + dood-silent, doctor EGRESS section. Full suite 1135 green; shellcheck + shfmt clean.
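The asymmetry between the widened collision-retry pattern and the unwidened gc liveness pattern can be sketched as follows. The `drydock-<disc>` naming scheme, the function names, and the assumed pre-existing `(-shell)?` liveness pattern are illustrative guesses, not taken from `lib/compose.sh`.

```shell
# Hypothetical container-name scheme: "drydock-<disc>" for the agent, plus
# optional "-shell" / "-egress" companions. The real prefix may differ.

# Collision check when picking a disc: widened, so a lingering sidecar
# still marks the disc as taken.
disc_taken() {
  printf '%s\n' "$2" | grep -Eq "^drydock-${1}(-shell|-egress)?\$"
}

# gc liveness check: NOT widened, so a sidecar alone never protects a
# session dir. (Assumed shape of the pre-existing pattern.)
disc_live() {
  printf '%s\n' "$2" | grep -Eq "^drydock-${1}(-shell)?\$"
}

# Sample container listing: ab12 has only a leftover sidecar, cd34 has a
# running agent, ef56 has nothing.
names='drydock-ab12-egress
drydock-cd34'

for disc in ab12 cd34 ef56; do
  taken=no
  live=no
  disc_taken "$disc" "$names" && taken=yes
  disc_live "$disc" "$names" && live=yes
  echo "$disc taken=$taken live=$live"
done
```

For `ab12` the sketch reports `taken=yes live=no`: the disc is skipped by the collision retry, yet its session dir is still eligible for gc, which then reaps the leftover sidecar.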
Adversarial review flagged two test-sensitivity gaps in the A5 sidecar-hardening contract: the only read_only assertion (R9.5) also matched the filter volume's read_only: true, so dropping the service-level rootfs flag went uncaught; and the sidecar tmpfs (/run + /tmp) had no assertion at all. Add two render tests scoped to the drydock-egress service block: service-level read_only (exact 4-space indent, not the 8-space volume key) and tmpfs /run + /tmp (whole-line matches so the filter volume's /tmp/... source cannot satisfy them vacuously). Both verified to fail against a mutated overlay with the flags stripped.
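A minimal sketch of why whole-line, indent-exact matching closes the gap described above. The overlay fragment, the `/tmp/example-session/filter` path, and the helper names are hypothetical; the sample contains only the sidecar service, so block scoping is left out for brevity.

```shell
# Hypothetical sidecar fragment: service keys at 4 spaces, volume keys at 8.
overlay='services:
  drydock-egress:
    image: drydock-egress:latest
    read_only: true
    tmpfs:
      - /run
      - /tmp
    volumes:
      - type: bind
        source: /tmp/example-session/filter
        target: /etc/tinyproxy/filter
        read_only: true'

# Mutation: strip the service-level rootfs flag and the tmpfs entries,
# leaving the volume-level read_only and the /tmp/... source in place.
mutated="$(printf '%s\n' "$overlay" | grep -vxF \
  -e '    read_only: true' -e '    tmpfs:' -e '      - /run' -e '      - /tmp')"

# has_line LINE TEXT: true only if TEXT contains LINE as an entire line,
# leading spaces included (fixed-string, whole-line match).
has_line() { printf '%s\n' "$2" | grep -qxF -- "$1"; }

hardening_ok() {
  has_line '    read_only: true' "$1" &&
    has_line '      - /run' "$1" &&
    has_line '      - /tmp' "$1"
}

check() {
  if hardening_ok "$2"; then echo "$1: strict pass"; else echo "$1: strict fail"; fi
}

check overlay "$overlay"
check mutated "$mutated"

# The loose substring check is satisfied by the volume key alone.
if printf '%s\n' "$mutated" | grep -q 'read_only: true'; then
  echo "mutated: loose grep still matches (vacuous)"
fi
```

The strict check passes on the intact fragment and fails on the mutated one, while the loose substring match keeps succeeding on the mutated fragment because of the volume's own `read_only: true`.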
Part of #149 · v0.4.0 · dual-mode-containment Phase 2 (egress jail) · PR 2/3 (stacked-to-dev)
PR 1/3 (#154, on `dev`) shipped the enabling machinery (sidecar image, baseline allowlist, per-session filter generation) with `docker-compose.contain.yml` deliberately unchanged. This PR flips the contained-mode network topology so that machinery is actually consumed: the agent loses its direct external route and egresses only through a deny-by-default tinyproxy sidecar (Architecture A, zero new Linux capabilities).

`dev` stays runnable after this merge: PR 1/3 already provides the effective filter file, the `drydock-egress` image build, and the `DRYDOCK_SIDECAR_NAME` export, so contained `drydock run` brings up the topology end-to-end. dood mode is untouched.

## Topology (the flip)
`docker-compose.contain.yml` rewritten in place (no new overlay, so the "exactly one of dood/contain" gate and its render test stay intact):

- Networks:
  - `drydock_internal`: fixed-name `internal: true` bridge (no gateway/NAT; external DNS SERVFAILs). The agent is solely attached here.
  - `drydock_egress`: fixed-name standard NAT bridge; the sidecar's external leg.
  - Both set a fixed `name:` so all sessions share one pair (drydock tears down with `docker rm -f`, never `compose down` / `network rm`, so un-named networks would leak and exhaust Docker's ~31-bridge pool).
- Agent (`drydock` service): `HTTPS_PROXY` / `HTTP_PROXY` / `NO_PROXY` (sidecar by name + loopback + metadata link-local), `depends_on: drydock-egress: { condition: service_healthy }` (fail-closed: the agent will not start before the proxy is healthy), and telemetry off by default (`DISABLE_TELEMETRY`, `DISABLE_ERROR_REPORTING`, `CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC`, all user-overridable) to collapse the required allowlist.
- Sidecar (`drydock-egress` service): per-session `container_name=${DRYDOCK_SIDECAR_NAME}` (mirrors the agent's disc-parameterized name), `image: drydock-egress:latest`, dual-homed to both bridges, `cap_drop: [ALL]` with no `cap_add`, `no-new-privileges`, `read_only` rootfs with `tmpfs: [/run, /tmp]` (the PidFile lives in `/tmp`), a `/dev/tcp` port-8888 healthcheck, and the per-session effective filter RO-mounted at `/etc/tinyproxy/filter`.

## Lifecycle, build, reporting
- `ensure_image` requires `drydock-egress:latest` only when the project resolves to contained mode. dood must not require/build it: a dood-only offline host with the agent image present would otherwise be blocked by the sidecar's `apt` build failing without network.
- The collision-retry regex is widened to `(-shell|-egress)?` so a disc whose sidecar still lingers after a `run --rm` reap-gap is not reused. The gc/seed liveness regexes are deliberately not widened (a live sidecar must not protect an orphaned dir).
- `gc_orphan_session_dirs` reaps an orphaned `-egress` sidecar in the orphan branch (the `run --rm` paths remove only the oneoff agent and `exec` away).
- `_run_claude_lifecycle` and `cmd_stop` tear down the agent and its sidecar. `cmd_stop` guards the sidecar `rm` separately so a dood / already-reaped session never fails the stop.
- The mode banner and `drydock doctor` now describe the egress allowlist (with a live domain count) and the `contain` pin note is updated; a new doctor EGRESS section reports the baseline + user allowlists + sidecar image. Honest wording: the allowlist is named without over-claiming.

## Residual gaps (named, not hidden)
L4 only (payload not inspected); ECH / domain-fronting; IP-literal `CONNECT` denied by deny-by-default; a compromised sidecar means open egress; a tool ignoring `HTTPS_PROXY` loses egress (breakage, not bypass); concurrent contained sessions share the one internal bridge and can reach a peer's sidecar (single-user trust bound). These are enumerated in full in the doctrine flip (PR 3/3).

## Invariants
- INV-8 (no new Linux capability): the sidecar is `cap_drop: [ALL]` with zero `cap_add`; the agent's `cap_add` is the unchanged Phase 1 hardening set. A render test scoped to the sidecar service block asserts the absence of `cap_add` (non-vacuous: `cap_add` is present on the agent in that same render).
- INV-9 (dood unaffected): the dood render has `docker.sock` + `network_mode: host` and none of `internal: true`, the sidecar, `HTTPS_PROXY`, the filter mount, or the telemetry-off defaults (negative render tests guard this).

## Tests (strict TDD)
- Full suite green via `scripts/test.sh`; `shellcheck` + `shfmt -d` clean.
- Render tests invert the Phase 1 `internal: true absent` guard; the two `drydock_net` render tests are replaced (not left dangling). New coverage: R9.2–R9.13 render assertions, dood negatives, sidecar `read_only` / `tmpfs` (A5) scoped to the service block, `-egress` collision retry, gc sidecar reap (+ a live-agent-still-protects negative), `ensure_image` contained-builds / dood-skips, two-name teardown (lifecycle + stop), and banner/doctor egress reporting.

## Review
Fresh adversarial review (empirical: compose renders, mutation testing of the new assertions, A6 regex replay, teardown exit-code semantics, `set -euo pipefail` safety, healthcheck validated against the real `debian:12-slim`) plus an independent spec/design verification: 0 blockers. Two test-sensitivity findings (the A5 `read_only` / `tmpfs` assertions could pass vacuously) were fixed before this PR by scoping them to the sidecar block and confirming they fail against a mutated overlay.

## Size
~690 lines changed; the majority are tests, and production code is ~185 lines (compose ~90, `lib/` ~95). Labeled `size:l`.

## Next
- PR 3/3 (doctrine flip): `docs/security.md`, `README.md`, `docs/architecture.md`, `docs/troubleshooting.md` (egress-allowlist how-to), `docs/ROADMAP.md` → Done, `CHANGELOG.md` (v0.4.0).
- […] `CONNECT` denied) on bare Linux.

Part of #149; the umbrella issue stays open until Phase 2 plus the v0.4.0 release.