fix(egress): pre-gate criticals — filter mode 0600, smoke network poisoning, unanchored user ERE by jraicr · Pull Request #161 · sideralith/drydock

jraicr · 2026-06-11T00:07:05Z

Summary

Fixes three release-blocking defects in the contained-mode egress jail found by a pre-gate audit of the v0.4.0 scope (dev vs main), each empirically verified against the real drydock-egress image, plus a missing CHANGELOG entry for #159. Part of #149 (release gates G1/G2 should run on top of these fixes).

1. Generated egress filter was mode 0600 — sidecar could not read it (`9bdd344`)

_generate_egress_filter (lib/compose.sh) wrote the filter via mktemp (always 0600) and mv -f preserves the mode. The file is bind-mounted :ro into the sidecar, where tinyproxy runs as uid 100 with cap_drop: ALL (no DAC_OVERRIDE): tinyproxy printed filter file: Permission denied and exited, the healthcheck never passed, and depends_on: service_healthy aborted every contained-mode run. The identical bug existed in _write_filter (scripts/egress-smoke.sh), so gate G2 died with a misleading "sidecar did not become healthy". Fix: chmod 644 before the file reaches its final path, in both places (the filter is non-secret allowlist data). New bats tests assert the 644 mode through the real code paths.

2. Smoke run pre-created production networks without compose labels (`5a4571b`)

egress-smoke.sh created drydock_internal/drydock_egress via docker network create when absent and never removed them. A later docker compose up referencing the same fixed name: fails fatally (network ... has incorrect label com.docker.compose.network). Since setup-gates.sh runs G2 before the G1 capture sessions, the gate host was left persistently broken for every subsequent contained run. Fix: the script now tracks exactly the networks it created (SMOKE_CREATED_NETWORKS) and _cleanup (EXIT trap, covers failure paths and SIGINT) removes exactly those — pre-existing networks are never touched, and the host returns to its prior state by construction. Hand-built compose labels were rejected: compose v2 validates com.docker.compose.network against its own project metadata, so an approximated label set could silently re-poison under a future compose version. The EXIT trap moved from top level into main() so sourcing the script (bats) registers no trap. New test/egress_smoke.bats covers created-only cleanup, pre-existing preservation, and trap registration.

3. Bare-hostname user allowlist entries were unanchored ERE (`b99dcee`, `ad40664`)

tinyproxy matches FilterType ere patterns as unanchored substrings: the documented user-extension form (echo "example.com" >> ~/.config/drydock/egress-allowlist) produced a pattern that also matched example.com.evil.io and example1com.net — the classic exfil-subdomain shape, undermining the jail's deny-by-default intent for exactly the external-ingestion sessions it targets (INV-9). Fix: a new _normalize_egress_patterns step between comment-strip and dedup anchors any line consisting only of letters, digits, dots, and hyphens as an exact escaped match (example.com → ^example\.com$); any other line passes through verbatim as raw ERE (expert escape hatch — shipped baseline lines are already anchored and pass byte-identical; egress-capture.sh enable's .* allow-all is preserved). Applied uniformly to baseline, global, and per-project sources so bare and anchored duplicates dedup to one line. Docs and baseline header state the exact rule. New tests cover anchoring, verbatim passthrough, baseline stability, dedup collapse, and (previously untested) per-project allowlist inclusion.

CHANGELOG (`fc86c24`)

Adds the missing user-facing [Unreleased] entry for the Docker-daemon fail-fast probe (#159, DRYDOCK_DOCKER_PROBE_TIMEOUT) and a Fixed section for the three items above.

Invariant impact

INV-9 (egress jail): strengthened, not softened. The deny-by-default allowlist rule is unchanged; add-only semantics are preserved (normalization never removes baseline lines; dedup keeps the baseline copy). Fix 1 makes the documented jail actually start; fix 3 makes user extensions match exactly instead of as substrings.
INV-8: untouched — no capability, mount, or hardening change; the filter remains a single :ro bind-mounted plain file, now world-readable (non-secret).
All other invariants: not touched.

Verification

Full suite via scripts/test.sh: 1177/1177 pass (includes the new tests; each fix was implemented test-first).
shellcheck, shfmt -d, scripts/lint-commits.sh: clean.
Fresh-context adversarial review verdict: READY-WITH-NITS; both nits (doc-precision) addressed in ad40664.

The effective filter was written via mktemp (always 0600) and mv (mode preserved), then RO bind-mounted to /etc/tinyproxy/filter. tinyproxy in the sidecar runs as a non-root uid with cap_drop ALL (no DAC_OVERRIDE), so it could not read the filter: it logged 'filter file: Permission denied' and exited, the healthcheck never passed, and the agent's depends_on: service_healthy aborted every contained-mode run. chmod 644 before the move in both generators — _generate_egress_filter (lib/compose.sh) and _write_filter (scripts/egress-smoke.sh). The allowlist is non-secret data; world-readable is correct. Also move egress-smoke.sh's cleanup trap from top level into main(): a top-level 'trap _cleanup EXIT' fired on every sourcing shell's exit and ran 'docker rm -f' against the live daemon, which made the helpers untestable via the source-guard (the new test/egress_smoke.bats asserts sourcing is side-effect-free).

_check_preconditions pre-created the production fixed-name networks (drydock_internal / drydock_egress) via 'docker network create' when absent, and _cleanup deliberately never removed networks. A plain 'docker network create' attaches no compose labels, so a subsequent contained 'docker compose up' referencing the same fixed name: fails fatally ('network ... has incorrect label com.docker.compose.network set to ""'). Since setup-gates.sh runs the G2 smoke before the G1 capture sessions, the gate host was poisoned for every later contained run until a manual 'docker network rm'. Fix: track exactly the networks THIS run creates (SMOKE_CREATED_NETWORKS) and remove them in _cleanup — on success and on the failure paths via the EXIT trap. This restores the host to its prior state by construction. Pre-existing networks (e.g. owned by a live contained session) are never touched. Creating the networks with hand-built compose labels was rejected: compose v2 validates the label set against its own project metadata, and an approximated label set that diverges from a future compose version silently reintroduces the poisoning — removal is the only guarantee that does not depend on matching compose internals.

tinyproxy's FilterType ere matches UNANCHORED substrings, so a user allowlist line in the documented bare-hostname form ('echo example.com >> ~/.config/drydock/egress-allowlist') also matched example.com.evil.io and example1com.net — silently widening the deny-by-default filter. The shipped baseline was already anchored (^api\.anthropic\.com$); only user additions were exposed. _generate_egress_filter now normalizes every line after comment-stripping and before dedup, uniformly across all three sources (baseline, global user file, per-project user file): - a bare hostname (alphanumerics, dots, hyphens) is escaped and anchored exactly: example.com -> ^example\.com$ - any other line passes through verbatim as raw ERE (expert escape hatch; the baseline's anchored lines are untouched), so 'example.com' and '^example\.com$' dedup to one line Same transform as _to_ere_baseline_pattern in scripts/egress-capture.sh, replicated because lib/ must not source scripts/. Documented in docs/troubleshooting.md and the baseline header. Also closes a coverage gap: the per-project allowlist file (egress-allowlist-<project>) now has a test proving it feeds the generated filter.

jraicr added 5 commits June 10, 2026 23:49

docs(changelog): daemon fail-fast (#159) and egress audit fixes (#149)

fc86c24

docs(egress): state the exact bare-hostname anchoring rule

ad40664

jraicr added type:fix Bug fix size:m Medium: 100-400 lines labels Jun 11, 2026

jraicr merged commit 1ff437b into dev Jun 11, 2026
4 checks passed

jraicr deleted the fix/egress-audit-criticals branch June 11, 2026 08:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(egress): pre-gate criticals — filter mode 0600, smoke network poisoning, unanchored user ERE#161

fix(egress): pre-gate criticals — filter mode 0600, smoke network poisoning, unanchored user ERE#161
jraicr merged 5 commits into
devfrom
fix/egress-audit-criticals

jraicr commented Jun 11, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jraicr commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

1. Generated egress filter was mode 0600 — sidecar could not read it (9bdd344)

2. Smoke run pre-created production networks without compose labels (5a4571b)

3. Bare-hostname user allowlist entries were unanchored ERE (b99dcee, ad40664)

CHANGELOG (fc86c24)

Invariant impact

Verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jraicr commented Jun 11, 2026 •

edited

Loading

1. Generated egress filter was mode 0600 — sidecar could not read it (`9bdd344`)

2. Smoke run pre-created production networks without compose labels (`5a4571b`)

3. Bare-hostname user allowlist entries were unanchored ERE (`b99dcee`, `ad40664`)

CHANGELOG (`fc86c24`)