Skip to content

fix(egress): pre-gate criticals — filter mode 0600, smoke network poisoning, unanchored user ERE#161

Merged
jraicr merged 5 commits into
devfrom
fix/egress-audit-criticals
Jun 11, 2026
Merged

fix(egress): pre-gate criticals — filter mode 0600, smoke network poisoning, unanchored user ERE#161
jraicr merged 5 commits into
devfrom
fix/egress-audit-criticals

Conversation

@jraicr

@jraicr jraicr commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes three release-blocking defects in the contained-mode egress jail found by a pre-gate audit of the v0.4.0 scope (dev vs main), each empirically verified against the real drydock-egress image, plus a missing CHANGELOG entry for #159. Part of #149 (release gates G1/G2 should run on top of these fixes).

1. Generated egress filter was mode 0600 — sidecar could not read it (9bdd344)

_generate_egress_filter (lib/compose.sh) wrote the filter via mktemp (always 0600) and mv -f preserves the mode. The file is bind-mounted :ro into the sidecar, where tinyproxy runs as uid 100 with cap_drop: ALL (no DAC_OVERRIDE): tinyproxy printed filter file: Permission denied and exited, the healthcheck never passed, and depends_on: service_healthy aborted every contained-mode run. The identical bug existed in _write_filter (scripts/egress-smoke.sh), so gate G2 died with a misleading "sidecar did not become healthy". Fix: chmod 644 before the file reaches its final path, in both places (the filter is non-secret allowlist data). New bats tests assert the 644 mode through the real code paths.

2. Smoke run pre-created production networks without compose labels (5a4571b)

egress-smoke.sh created drydock_internal/drydock_egress via docker network create when absent and never removed them. A later docker compose up referencing the same fixed name: fails fatally (network ... has incorrect label com.docker.compose.network). Since setup-gates.sh runs G2 before the G1 capture sessions, the gate host was left persistently broken for every subsequent contained run. Fix: the script now tracks exactly the networks it created (SMOKE_CREATED_NETWORKS) and _cleanup (EXIT trap, covers failure paths and SIGINT) removes exactly those — pre-existing networks are never touched, and the host returns to its prior state by construction. Hand-built compose labels were rejected: compose v2 validates com.docker.compose.network against its own project metadata, so an approximated label set could silently re-poison under a future compose version. The EXIT trap moved from top level into main() so sourcing the script (bats) registers no trap. New test/egress_smoke.bats covers created-only cleanup, pre-existing preservation, and trap registration.

3. Bare-hostname user allowlist entries were unanchored ERE (b99dcee, ad40664)

tinyproxy matches FilterType ere patterns as unanchored substrings: the documented user-extension form (echo "example.com" >> ~/.config/drydock/egress-allowlist) produced a pattern that also matched example.com.evil.io and example1com.net — the classic exfil-subdomain shape, undermining the jail's deny-by-default intent for exactly the external-ingestion sessions it targets (INV-9). Fix: a new _normalize_egress_patterns step between comment-strip and dedup anchors any line consisting only of letters, digits, dots, and hyphens as an exact escaped match (example.com^example\.com$); any other line passes through verbatim as raw ERE (expert escape hatch — shipped baseline lines are already anchored and pass byte-identical; egress-capture.sh enable's .* allow-all is preserved). Applied uniformly to baseline, global, and per-project sources so bare and anchored duplicates dedup to one line. Docs and baseline header state the exact rule. New tests cover anchoring, verbatim passthrough, baseline stability, dedup collapse, and (previously untested) per-project allowlist inclusion.

CHANGELOG (fc86c24)

Adds the missing user-facing [Unreleased] entry for the Docker-daemon fail-fast probe (#159, DRYDOCK_DOCKER_PROBE_TIMEOUT) and a Fixed section for the three items above.

Invariant impact

  • INV-9 (egress jail): strengthened, not softened. The deny-by-default allowlist rule is unchanged; add-only semantics are preserved (normalization never removes baseline lines; dedup keeps the baseline copy). Fix 1 makes the documented jail actually start; fix 3 makes user extensions match exactly instead of as substrings.
  • INV-8: untouched — no capability, mount, or hardening change; the filter remains a single :ro bind-mounted plain file, now world-readable (non-secret).
  • All other invariants: not touched.

Verification

  • Full suite via scripts/test.sh: 1177/1177 pass (includes the new tests; each fix was implemented test-first).
  • shellcheck, shfmt -d, scripts/lint-commits.sh: clean.
  • Fresh-context adversarial review verdict: READY-WITH-NITS; both nits (doc-precision) addressed in ad40664.

jraicr added 5 commits June 10, 2026 23:49
The effective filter was written via mktemp (always 0600) and mv (mode
preserved), then RO bind-mounted to /etc/tinyproxy/filter. tinyproxy in
the sidecar runs as a non-root uid with cap_drop ALL (no DAC_OVERRIDE),
so it could not read the filter: it logged 'filter file: Permission
denied' and exited, the healthcheck never passed, and the agent's
depends_on: service_healthy aborted every contained-mode run.

chmod 644 before the move in both generators — _generate_egress_filter
(lib/compose.sh) and _write_filter (scripts/egress-smoke.sh). The
allowlist is non-secret data; world-readable is correct.

Also move egress-smoke.sh's cleanup trap from top level into main():
a top-level 'trap _cleanup EXIT' fired on every sourcing shell's exit
and ran 'docker rm -f' against the live daemon, which made the helpers
untestable via the source-guard (the new test/egress_smoke.bats asserts
sourcing is side-effect-free).
_check_preconditions pre-created the production fixed-name networks
(drydock_internal / drydock_egress) via 'docker network create' when
absent, and _cleanup deliberately never removed networks. A plain
'docker network create' attaches no compose labels, so a subsequent
contained 'docker compose up' referencing the same fixed name: fails
fatally ('network ... has incorrect label com.docker.compose.network
set to ""'). Since setup-gates.sh runs the G2 smoke before the G1
capture sessions, the gate host was poisoned for every later contained
run until a manual 'docker network rm'.

Fix: track exactly the networks THIS run creates
(SMOKE_CREATED_NETWORKS) and remove them in _cleanup — on success and
on the failure paths via the EXIT trap. This restores the host to its
prior state by construction. Pre-existing networks (e.g. owned by a
live contained session) are never touched. Creating the networks with
hand-built compose labels was rejected: compose v2 validates the label
set against its own project metadata, and an approximated label set
that diverges from a future compose version silently reintroduces the
poisoning — removal is the only guarantee that does not depend on
matching compose internals.
tinyproxy's FilterType ere matches UNANCHORED substrings, so a user
allowlist line in the documented bare-hostname form ('echo example.com
>> ~/.config/drydock/egress-allowlist') also matched
example.com.evil.io and example1com.net — silently widening the
deny-by-default filter. The shipped baseline was already anchored
(^api\.anthropic\.com$); only user additions were exposed.

_generate_egress_filter now normalizes every line after
comment-stripping and before dedup, uniformly across all three sources
(baseline, global user file, per-project user file):

  - a bare hostname (alphanumerics, dots, hyphens) is escaped and
    anchored exactly: example.com -> ^example\.com$
  - any other line passes through verbatim as raw ERE (expert escape
    hatch; the baseline's anchored lines are untouched), so
    'example.com' and '^example\.com$' dedup to one line

Same transform as _to_ere_baseline_pattern in scripts/egress-capture.sh,
replicated because lib/ must not source scripts/. Documented in
docs/troubleshooting.md and the baseline header. Also closes a coverage
gap: the per-project allowlist file (egress-allowlist-<project>) now has
a test proving it feeds the generated filter.
@jraicr jraicr added type:fix Bug fix size:m Medium: 100-400 lines labels Jun 11, 2026
@jraicr jraicr merged commit 1ff437b into dev Jun 11, 2026
4 checks passed
@jraicr jraicr deleted the fix/egress-audit-criticals branch June 11, 2026 08:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:m Medium: 100-400 lines type:fix Bug fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant