fix(egress): pre-gate criticals — filter mode 0600, smoke network poisoning, unanchored user ERE#161
Merged
Merged
Conversation
The effective filter was written via mktemp (always 0600) and mv (mode preserved), then RO bind-mounted to /etc/tinyproxy/filter. tinyproxy in the sidecar runs as a non-root uid with cap_drop ALL (no DAC_OVERRIDE), so it could not read the filter: it logged 'filter file: Permission denied' and exited, the healthcheck never passed, and the agent's depends_on: service_healthy aborted every contained-mode run. chmod 644 before the move in both generators — _generate_egress_filter (lib/compose.sh) and _write_filter (scripts/egress-smoke.sh). The allowlist is non-secret data; world-readable is correct. Also move egress-smoke.sh's cleanup trap from top level into main(): a top-level 'trap _cleanup EXIT' fired on every sourcing shell's exit and ran 'docker rm -f' against the live daemon, which made the helpers untestable via the source-guard (the new test/egress_smoke.bats asserts sourcing is side-effect-free).
_check_preconditions pre-created the production fixed-name networks
(drydock_internal / drydock_egress) via 'docker network create' when
absent, and _cleanup deliberately never removed networks. A plain
'docker network create' attaches no compose labels, so a subsequent
contained 'docker compose up' referencing the same fixed name: fails
fatally ('network ... has incorrect label com.docker.compose.network
set to ""'). Since setup-gates.sh runs the G2 smoke before the G1
capture sessions, the gate host was poisoned for every later contained
run until a manual 'docker network rm'.
Fix: track exactly the networks THIS run creates
(SMOKE_CREATED_NETWORKS) and remove them in _cleanup — on success and
on the failure paths via the EXIT trap. This restores the host to its
prior state by construction. Pre-existing networks (e.g. owned by a
live contained session) are never touched. Creating the networks with
hand-built compose labels was rejected: compose v2 validates the label
set against its own project metadata, and an approximated label set
that diverges from a future compose version silently reintroduces the
poisoning — removal is the only guarantee that does not depend on
matching compose internals.
tinyproxy's FilterType ere matches UNANCHORED substrings, so a user
allowlist line in the documented bare-hostname form ('echo example.com
>> ~/.config/drydock/egress-allowlist') also matched
example.com.evil.io and example1com.net — silently widening the
deny-by-default filter. The shipped baseline was already anchored
(^api\.anthropic\.com$); only user additions were exposed.
_generate_egress_filter now normalizes every line after
comment-stripping and before dedup, uniformly across all three sources
(baseline, global user file, per-project user file):
- a bare hostname (alphanumerics, dots, hyphens) is escaped and
anchored exactly: example.com -> ^example\.com$
- any other line passes through verbatim as raw ERE (expert escape
hatch; the baseline's anchored lines are untouched), so
'example.com' and '^example\.com$' dedup to one line
Same transform as _to_ere_baseline_pattern in scripts/egress-capture.sh,
replicated because lib/ must not source scripts/. Documented in
docs/troubleshooting.md and the baseline header. Also closes a coverage
gap: the per-project allowlist file (egress-allowlist-<project>) now has
a test proving it feeds the generated filter.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes three release-blocking defects in the contained-mode egress jail found by a pre-gate audit of the v0.4.0 scope (
devvsmain), each empirically verified against the realdrydock-egressimage, plus a missing CHANGELOG entry for #159. Part of #149 (release gates G1/G2 should run on top of these fixes).1. Generated egress filter was mode 0600 — sidecar could not read it (
9bdd344)_generate_egress_filter(lib/compose.sh) wrote the filter viamktemp(always 0600) andmv -fpreserves the mode. The file is bind-mounted:rointo the sidecar, where tinyproxy runs as uid 100 withcap_drop: ALL(noDAC_OVERRIDE): tinyproxy printedfilter file: Permission deniedand exited, the healthcheck never passed, anddepends_on: service_healthyaborted every contained-mode run. The identical bug existed in_write_filter(scripts/egress-smoke.sh), so gate G2 died with a misleading "sidecar did not become healthy". Fix:chmod 644before the file reaches its final path, in both places (the filter is non-secret allowlist data). New bats tests assert the 644 mode through the real code paths.2. Smoke run pre-created production networks without compose labels (
5a4571b)egress-smoke.shcreateddrydock_internal/drydock_egressviadocker network createwhen absent and never removed them. A laterdocker compose upreferencing the same fixedname:fails fatally (network ... has incorrect label com.docker.compose.network). Sincesetup-gates.shruns G2 before the G1 capture sessions, the gate host was left persistently broken for every subsequent contained run. Fix: the script now tracks exactly the networks it created (SMOKE_CREATED_NETWORKS) and_cleanup(EXIT trap, covers failure paths and SIGINT) removes exactly those — pre-existing networks are never touched, and the host returns to its prior state by construction. Hand-built compose labels were rejected: compose v2 validatescom.docker.compose.networkagainst its own project metadata, so an approximated label set could silently re-poison under a future compose version. The EXIT trap moved from top level intomain()so sourcing the script (bats) registers no trap. Newtest/egress_smoke.batscovers created-only cleanup, pre-existing preservation, and trap registration.3. Bare-hostname user allowlist entries were unanchored ERE (
b99dcee,ad40664)tinyproxy matches
FilterType erepatterns as unanchored substrings: the documented user-extension form (echo "example.com" >> ~/.config/drydock/egress-allowlist) produced a pattern that also matchedexample.com.evil.ioandexample1com.net— the classic exfil-subdomain shape, undermining the jail's deny-by-default intent for exactly the external-ingestion sessions it targets (INV-9). Fix: a new_normalize_egress_patternsstep between comment-strip and dedup anchors any line consisting only of letters, digits, dots, and hyphens as an exact escaped match (example.com→^example\.com$); any other line passes through verbatim as raw ERE (expert escape hatch — shipped baseline lines are already anchored and pass byte-identical;egress-capture.sh enable's.*allow-all is preserved). Applied uniformly to baseline, global, and per-project sources so bare and anchored duplicates dedup to one line. Docs and baseline header state the exact rule. New tests cover anchoring, verbatim passthrough, baseline stability, dedup collapse, and (previously untested) per-project allowlist inclusion.CHANGELOG (
fc86c24)Adds the missing user-facing
[Unreleased]entry for the Docker-daemon fail-fast probe (#159,DRYDOCK_DOCKER_PROBE_TIMEOUT) and aFixedsection for the three items above.Invariant impact
:robind-mounted plain file, now world-readable (non-secret).Verification
scripts/test.sh: 1177/1177 pass (includes the new tests; each fix was implemented test-first).shellcheck,shfmt -d,scripts/lint-commits.sh: clean.ad40664.