fix(cli): audit batch 2 — reaper abort, selector cancel, PROJECT_NAME trust, sync wedge, git-safety bypasses#163
Merged
Merged
Conversation
…reaper On a multi-user /tmp the reap glob matches other users' overlay files. kill -0 on a foreign live PID returns EPERM and is misread as dead, and rm -f on a sticky-/tmp file the current user does not own exits 1 as the last command of the || list — set -e then aborts dispatch before any command runs, including 'drydock version'. Skip files not owned by the current user (also fixes the EPERM misclassification) and guard both rm paths with '2>/dev/null || true' so an unremovable stale overlay is skipped, never fatal.
_select_choice_pure swallowed read failures (IFS= read -rsn1 || true), leaving key empty — indistinguishable from ENTER — so EOF'd stdin or a Ctrl-C-interrupted read SELECTED the highlighted option instead of cancelling. In cmd_run's menu the first option is Attach, so a cancel gesture attached to a session. A failed read now breaks the input loop with result 130, matching the cancel contract of the gum and fzf tiers.
PROJECT_NAME is a common CI/tooling export. cmd_run, cmd_attach,
cmd_list, cmd_stop, the session helpers (_live_sessions, _all_sessions,
_resolve_session_name), and ensure_image all read it via a
${PROJECT_NAME:-...} fallback at points where drydock had not set it,
so the only possible pre-existing value was the caller's environment.
The value flowed unsanitized into docker --filter and grep -E session
regexes: PROJECT_NAME='.*' made 'drydock stop' target every project's
sessions, and a foreign name steered ensure_image's mode resolution.
Entry points now derive the project unconditionally (sanitized basename
of the resolved project dir / cwd). The internal flow after
export_compose_env — which legitimately sets and exports PROJECT_NAME —
is unchanged.
Also anchors cmd_list's post-filter on the 4-hex discriminator so
project 'api' no longer lists 'api-docs' sessions ( -shell companion
rows are intentionally kept).
Two cmd_sync robustness gaps: 1. Sync before first setup: cmd_sync called only ensure_prereqs + ensure_image, then docker run with -v "$CONTAINER_CLAUDE":/dst:rw. On native Linux the daemon auto-creates a missing bind source as a ROOT-OWNED directory, wedging every later drydock command on EACCES until a manual sudo rm. cmd_sync now runs ensure_runtime_dirs first, matching cmd_run / cmd_shell and cmd_setup's header comment; a fresh-host sync bootstraps user-owned state via the idempotent cmd_setup (which never calls back into cmd_sync — no recursion). 2. Staging leak on failure: the docker/rsync staging invocation was a plain command, so under direct dispatch set -e aborted BEFORE rm -rf "$_staging", leaking a complete dereferenced ~/.claude copy (credentials included) under /tmp/drydock-sync.XXXXXX. The 'local _rsync_rc=$?' capture only worked on the errexit-suppressed auto-sync path. Now uses the reach-exit-guarded '|| _rsync_rc=$?' idiom (same as _run_claude_lifecycle) so cleanup and rc propagation run on both dispatch paths.
Three verified accident-class gaps, fixed at BOTH guardrail layers:
(a) Every 10-git-safety.json rule anchored on a literal 'git push' /
'git commit' / 'git branch' prefix, so the global-flag form
(git -C <path> push --force — the NORMAL way agents touch RW
siblings) matched nothing, and the hook had zero git logic.
(b) git commit -n / -nm "msg" (short --no-verify) was unblocked.
push -n means --dry-run and stays allowed.
(c) Force-push via +refspec (git push origin +main) was unblocked.
Deny layer: adds 'git -C *' mirrors of the existing destructive
patterns, plus 'git commit -n*' and 'git push * +*' entries.
Hook layer (robust catch-all): new G1-G4 rules scan per pipe-part —
the A2 precedent — for force-push (keeping the deny glob's existing
--force-with-lease stance), --no-verify and commit short -n (prefix
class excludes value-taking flags so 'git commit -mnote' cannot
false-positive), +refspec, and protected-branch deletion, regardless
of global-flag position. Quoted data stays masked by the ADR-9
quote-drop, so a commit message containing 'git push --force' is not
blocked.
cmd_attach now derives the project from the cwd instead of trusting an inherited PROJECT_NAME (audit batch 2), so the marker tests must run from a directory matching the stubbed project — the same harness pattern cmd_attach.bats and cmd_stop.bats already use.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Batch 2 (CLI robustness) of the full-codebase audit of the v0.3.3 feature set: seven fixes. ~250 lines of product code; the rest is TDD test coverage.
size:exceptionper the convention agreed for audit batches (batch 1: #162).F1 — startup orphan-overlay reaper no longer aborts every command (
85a11b4)The reap loop in
bin/drydockglobs/tmp/drydock-submounts-*.yml, which matches other users' files on a shared host:kill -0returns EPERM for a foreign live PID (misclassified as dead), and a failedrm -fon a sticky-/tmpfile was the last command of its||list —set -ekilled drydock at dispatch, before evenversionorhelp. The reaper now skips files not owned by the current user ([ -O ], which also removes the EPERM misclassification) and guards bothrmpaths.F2 — pure-bash selector cancels on EOF/Ctrl-C instead of selecting (
7b01225)In
_select_choice_pure, a failedread(EOF, interrupted) was swallowed by|| true, leavingkey=""— exactly the ENTER detection — so a cancel gesture SELECTED the highlighted option (incmd_run's menu, that attaches to a session). A failed read now yields 130, matching the gum/fzf tiers' contract. Verified empirically: SIGINT mid-read → rc 130, nothing selected.F3 — inherited
$PROJECT_NAMEis no longer trusted at entry points (4d0a798,92e4641)cmd_run/cmd_attach/cmd_list/cmd_stop/_live_sessions/_all_sessions/_resolve_session_nameused${PROJECT_NAME:-...}at points where drydock had not yet set the variable — the only possible value was the caller's environment (PROJECT_NAMEis a common CI/tooling export), flowing unsanitized into session-targeting EREs:PROJECT_NAME='.*'madedrydock stoptarget every project's sessions. Entry points now derive the project via_current_project_name(); internal sites that run afterexport_compose_env(banner, pre-flight) legitimately keep the exported variable. The same-class siteensure_imageinlib/compose.sh(which steers egress-image mode resolution) is fixed too. Behavior change: scripts that exportedPROJECT_NAMEto redirect drydock away from the cwd must pass explicit names/dirs instead.F4 —
cmd_listno longer leaks prefix-named projects (4d0a798)The one session enumerator without a discriminator-anchored filter: project
apilisteddrydock-api-docs-cd34rows. Now anchored on the name field (^drydock-<proj>-[0-9a-f]{4}(-shell)?\|).F5+F6 —
drydock syncbootstraps runtime dirs and survives rsync failure (1947e00)cmd_syncon a fresh host reacheddocker run -v "$CONTAINER_CLAUDE":/dst:rwwith the bind source missing — on native Linux dockerd auto-creates it root-owned, wedging every subsequent run untilsudo rm.cmd_syncnow callsensure_runtime_dirs(recursion-verified:ensure_runtime_dirs→cmd_setuponly when state is missing;cmd_setupnever calls back). Separately, a failed staging rsync under direct dispatch aborted before cleanup, leaking a full dereferenced copy of~/.claude(credentials included) in/tmp; the rc-capture idiom (|| _rsync_rc=$?) now guarantees cleanup and rc propagation on both dispatch paths.F7 — git-safety: four accident-class bypasses closed (
f436d1c,6f196e1)All verified live against the real hook before fixing: (1)
git -C <path> push --force(and the whole-Cfamily) matched no deny rule — the normal way agents operate on RW siblings silently lost force-push/no-verify protection; (2)git commit -n/-nm(short--no-verify) was completely unblocked (-non push is--dry-runand stays allowed); (3) force-push via+refspec(git push origin +main); (4) remote protected-branch deletion (git push origin --delete main,-dforms, with or without-C). Fixed in BOTH layers: deny-rule mirrors in10-git-safety.json(token-safe long forms only — a*-d*glob would false-positive--dry-run, so short forms live in the hook, following the documented C1-residue precedent) and new G1–G5 detection indrydock-block-destructive.shscanning per pipe-part regardless of-Cposition.--force-with-leaseremains blocked, consistent with the pre-existing glob policy. Named residual: a quoted refspec (git push origin "+main") falls through via the ADR-9 quoted-data masking — accident-floor accepted under threat model A. Known over-block noise (fails safe): unquotedecho git push --force, andgit -C main branch -d xwhere the path itself is named like a protected branch.Invariant impact
INV-3 strengthened (the hook layer now actually enforces what §5 claims about
--force/--no-verifyinside sessions); no mounts, caps, or compose topology touched.Verification
-mnotevalue-attachment, pipe-part isolation).block_destructive.bats240/240,managed_settings.bats75/75,cmd_marker.bats8/8,select_choice.bats25/25; full suite green at branch point (4cmd_markertests adapted — they relied on exportingPROJECT_NAMEwithout a matching cwd, the exact trust F3 removes).bash -n,shellcheck,shfmt -d,jq .,scripts/lint-commits.sh: clean.6f196e1, NIT-2 (over-block noise) documented above.