Skip to content

fix(cli): audit batch 2 — reaper abort, selector cancel, PROJECT_NAME trust, sync wedge, git-safety bypasses#163

Merged
jraicr merged 7 commits into
devfrom
fix/audit-batch2-cli-robustness
Jun 11, 2026
Merged

fix(cli): audit batch 2 — reaper abort, selector cancel, PROJECT_NAME trust, sync wedge, git-safety bypasses#163
jraicr merged 7 commits into
devfrom
fix/audit-batch2-cli-robustness

Conversation

@jraicr

@jraicr jraicr commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Summary

Batch 2 (CLI robustness) of the full-codebase audit of the v0.3.3 feature set: seven fixes. ~250 lines of product code; the rest is TDD test coverage. size:exception per the convention agreed for audit batches (batch 1: #162).

F1 — startup orphan-overlay reaper no longer aborts every command (85a11b4)

The reap loop in bin/drydock globs /tmp/drydock-submounts-*.yml, which matches other users' files on a shared host: kill -0 returns EPERM for a foreign live PID (misclassified as dead), and a failed rm -f on a sticky-/tmp file was the last command of its || list — set -e killed drydock at dispatch, before even version or help. The reaper now skips files not owned by the current user ([ -O ], which also removes the EPERM misclassification) and guards both rm paths.

F2 — pure-bash selector cancels on EOF/Ctrl-C instead of selecting (7b01225)

In _select_choice_pure, a failed read (EOF, interrupted) was swallowed by || true, leaving key="" — exactly the ENTER detection — so a cancel gesture SELECTED the highlighted option (in cmd_run's menu, that attaches to a session). A failed read now yields 130, matching the gum/fzf tiers' contract. Verified empirically: SIGINT mid-read → rc 130, nothing selected.

F3 — inherited $PROJECT_NAME is no longer trusted at entry points (4d0a798, 92e4641)

cmd_run/cmd_attach/cmd_list/cmd_stop/_live_sessions/_all_sessions/_resolve_session_name used ${PROJECT_NAME:-...} at points where drydock had not yet set the variable — the only possible value was the caller's environment (PROJECT_NAME is a common CI/tooling export), flowing unsanitized into session-targeting EREs: PROJECT_NAME='.*' made drydock stop target every project's sessions. Entry points now derive the project via _current_project_name(); internal sites that run after export_compose_env (banner, pre-flight) legitimately keep the exported variable. The same-class site ensure_image in lib/compose.sh (which steers egress-image mode resolution) is fixed too. Behavior change: scripts that exported PROJECT_NAME to redirect drydock away from the cwd must pass explicit names/dirs instead.

F4 — cmd_list no longer leaks prefix-named projects (4d0a798)

The one session enumerator without a discriminator-anchored filter: project api listed drydock-api-docs-cd34 rows. Now anchored on the name field (^drydock-<proj>-[0-9a-f]{4}(-shell)?\|).

F5+F6 — drydock sync bootstraps runtime dirs and survives rsync failure (1947e00)

cmd_sync on a fresh host reached docker run -v "$CONTAINER_CLAUDE":/dst:rw with the bind source missing — on native Linux dockerd auto-creates it root-owned, wedging every subsequent run until sudo rm. cmd_sync now calls ensure_runtime_dirs (recursion-verified: ensure_runtime_dirscmd_setup only when state is missing; cmd_setup never calls back). Separately, a failed staging rsync under direct dispatch aborted before cleanup, leaking a full dereferenced copy of ~/.claude (credentials included) in /tmp; the rc-capture idiom (|| _rsync_rc=$?) now guarantees cleanup and rc propagation on both dispatch paths.

F7 — git-safety: four accident-class bypasses closed (f436d1c, 6f196e1)

All verified live against the real hook before fixing: (1) git -C <path> push --force (and the whole -C family) matched no deny rule — the normal way agents operate on RW siblings silently lost force-push/no-verify protection; (2) git commit -n / -nm (short --no-verify) was completely unblocked (-n on push is --dry-run and stays allowed); (3) force-push via +refspec (git push origin +main); (4) remote protected-branch deletion (git push origin --delete main, -d forms, with or without -C). Fixed in BOTH layers: deny-rule mirrors in 10-git-safety.json (token-safe long forms only — a *-d* glob would false-positive --dry-run, so short forms live in the hook, following the documented C1-residue precedent) and new G1–G5 detection in drydock-block-destructive.sh scanning per pipe-part regardless of -C position. --force-with-lease remains blocked, consistent with the pre-existing glob policy. Named residual: a quoted refspec (git push origin "+main") falls through via the ADR-9 quoted-data masking — accident-floor accepted under threat model A. Known over-block noise (fails safe): unquoted echo git push --force, and git -C main branch -d x where the path itself is named like a protected branch.

Invariant impact

INV-3 strengthened (the hook layer now actually enforces what §5 claims about --force/--no-verify inside sessions); no mounts, caps, or compose topology touched.

Verification

  • Live hook battery by the fresh reviewer: 11 block-expected payloads rc 2, 14 allow-expected rc 0 (including dry-run, quoted-data, -mnote value-attachment, pipe-part isolation).
  • block_destructive.bats 240/240, managed_settings.bats 75/75, cmd_marker.bats 8/8, select_choice.bats 25/25; full suite green at branch point (4 cmd_marker tests adapted — they relied on exporting PROJECT_NAME without a matching cwd, the exact trust F3 removes).
  • bash -n, shellcheck, shfmt -d, jq ., scripts/lint-commits.sh: clean.
  • Fresh-context adversarial review: READY-WITH-NITS; NIT-1 (remote deletion gap) closed in 6f196e1, NIT-2 (over-block noise) documented above.

jraicr added 7 commits June 11, 2026 08:35
…reaper

On a multi-user /tmp the reap glob matches other users' overlay files.
kill -0 on a foreign live PID returns EPERM and is misread as dead, and
rm -f on a sticky-/tmp file the current user does not own exits 1 as the
last command of the || list — set -e then aborts dispatch before any
command runs, including 'drydock version'.

Skip files not owned by the current user (also fixes the EPERM
misclassification) and guard both rm paths with '2>/dev/null || true' so
an unremovable stale overlay is skipped, never fatal.
_select_choice_pure swallowed read failures (IFS= read -rsn1 || true),
leaving key empty — indistinguishable from ENTER — so EOF'd stdin or a
Ctrl-C-interrupted read SELECTED the highlighted option instead of
cancelling. In cmd_run's menu the first option is Attach, so a cancel
gesture attached to a session.

A failed read now breaks the input loop with result 130, matching the
cancel contract of the gum and fzf tiers.
PROJECT_NAME is a common CI/tooling export. cmd_run, cmd_attach,
cmd_list, cmd_stop, the session helpers (_live_sessions, _all_sessions,
_resolve_session_name), and ensure_image all read it via a
${PROJECT_NAME:-...} fallback at points where drydock had not set it,
so the only possible pre-existing value was the caller's environment.
The value flowed unsanitized into docker --filter and grep -E session
regexes: PROJECT_NAME='.*' made 'drydock stop' target every project's
sessions, and a foreign name steered ensure_image's mode resolution.

Entry points now derive the project unconditionally (sanitized basename
of the resolved project dir / cwd). The internal flow after
export_compose_env — which legitimately sets and exports PROJECT_NAME —
is unchanged.

Also anchors cmd_list's post-filter on the 4-hex discriminator so
project 'api' no longer lists 'api-docs' sessions ( -shell companion
rows are intentionally kept).
Two cmd_sync robustness gaps:

1. Sync before first setup: cmd_sync called only ensure_prereqs +
   ensure_image, then docker run with -v "$CONTAINER_CLAUDE":/dst:rw.
   On native Linux the daemon auto-creates a missing bind source as a
   ROOT-OWNED directory, wedging every later drydock command on EACCES
   until a manual sudo rm. cmd_sync now runs ensure_runtime_dirs first,
   matching cmd_run / cmd_shell and cmd_setup's header comment; a
   fresh-host sync bootstraps user-owned state via the idempotent
   cmd_setup (which never calls back into cmd_sync — no recursion).

2. Staging leak on failure: the docker/rsync staging invocation was a
   plain command, so under direct dispatch set -e aborted BEFORE
   rm -rf "$_staging", leaking a complete dereferenced ~/.claude copy
   (credentials included) under /tmp/drydock-sync.XXXXXX. The
   'local _rsync_rc=$?' capture only worked on the errexit-suppressed
   auto-sync path. Now uses the reach-exit-guarded '|| _rsync_rc=$?'
   idiom (same as _run_claude_lifecycle) so cleanup and rc propagation
   run on both dispatch paths.
Three verified accident-class gaps, fixed at BOTH guardrail layers:

(a) Every 10-git-safety.json rule anchored on a literal 'git push' /
    'git commit' / 'git branch' prefix, so the global-flag form
    (git -C <path> push --force — the NORMAL way agents touch RW
    siblings) matched nothing, and the hook had zero git logic.
(b) git commit -n / -nm "msg" (short --no-verify) was unblocked.
    push -n means --dry-run and stays allowed.
(c) Force-push via +refspec (git push origin +main) was unblocked.

Deny layer: adds 'git -C *' mirrors of the existing destructive
patterns, plus 'git commit -n*' and 'git push * +*' entries.

Hook layer (robust catch-all): new G1-G4 rules scan per pipe-part —
the A2 precedent — for force-push (keeping the deny glob's existing
--force-with-lease stance), --no-verify and commit short -n (prefix
class excludes value-taking flags so 'git commit -mnote' cannot
false-positive), +refspec, and protected-branch deletion, regardless
of global-flag position. Quoted data stays masked by the ADR-9
quote-drop, so a commit message containing 'git push --force' is not
blocked.
cmd_attach now derives the project from the cwd instead of trusting an
inherited PROJECT_NAME (audit batch 2), so the marker tests must run
from a directory matching the stubbed project — the same harness
pattern cmd_attach.bats and cmd_stop.bats already use.
@jraicr jraicr added type:fix Bug fix size:exception Over 400 lines, pre-approved by maintainer labels Jun 11, 2026
@jraicr jraicr merged commit b565398 into dev Jun 11, 2026
4 checks passed
@jraicr jraicr deleted the fix/audit-batch2-cli-robustness branch June 11, 2026 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:exception Over 400 lines, pre-approved by maintainer type:fix Bug fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant