Skip to content

merge_guard_pre.py PreToolUse hook not invoked by Claude Code on Bash tool call (platform-level invocation gap) #799

@michael-wojcik

Description

@michael-wojcik

Summary

During PR #790's wrap-up forensics, a 19-hour-old expired/cross-session merge_guard authorization token was discovered to have survived undisturbed through an entire session containing dozens of Bash tool calls. Direct invocation of merge_guard_pre.py (the PreToolUse Bash matcher hook) against the same forensic state immediately cleaned up the stale token — proving the hook works correctly when invoked. The empirical conclusion: Claude Code did not invoke the merge_guard_pre.py PreToolUse hook on the gh pr merge Bash tool call during PR #790's merge, despite the hook being source-correct, registered in hooks.json, and proven-functional via direct testing.

This issue tracks the platform-level investigation. Token-lifecycle hardening (which would have caught the empirical bypass via a different code path) is filed separately in #797.

Empirical evidence chain

Check Result
Hook source correctness (md5) be5262082e9ff88be97a06a5eb3468a2 — identical between ~/.claude/plugins/cache/pact-plugin/PACT/4.2.9/hooks/merge_guard_pre.py and worktree source
Hook registration in v4.2.9 hooks.json PreToolUse matcher=Bash group at line 68-80; md5 19f97c623d981a69a0919a9775b066cd matches worktree
v4.2.8 (prior plugin version) also registered the hook ✅ — same registration shape — eliminates plugin-version-cache theories
Hook fires reliably on Bash calls after forensic instrumentation ✅ 5+ invocations logged across 5+ Bash calls during forensic
Hook correctly denies the EXACT compound-merge command shape used at 16:31:45 EDT ✅ Direct test: gh pr merge 99999 --repo ... --squash --subject "..." 2>&1 | tail -5 → "Compound destructive command rejected"
Token-validation logic operates correctly under freshness/session-scope/slot-claim checks ✅ Single manual invocation cleaned up the 19-hour-old stale token immediately
19-hour-old stale token persisted UNDISTURBED through dozens of pre-forensic Bash calls in session ✅ Direct proof find_valid_token was never invoked during the session pre-forensic — i.e., merge_guard_pre.py was never run

Why this is a Claude-Code-platform issue (not PACT-Plugin)

The hook implementation is correct. The hook registration is correct. The hook's functional verification reproduces correctly when invoked directly. The bypass is at the invocation layer: Claude Code did not call the hook at the moment the dangerous Bash tool was used.

This is not a PACT-Plugin code defect. It is either:

  • A Claude Code plugin-load lifecycle issue (the matcher-based PreToolUse hook group failed to register at session-start, even though the matcher-less PreToolUse hook bootstrap_gate.py clearly DID register)
  • A Claude Code internal bash-tool dispatch issue (some bash invocations bypass PreToolUse hooks for reasons internal to the harness)
  • Some other harness-side gap

Hypotheses (unfalsified — investigation needed)

H-A: Lazy/deferred matcher-based hook registration at session start.
Plugin v4.2.9 was extracted at May 18 14:06:38 EDT; session started 7 minutes later at 14:13:56 EDT. Claude Code may have completed bootstrap_gate.py registration (no-matcher PreToolUse) but not yet completed the matcher=Bash group registration when the session was instantiated. This would leave git_commit_check.py AND merge_guard_pre.py both un-invoked for some window after session start.

H-B: Plugin re-scan triggered later in session. The plugin directory mtime is May 18 16:41:04 EDT — approximately 10 minutes AFTER the bypass at 16:31:45. Something touched the plugin directory after the merge. This is consistent with the empirical observation that the hook began firing reliably ONLY after I modified the hook file at ~16:47 EDT. A plugin-scan event at 16:41 may have completed the matcher-based registration that wasn't completed at session start.

H-C: First-Bash-call registration race. A race condition during the first Bash tool call may leave the matcher=Bash hook group unregistered, persisting for the remainder of the session until a re-scan event corrects it.

H-D: Internal Claude Code bash-tool routing. Some Bash invocations may bypass PreToolUse hooks for reasons internal to Claude Code's tool dispatch. No supporting evidence; rejected as too speculative without further data.

Combined plausibility: H-A + H-B + H-C share a common shape — Claude Code's plugin-load lifecycle has a window where matcher-based hooks are not yet active, and a later scan event corrects it. The exact mechanism requires platform-team confirmation.

Out of scope for this issue

Proposed investigation steps (Layer 0)

  1. Add operator-visible diagnostic logging to merge_guard_pre.py (env-var-gated). When PACT_MERGE_GUARD_DEBUG=1 is set, the hook writes a line to ~/.claude/.merge_guard_invocation_log on every entry (timestamp, PID, tool_name, command-snippet, decision). This makes future hook-non-invocation events forensically visible without manual instrumentation.

  2. Reproduce in a controlled fresh session:

    • Extract plugin v4.2.9 (or current version)
    • Start Claude Code session WITHIN N seconds of plugin extract (varying N: 5s, 30s, 60s, 5 min)
    • Invoke a dangerous bash command (e.g., git push --force origin nonexistent-branch)
    • Observe whether merge_guard_pre fires (via debug log) and whether the harness blocks the command
    • Compare against a control session where plugin was extracted hours before session start
  3. Inspect Claude Code's plugin-load lifecycle:

    • Use claude-code-guide agent (per pinned discipline) to query Claude Code platform documentation for plugin-load semantics
    • Specifically: when are hooks loaded? Is loading deferred / lazy / event-triggered? Are matcher-based hooks loaded differently from no-matcher hooks?
  4. File upstream with Claude Code platform team if reproducer succeeds:

    • Hook-load-race condition would be a real platform bug worth surfacing
    • Defense-in-depth implications: PACT-Plugin's merge_guard is one of several plugin-installed PreToolUse hooks Claude Code users rely on for safety
  5. Document the platform-trust boundary:

    • PACT-Plugin's merge_guard system assumes Claude Code invokes registered hooks on every matching tool call. If that assumption can be violated by plugin-load timing, the threat model needs to be updated to call out the platform dependency explicitly.

Acceptance criteria

  • Diagnostic logging deployed in merge_guard_pre.py behind PACT_MERGE_GUARD_DEBUG=1 env var; documented in README
  • Reproducer attempt completed; result documented in this issue
  • claude-code-guide research completed; plugin-load semantics summarized in this issue
  • If platform bug confirmed → upstream issue filed with Claude Code team; cross-reference here
  • If platform bug NOT reproducible → document the residual uncertainty and the defense-in-depth posture (merge_guard: token survives PR merge — single-use-per-merge invalidation needed (+ forensics on stale-token bypass) #797's token-lifecycle hardening is still the load-bearing mitigation)
  • Threat-model documentation updated to call out plugin-load timing as a platform dependency

Empirical record (full reproduction state)

Plugin install (v4.2.9):  2026-05-18 14:06:38 EDT
Session start:            2026-05-18 14:13:56 EDT  (7 min post-install)
Bash calls pre-merge:     ~30+ (git status, git log, git add, git commit ×7,
                          git push ×3, gh pr view, gh pr create, gh issue create ×7, etc.)
PR #790 merge:            2026-05-18 16:31:45 EDT  (gh pr merge 790 --squash ...
                          2>&1 | tail -5 + multi-line follow-up commands — compound
                          + dangerous-pattern matches; should have been DENIED)
Plugin dir mtime:         2026-05-18 16:41:04 EDT  (~10 min POST-merge — something
                          touched the plugin dir)
Forensic instrumentation: ~2026-05-18 16:47 EDT
Hook fires reliably:      from forensic-instrumentation onward

Token survived undisturbed:

Cross-refs

Pattern note

This issue joins #791, #792, #793, #794, #795, #796, #797, #798 as instances of the meta-pattern surfaced during PR #790 work: load-bearing assumptions need both code-level correctness AND platform-level reliability. PACT-Plugin can deliver correct hook code, but invocation depends on Claude Code's platform behavior. When the platform-level assumption is violated, even correct hook code provides no safety. The discipline this PR's wrap-up surfaces: identify which load-bearing assumptions DEPEND on platform behavior, and document them as platform dependencies in the threat model rather than treating them as self-contained code invariants.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions