merge_guard_pre.py PreToolUse hook not invoked by Claude Code on Bash tool call (platform-level invocation gap)

## Summary

During PR #790's wrap-up forensics, a 19-hour-old expired/cross-session merge_guard authorization token was discovered to have survived undisturbed through an entire session containing dozens of Bash tool calls. Direct invocation of `merge_guard_pre.py` (the PreToolUse Bash matcher hook) against the same forensic state immediately cleaned up the stale token — proving the hook works correctly when invoked. The empirical conclusion: **Claude Code did not invoke the `merge_guard_pre.py` PreToolUse hook on the `gh pr merge` Bash tool call during PR #790's merge**, despite the hook being source-correct, registered in `hooks.json`, and proven-functional via direct testing.

This issue tracks the platform-level investigation. Token-lifecycle hardening (which would have caught the empirical bypass via a different code path) is filed separately in #797.

## Empirical evidence chain

| Check | Result |
|---|---|
| Hook source correctness (md5) | ✅ `be5262082e9ff88be97a06a5eb3468a2` — identical between `~/.claude/plugins/cache/pact-plugin/PACT/4.2.9/hooks/merge_guard_pre.py` and worktree source |
| Hook registration in v4.2.9 `hooks.json` | ✅ `PreToolUse` matcher=`Bash` group at line 68-80; md5 `19f97c623d981a69a0919a9775b066cd` matches worktree |
| v4.2.8 (prior plugin version) also registered the hook | ✅ — same registration shape — eliminates plugin-version-cache theories |
| Hook fires reliably on Bash calls after forensic instrumentation | ✅ 5+ invocations logged across 5+ Bash calls during forensic |
| Hook correctly denies the EXACT compound-merge command shape used at 16:31:45 EDT | ✅ Direct test: `gh pr merge 99999 --repo ... --squash --subject "..." 2>&1 \| tail -5` → "Compound destructive command rejected" |
| Token-validation logic operates correctly under freshness/session-scope/slot-claim checks | ✅ Single manual invocation cleaned up the 19-hour-old stale token immediately |
| 19-hour-old stale token persisted UNDISTURBED through dozens of pre-forensic Bash calls in session | ✅ Direct proof `find_valid_token` was never invoked during the session pre-forensic — i.e., `merge_guard_pre.py` was never run |

## Why this is a Claude-Code-platform issue (not PACT-Plugin)

The hook implementation is correct. The hook registration is correct. The hook's functional verification reproduces correctly when invoked directly. The bypass is at the **invocation layer**: Claude Code did not call the hook at the moment the dangerous Bash tool was used.

This is not a PACT-Plugin code defect. It is either:

- A Claude Code plugin-load lifecycle issue (the matcher-based PreToolUse hook group failed to register at session-start, even though the matcher-less PreToolUse hook `bootstrap_gate.py` clearly DID register)
- A Claude Code internal bash-tool dispatch issue (some bash invocations bypass PreToolUse hooks for reasons internal to the harness)
- Some other harness-side gap

## Hypotheses (unfalsified — investigation needed)

**H-A: Lazy/deferred matcher-based hook registration at session start.**
Plugin v4.2.9 was extracted at `May 18 14:06:38 EDT`; session started 7 minutes later at `14:13:56 EDT`. Claude Code may have completed `bootstrap_gate.py` registration (no-matcher PreToolUse) but not yet completed the matcher=`Bash` group registration when the session was instantiated. This would leave `git_commit_check.py` AND `merge_guard_pre.py` both un-invoked for some window after session start.

**H-B: Plugin re-scan triggered later in session.** The plugin directory mtime is `May 18 16:41:04 EDT` — approximately 10 minutes AFTER the bypass at 16:31:45. Something touched the plugin directory after the merge. This is consistent with the empirical observation that the hook began firing reliably ONLY after I modified the hook file at ~16:47 EDT. A plugin-scan event at 16:41 may have completed the matcher-based registration that wasn't completed at session start.

**H-C: First-Bash-call registration race.** A race condition during the first Bash tool call may leave the matcher=`Bash` hook group unregistered, persisting for the remainder of the session until a re-scan event corrects it.

**H-D: Internal Claude Code bash-tool routing.** Some Bash invocations may bypass PreToolUse hooks for reasons internal to Claude Code's tool dispatch. No supporting evidence; rejected as too speculative without further data.

**Combined plausibility**: H-A + H-B + H-C share a common shape — Claude Code's plugin-load lifecycle has a window where matcher-based hooks are not yet active, and a later scan event corrects it. The exact mechanism requires platform-team confirmation.

## Out of scope for this issue

- **Token-lifecycle hardening (Layers 1+3+4+5)** — filed in #797. Those layers provide defense-in-depth at the PACT-Plugin code surface even if the hook DOES fire correctly. They are necessary AND orthogonal to the platform-level invocation gap.
- **Reverting PR #790** — the merge content itself is good (4 reviewers APPROVE, 0 BLOCKING, all cycles verified); the bypass is procedural.

## Proposed investigation steps (Layer 0)

1. **Add operator-visible diagnostic logging to `merge_guard_pre.py` (env-var-gated)**. When `PACT_MERGE_GUARD_DEBUG=1` is set, the hook writes a line to `~/.claude/.merge_guard_invocation_log` on every entry (timestamp, PID, tool_name, command-snippet, decision). This makes future hook-non-invocation events forensically visible without manual instrumentation.

2. **Reproduce in a controlled fresh session**:
    - Extract plugin v4.2.9 (or current version)
    - Start Claude Code session WITHIN N seconds of plugin extract (varying N: 5s, 30s, 60s, 5 min)
    - Invoke a dangerous bash command (e.g., `git push --force origin nonexistent-branch`)
    - Observe whether merge_guard_pre fires (via debug log) and whether the harness blocks the command
    - Compare against a control session where plugin was extracted hours before session start

3. **Inspect Claude Code's plugin-load lifecycle**:
    - Use `claude-code-guide` agent (per pinned discipline) to query Claude Code platform documentation for plugin-load semantics
    - Specifically: when are hooks loaded? Is loading deferred / lazy / event-triggered? Are matcher-based hooks loaded differently from no-matcher hooks?

4. **File upstream with Claude Code platform team** if reproducer succeeds:
    - Hook-load-race condition would be a real platform bug worth surfacing
    - Defense-in-depth implications: PACT-Plugin's merge_guard is one of several plugin-installed PreToolUse hooks Claude Code users rely on for safety

5. **Document the platform-trust boundary**:
    - PACT-Plugin's merge_guard system assumes Claude Code invokes registered hooks on every matching tool call. If that assumption can be violated by plugin-load timing, the threat model needs to be updated to call out the platform dependency explicitly.

## Acceptance criteria

- [ ] Diagnostic logging deployed in merge_guard_pre.py behind `PACT_MERGE_GUARD_DEBUG=1` env var; documented in README
- [ ] Reproducer attempt completed; result documented in this issue
- [ ] claude-code-guide research completed; plugin-load semantics summarized in this issue
- [ ] If platform bug confirmed → upstream issue filed with Claude Code team; cross-reference here
- [ ] If platform bug NOT reproducible → document the residual uncertainty and the defense-in-depth posture (#797's token-lifecycle hardening is still the load-bearing mitigation)
- [ ] Threat-model documentation updated to call out plugin-load timing as a platform dependency

## Empirical record (full reproduction state)

```
Plugin install (v4.2.9):  2026-05-18 14:06:38 EDT
Session start:            2026-05-18 14:13:56 EDT  (7 min post-install)
Bash calls pre-merge:     ~30+ (git status, git log, git add, git commit ×7,
                          git push ×3, gh pr view, gh pr create, gh issue create ×7, etc.)
PR #790 merge:            2026-05-18 16:31:45 EDT  (gh pr merge 790 --squash ...
                          2>&1 | tail -5 + multi-line follow-up commands — compound
                          + dangerous-pattern matches; should have been DENIED)
Plugin dir mtime:         2026-05-18 16:41:04 EDT  (~10 min POST-merge — something
                          touched the plugin dir)
Forensic instrumentation: ~2026-05-18 16:47 EDT
Hook fires reliably:      from forensic-instrumentation onward
```

Token survived undisturbed:
- Created: `2026-05-17 21:39:54 EDT` (session ad575016, PR #783 merge)
- Discovered: `2026-05-18 16:43 EDT` (my forensic), 19 hours later
- Removed: First manual invocation of merge_guard_pre at ~16:43 (i.e., the hook works when called)

## Cross-refs

- **#797** — token-lifecycle hardening (Layers 1+3+4+5; PACT-Plugin code surface; defense-in-depth that protects against Layer-0 platform issue ALSO if hooks fire correctly)
- **PR #790 / v4.2.10** — empirical instance; merge succeeded despite hook should-have-blocked
- **`pact-plugin/hooks/merge_guard_pre.py`** — the hook that didn't fire
- **`pact-plugin/.claude-plugin/hooks.json` (worktree) / `~/.claude/plugins/cache/pact-plugin/PACT/4.2.9/hooks/hooks.json`** — correct registration; md5 match
- **Pinned discipline**: "Hooks cannot be smoke-tested against the running plugin in-session" (PR #638) — running plugin loads at session-start, file changes don't take effect until restart. This issue's investigation contradicts the simpler version of that discipline (my mid-session file modification DID take effect; the hook started firing). The discipline may need a clarification: file changes take effect when Claude Code's plugin-scan mechanism re-detects the file, which may be event-triggered rather than session-start-only.

## Pattern note

This issue joins #791, #792, #793, #794, #795, #796, #797, #798 as instances of the meta-pattern surfaced during PR #790 work: **load-bearing assumptions need both code-level correctness AND platform-level reliability**. PACT-Plugin can deliver correct hook code, but invocation depends on Claude Code's platform behavior. When the platform-level assumption is violated, even correct hook code provides no safety. The discipline this PR's wrap-up surfaces: identify which load-bearing assumptions DEPEND on platform behavior, and document them as platform dependencies in the threat model rather than treating them as self-contained code invariants.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

merge_guard_pre.py PreToolUse hook not invoked by Claude Code on Bash tool call (platform-level invocation gap) #799

Summary

Empirical evidence chain

Why this is a Claude-Code-platform issue (not PACT-Plugin)

Hypotheses (unfalsified — investigation needed)

Out of scope for this issue

Proposed investigation steps (Layer 0)

Acceptance criteria

Empirical record (full reproduction state)

Cross-refs

Pattern note

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Check	Result
Hook source correctness (md5)	✅ `be5262082e9ff88be97a06a5eb3468a2` — identical between `~/.claude/plugins/cache/pact-plugin/PACT/4.2.9/hooks/merge_guard_pre.py` and worktree source
Hook registration in v4.2.9 `hooks.json`	✅ `PreToolUse` matcher=`Bash` group at line 68-80; md5 `19f97c623d981a69a0919a9775b066cd` matches worktree
v4.2.8 (prior plugin version) also registered the hook	✅ — same registration shape — eliminates plugin-version-cache theories
Hook fires reliably on Bash calls after forensic instrumentation	✅ 5+ invocations logged across 5+ Bash calls during forensic
Hook correctly denies the EXACT compound-merge command shape used at 16:31:45 EDT	✅ Direct test: `gh pr merge 99999 --repo ... --squash --subject "..." 2>&1 \| tail -5` → "Compound destructive command rejected"
Token-validation logic operates correctly under freshness/session-scope/slot-claim checks	✅ Single manual invocation cleaned up the 19-hour-old stale token immediately
19-hour-old stale token persisted UNDISTURBED through dozens of pre-forensic Bash calls in session	✅ Direct proof `find_valid_token` was never invoked during the session pre-forensic — i.e., `merge_guard_pre.py` was never run

merge_guard_pre.py PreToolUse hook not invoked by Claude Code on Bash tool call (platform-level invocation gap) #799

Description

Summary

Empirical evidence chain

Why this is a Claude-Code-platform issue (not PACT-Plugin)

Hypotheses (unfalsified — investigation needed)

Out of scope for this issue

Proposed investigation steps (Layer 0)

Acceptance criteria

Empirical record (full reproduction state)

Cross-refs

Pattern note

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions