
fix(security): disk-persistent prompt-injection hardening (inject_claude_md content-sanitization + path guards)#47

Merged · WRG-11 merged 1 commit into main from fix/r89-132b-injection-hardening (May 31, 2026)

WRG-11 commented on May 31, 2026

Summary — R89-132b (security)

inject_claude_md embedded promoted-rule fields (pattern/category/explain) into a CLAUDE.md file with no neutralization. An attacker who gets a rule promoted by calling observe(explain="legit\n- malicious instruction") 10 times (THRESHOLD_RULE) could inject arbitrary Markdown instruction lines into ~/.claude/CLAUDE.md: a disk-persisted prompt injection that later Claude sessions load.
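To make the break-out concrete, here is a minimal sketch of an unsanitized bullet renderer. `render_rule_bullet` and its bullet layout are hypothetical stand-ins, not the project's actual inject_claude_md output format; the only point is that a newline inside one field turns into a second, attacker-controlled Markdown line.

```python
def render_rule_bullet(pattern: str, category: str, explain: str) -> str:
    """Hypothetical pre-fix rendering: fields are interpolated verbatim."""
    # No neutralization: any "\n" inside a field starts a new Markdown line.
    return f"- [{category}] `{pattern}`: {explain}"


rendered = render_rule_bullet("git push --force", "other", "legit\n- malicious instruction")
for line in rendered.splitlines():
    print(line)
```

The second printed line, `- malicious instruction`, is indistinguishable from a bullet the tool wrote itself.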

A path guard alone is not sufficient: the headline target (~/.claude/CLAUDE.md) is already a legitimate .md destination, so blocking the path would break the tool. The real defense is content-sanitization at the sink.
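A minimal sketch of suffix + symlink guards (the L2 layer) shows why they cannot stop this attack alone: the legitimate target passes them. The function names, the `PathGuardError` type, and the error strings are hypothetical; only the `.md`/`.mdx` allow-list and the "suffix before existence" ordering come from this PR.

```python
from pathlib import Path

# Allow-list taken from the ".md/.mdx" wording in this PR.
ALLOWED_SUFFIXES = {".md", ".mdx"}


class PathGuardError(ValueError):
    """Hypothetical error type for a rejected path."""


def check_inject_target(path: Path) -> Path:
    """Sketch of the inject-side guard: Markdown suffix and no symlink."""
    if path.suffix not in ALLOWED_SUFFIXES:
        raise PathGuardError("refusing non-Markdown target")
    # Only the final path component is checked; a symlinked parent
    # directory would not be caught by this sketch.
    if path.is_symlink():
        raise PathGuardError("refusing symlinked target")
    return path


def check_import_source(path: Path) -> Path:
    """Sketch of the import-side guard, ordered to avoid an existence oracle."""
    # Suffix first: every non-Markdown path gets the same error whether or
    # not it exists, so the message reveals nothing about the filesystem.
    if path.suffix not in ALLOWED_SUFFIXES:
        raise PathGuardError("refusing non-Markdown source")
    if not path.is_file():
        raise FileNotFoundError(f"file not found: {path}")
    return path
```

Calling `check_inject_target(Path.home() / ".claude" / "CLAUDE.md")` returns the path unchanged (unless that file is itself a symlink), so whatever content is written there still reaches later sessions. The guard narrows where writes can land; it says nothing about what is written.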

3-layer fix

| Layer | Finding | Change |
|---|---|---|
| L1 (primary) | INSTINCT-M-001 (MED) | `_sanitize_inline()` neutralizes every field embedded into a CLAUDE.md bullet (`pattern`, `category`, `explain`): CR/LF/control chars collapse to a space (no line break-out), backticks become `'` (no code-span escape), and the block's own `<!--`/`-->` fences are broken (no smuggled `instinct:end` marker). Because it runs at the sink, it also defends rows poisoned by another write path or written before the fix. |
| L2 (defense-in-depth) | INSTINCT-M-001 / L-001 (LOW) | `.md`/`.mdx` suffix + symlink guard on `inject_claude_md`; `.md`/`.mdx` suffix guard on `import_claude_md`, checked before the existence check, which closes the `file not found: <path>` existence oracle. |
| L3 | INSTINCT-L-002 (LOW) | `_validate_category` coerces unknown categories to `"other"`, and `observe()` now uses its return value. The old code discarded it, so the coercion was a no-op; fixed here. |
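The L1 behaviour can be sketched as follows. `sanitize_inline` is a hypothetical stand-in for the project's `_sanitize_inline()`: the three rules (controls to space, backticks to `'`, break the comment fences) come from this PR, but the exact character class and the replacement strings used for `<!--`/`-->` are assumptions.

```python
import re

# Characters that can end the current line or act as control input:
# C0 controls (incl. CR, LF, TAB), DEL, C1 controls (incl. NEL), and the
# Unicode line/paragraph separators U+2028 and U+2029.
_BREAKOUT_RE = re.compile(r"[\x00-\x1f\x7f-\x9f\u2028\u2029]")


def sanitize_inline(value: str) -> str:
    """Make a field safe to embed inside a single CLAUDE.md bullet line."""
    # 1. No line break-out: each control/separator char becomes a space.
    text = _BREAKOUT_RE.sub(" ", value)
    # 2. No code-span escape: backticks become a plain apostrophe.
    text = text.replace("`", "'")
    # 3. No smuggled HTML-comment fences, e.g. a fake instinct:end marker.
    text = text.replace("<!--", "<!- -").replace("-->", "- ->")
    return text


print(sanitize_inline("legit\n- malicious instruction"))
print(sanitize_inline("x <!-- instinct:end --> y"))
```

This prints `legit - malicious instruction` and `x <!- - instinct:end - -> y`: the payload survives as readable text, but it can no longer start a new bullet or close the instinct block early. Applying this at the sink rather than at `observe()` is what lets it cover rows that were poisoned before the fix.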

Adapted from the H R89-130h reference patches rather than applied blindly: H's M-001 suffix/symlink guard does not close the headline attack (a .md target passes it), so the primary control here is sink-side content sanitization. Likewise, the L-002 patch as written was a no-op until observe() was wired to use the validated category.
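The L-002 no-op is easy to reproduce in isolation. In this sketch the category set, `validate_category`, and both `observe_*` functions are hypothetical stand-ins (the real `observe()` records a row; here it just returns the category that would be stored).

```python
# Hypothetical category set; the project's real list is not shown in this PR.
KNOWN_CATEGORIES = frozenset({"safety", "style", "workflow", "other"})


def validate_category(category: str) -> str:
    """Stand-in for _validate_category: unknown categories become "other"."""
    return category if category in KNOWN_CATEGORIES else "other"


def observe_before(category: str) -> str:
    """Pre-fix shape: the validated value is computed, then thrown away."""
    validate_category(category)  # return value discarded, so this is a no-op
    return category  # the raw value is what would be stored


def observe_after(category: str) -> str:
    """Post-fix shape: the coerced value is the one that gets stored."""
    category = validate_category(category)
    return category
```

`observe_before("evil\n- x")` returns the payload unchanged, while `observe_after("evil\n- x")` returns `"other"`; a known category such as `"safety"` passes through both untouched.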

Tests (TDD, RED→GREEN proven)

tests/test_injection_hardening_r89_132b.py adds 11 new tests:

  • 8 negatives: explain/pattern newline injection, poisoned-DB-row sink defense, end-marker smuggling, category coerce-on-observe, non-.md suffix reject (inject + import), symlink reject.
  • 3 "legit still works": valid category preserved, normal .md inject, normal .md import.

RED proven: with the store.py fix stashed, exactly the 8 security tests fail and the 3 legit-use tests pass in both states, so the RED result is non-vacuous. All tests pass (GREEN) with the fix applied. Full suite: 145 passed. Legitimate ~/.claude/CLAUDE.md use is preserved.
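The RED/GREEN criterion above can be illustrated with a toy harness that runs one negative check and one legit-use check against a pre-fix and a post-fix renderer. Both renderers and both checks are hypothetical simplifications, not the project's tests; the post-fix renderer only applies the newline-collapsing part of the sanitizer.

```python
import re
from typing import Callable

Renderer = Callable[[str], str]


def render_pre_fix(explain: str) -> str:
    """Hypothetical renderer with the fix 'stashed': no sanitization."""
    return f"- {explain}"


def render_fixed(explain: str) -> str:
    """Hypothetical renderer with line-break characters collapsed at the sink."""
    safe = re.sub(r"[\x00-\x1f\x7f-\x9f\u2028\u2029]", " ", explain)
    return f"- {safe}"


def security_check(render: Renderer) -> bool:
    # Negative test: an injected newline must not yield a second line.
    return len(render("legit\n- malicious instruction").splitlines()) == 1


def legit_check(render: Renderer) -> bool:
    # Legit-use test: ordinary text must come through unchanged.
    return render("prefer pathlib over os.path") == "- prefer pathlib over os.path"


for name, render in [("pre-fix", render_pre_fix), ("post-fix", render_fixed)]:
    print(name, "security:", security_check(render), "legit:", legit_check(render))
```

The expected pattern is `security: False` only for the pre-fix renderer and `legit: True` for both: the negative check fails for the right reason, and the fix does not over-block normal rules.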

Notes

  • One pre-existing ruff I001 (cosmetic whitespace before a # type: ignore on the tomli fallback line in load_config) is left untouched — it predates this change and is unrelated to the security fix. All added code is ruff-clean.
  • Do not merge — pending security review + operator merge.

…(M-001 + L-001/L-002)

inject_claude_md embedded promoted-rule fields (pattern/category/explain) into
CLAUDE.md with no neutralization. An attacker who promotes a rule via
observe(explain="x\n- malicious instruction") x THRESHOLD_RULE could inject
arbitrary Markdown instruction lines into ~/.claude/CLAUDE.md (prompt injection
via disk; later sessions load it). A path guard alone is insufficient — the
headline target is already a legitimate .md file, so the real defense is at the
content sink.

3-layer fix:
- L1 (primary): _sanitize_inline() neutralizes EVERY embedded field at the
  inject_claude_md sink — collapses CR/LF/controls to a space (no line
  break-out), neutralizes backticks (no code-span escape) and the block's own
  HTML-comment fences (no smuggled instinct:end marker). Defends even rows
  poisoned by another write path / pre-fix.
- L2 (defense-in-depth): .md/.mdx suffix + symlink guard on inject_claude_md;
  .md/.mdx suffix guard on import_claude_md, checked BEFORE the existence
  check so the "file not found" existence oracle is closed (INSTINCT-L-001).
- L3: _validate_category coerces unknown categories to "other" AND observe()
  now USES the return value — the old code discarded it, so the coercion was a
  no-op (INSTINCT-L-002).

Tests: 11 new (8 injection/guard negatives + 3 legit-still-works). RED proven
(exactly the 8 security tests fail pre-fix), GREEN after. Full suite 145 passed.
Legitimate ~/.claude/CLAUDE.md use preserved.
WRG-11 merged commit fd2b000 into main on May 31, 2026
12 checks passed
WRG-11 deleted the fix/r89-132b-injection-hardening branch on May 31, 2026 at 20:32