fix(security): disk-persistent prompt-injection hardening (inject_claude_md content-sanitization + path guards) #47
Merged
…(M-001 + L-001/L-002)

inject_claude_md embedded promoted-rule fields (pattern/category/explain) into CLAUDE.md with no neutralization. An attacker who promotes a rule via observe(explain="x\n- malicious instruction") × THRESHOLD_RULE could inject arbitrary Markdown instruction lines into ~/.claude/CLAUDE.md (prompt injection via disk; later sessions load it). A path guard alone is insufficient: the headline target is already a legitimate .md file, so the real defense is at the content sink.

3-layer fix:
- L1 (primary): _sanitize_inline() neutralizes EVERY embedded field at the inject_claude_md sink. It collapses CR/LF/controls to a space (no line break-out), neutralizes backticks (no code-span escape) and the block's own HTML-comment fences (no smuggled instinct:end marker). Defends even rows poisoned by another write path or written before the fix.
- L2 (defense-in-depth): .md/.mdx suffix + symlink guard on inject_claude_md; .md/.mdx suffix guard on import_claude_md, checked BEFORE the existence check so the "file not found" existence oracle is closed (INSTINCT-L-001).
- L3: _validate_category coerces unknown categories to "other" AND observe() now USES the return value. The old code discarded it, so the coercion was a no-op (INSTINCT-L-002).

Tests: 11 new (8 injection/guard negatives + 3 legit-still-works). RED proven (exactly the 8 security tests fail pre-fix), GREEN after. Full suite 145 passed. Legitimate ~/.claude/CLAUDE.md use preserved.
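To illustrate the L1 sink defense described above, here is a minimal sketch of what an inline sanitizer of this kind could look like. It is not the actual _sanitize_inline() from this change: the name sanitize_inline_sketch, the control-character regex, and the exact strings used to break the comment fences are assumptions made for illustration. Only the general behavior (controls collapsed to a space, backticks neutralized, HTML-comment fences broken) is taken from the description.

```python
import re

# Hypothetical sketch only; the real _sanitize_inline() in store.py may differ.
# Matches runs of ASCII control characters, including CR, LF and TAB.
_CONTROL_RUN = re.compile(r"[\x00-\x1f\x7f]+")


def sanitize_inline_sketch(value: object) -> str:
    """Neutralize a field before embedding it in a single Markdown bullet."""
    text = _CONTROL_RUN.sub(" ", str(value))  # no line break-out
    text = text.replace("`", "'")             # no code-span escape
    text = text.replace("<!--", "<! --")      # break HTML-comment open fence
    text = text.replace("-->", "-- >")        # break HTML-comment close fence
    return text.strip()


if __name__ == "__main__":
    # The injected newline is collapsed, so the payload stays on one line.
    print(sanitize_inline_sketch("x\n- malicious instruction"))
```

Under these assumptions, explain="x\n- malicious instruction" would be embedded as the single line x - malicious instruction, so it cannot start a new bullet of its own.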
Summary — R89-132b (security)
inject_claude_md embedded promoted-rule fields (pattern/category/explain) into a CLAUDE.md file with no neutralization. An attacker who promotes a rule via observe(explain="legit\n- malicious instruction") × 10 could inject arbitrary Markdown instruction lines into ~/.claude/CLAUDE.md: prompt injection via disk that later Claude sessions load.

A path guard alone is not sufficient: the headline target (~/.claude/CLAUDE.md) is already a legitimate .md destination, so blocking the path would break the tool. The real defense is content-sanitization at the sink.

3-layer fix
- L1 (primary): _sanitize_inline() neutralizes every field embedded into a CLAUDE.md bullet. It collapses CR/LF/control chars → space (no line break-out), converts backticks → ' (no code-span escape), and breaks the block's own <!-- / --> fences (no smuggled instinct:end marker). Applied to pattern, category, explain. Defends even rows poisoned by another write path or written before the fix.
- L2 (defense-in-depth): .md/.mdx suffix + symlink guard on inject_claude_md; .md/.mdx suffix guard on import_claude_md, checked before the existence check, closing the "file not found: <path>" existence oracle.
- L3: _validate_category coerces unknown categories → "other", and observe() now uses the return value. The old code discarded it, so the coercion was a no-op; fixed here.

Tests (TDD, RED→GREEN proven)
tests/test_injection_hardening_r89_132b.py: 11 new tests.
- 8 security negatives, including the .md suffix rejects (inject + import) and the symlink reject.
- 3 legit-use tests, including .md inject and normal .md import.

RED proven: with the store.py fix stashed, exactly the 8 security tests fail; the 3 legit-use tests pass in both states (non-vacuous). GREEN after. Full suite: 145 passed. Legitimate ~/.claude/CLAUDE.md use is preserved.

Notes
ruff I001 (cosmetic whitespace before a # type: ignore on the tomli fallback line in load_config) is left untouched: it predates this change and is unrelated to the security fix. All added code is ruff-clean.
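For readers who want a concrete picture of the two secondary layers in the 3-layer fix (path guards and category coercion), the following is a rough sketch under stated assumptions. None of it is the code from this change: the helper names check_import_path, check_inject_path, validate_category and observe_sketch, the error messages, the case-sensitive suffix comparison, and the contents of KNOWN_CATEGORIES are all hypothetical. What it tries to show is the ordering (suffix check before existence check), the symlink refusal, and the bug class where a validator's return value is discarded.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".md", ".mdx"}                 # suffixes named in the description
KNOWN_CATEGORIES = {"style", "testing", "other"}   # hypothetical category set


def check_import_path(path: str) -> Path:
    """Guard for reading a file: suffix is checked BEFORE existence."""
    p = Path(path)
    # Same error whether or not the file exists, so a caller cannot use
    # "file not found" to probe for arbitrary paths on disk.
    if p.suffix not in ALLOWED_SUFFIXES:
        raise ValueError("only .md/.mdx files are accepted")
    if not p.is_file():
        raise FileNotFoundError(f"file not found: {p}")
    return p


def check_inject_path(path: str) -> Path:
    """Guard for writing a file: Markdown suffix only, never via a symlink."""
    p = Path(path).expanduser()
    if p.suffix not in ALLOWED_SUFFIXES:
        raise ValueError("only .md/.mdx files are accepted")
    if p.is_symlink():
        raise ValueError("refusing to write through a symlink")
    return p


def validate_category(category: str) -> str:
    """Coerce unknown categories to "other"; the caller must use the result."""
    return category if category in KNOWN_CATEGORIES else "other"


def observe_sketch(pattern: str, category: str) -> dict:
    # Calling validate_category(category) without assigning the result
    # would leave the original value in place, making the coercion a no-op.
    category = validate_category(category)
    return {"pattern": pattern, "category": category}
```

In this sketch, check_import_path("/etc/passwd") and check_import_path("/no/such/file.txt") fail with the same ValueError, which is the property the description calls closing the existence oracle.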