fix: harden prompt injection defenses (M-08, M-09) #13

Closed
riaworks wants to merge 6 commits into thiagofinch:main from riaworks:fix/prompt-injection-defenses

riaworks commented Mar 1, 2026

Summary

Hardens prompt injection defenses in mega-brain's hook system. Part of the Security Remediation Plan (PR 4 of 7).

M-08: Personality File Integrity Verification (session_start.py)

  • SHA-256 hash verification of the personality files injected into the LLM context
  • Creates a baseline integrity manifest on first run (.claude/jarvis/INTEGRITY-MANIFEST.json)
  • On subsequent runs, compares current file hashes against the stored baseline
  • Warns but does NOT block on a hash mismatch (graceful degradation)
  • Does NOT auto-update the manifest when changes are detected (preserving its security purpose)
  • Files monitored: JARVIS-DNA-PERSONALITY.md, JARVIS-SOUL.md, JARVIS-BOOT-SEQUENCE.md, JARVIS-MEMORY.md
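The M-08 flow above (create a baseline on first run, compare against it afterwards, warn without rewriting it) can be sketched with the standard library alone. This is a minimal sketch, not the code in session_start.py: the function names `sha256_of` and `verify_integrity`, the `{"files": {name: digest}}` manifest layout, and the assumption that the four personality files sit directly in a single `base_dir` are all hypothetical.

```python
import hashlib
import json
from pathlib import Path

# The four files named in the PR; their location (base_dir) is an assumption.
MONITORED_FILES = [
    "JARVIS-DNA-PERSONALITY.md",
    "JARVIS-SOUL.md",
    "JARVIS-BOOT-SEQUENCE.md",
    "JARVIS-MEMORY.md",
]


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_integrity(base_dir: Path, manifest_path: Path) -> list:
    """Create the manifest on first run; otherwise return warning strings.

    Hypothetical helper: never raises on a mismatch and never rewrites
    an existing manifest, so tampering cannot silently re-baseline itself.
    """
    current = {
        name: sha256_of(base_dir / name)
        for name in MONITORED_FILES
        if (base_dir / name).is_file()
    }
    if not manifest_path.exists():
        # First run: record the baseline and report nothing.
        manifest_path.parent.mkdir(parents=True, exist_ok=True)
        manifest_path.write_text(json.dumps({"files": current}, indent=2))
        return []

    baseline = json.loads(manifest_path.read_text()).get("files", {})
    warnings = []
    for name, expected in baseline.items():
        actual = current.get(name)
        if actual is None:
            warnings.append(f"{name}: missing (was in baseline)")
        elif actual != expected:
            warnings.append(f"{name}: hash mismatch")
    return warnings
```

A hook using this would print each returned warning and carry on, which matches the warn-only behavior. In this sketch, re-baselining after a legitimate edit means deleting the manifest by hand, since the function never overwrites an existing one.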

M-09: Skill/Sub-Agent Whitelist for Auto-Injection (skill_router.py)

  • Path traversal prevention via os.path.normpath() plus validation against allowed path prefixes
  • Explicit whitelist (.claude/SKILL-WHITELIST.json) controls which skills can be auto-injected
  • Blocked skills/sub-agents are logged to logs/skill-security.jsonl
  • Graceful degradation: if no whitelist file exists, all skills in valid paths are trusted (backward compatible)
  • The whitelist supports trusted_skills and trusted_subagents lists, blocked lists, and a wildcard (*)
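The M-09 checks can be sketched the same way: lexical path normalization followed by a prefix test, a whitelist lookup that falls back to trusting everything when no whitelist file exists, and one JSONL log line per blocked item. This is an illustrative sketch under stated assumptions, not the code in skill_router.py. `ALLOWED_PREFIXES`, the `blocked_skills`/`blocked_subagents` key names (the PR only says "blocked lists"), the log fields, and all function names are hypothetical; only `trusted_skills`, `trusted_subagents`, the `*` wildcard, and the log file name come from the PR description.

```python
import json
import os
import time
from pathlib import Path
from typing import Optional

# Hypothetical allowed roots, relative to the project root; the PR does not list them.
ALLOWED_PREFIXES = (".claude/skills", ".claude/agents")


def is_safe_skill_path(rel_path: str, root: str) -> bool:
    """Return True only if rel_path, once normalized, stays under an allowed prefix."""
    full = os.path.normpath(os.path.join(root, rel_path))
    for prefix in ALLOWED_PREFIXES:
        allowed = os.path.normpath(os.path.join(root, prefix))
        # Appending os.sep stops ".claude/skills-evil" from matching ".claude/skills".
        if full.startswith(allowed + os.sep):
            return True
    return False


def load_whitelist(path: Path) -> Optional[dict]:
    """Return the parsed whitelist, or None if the file does not exist."""
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def is_whitelisted(name: str, kind: str, whitelist: Optional[dict]) -> bool:
    """kind is "skill" or "subagent"; in this sketch blocked entries win over trusted ones."""
    if whitelist is None:
        return True  # no whitelist file: backward-compatible trust-all
    # The "blocked_<kind>s" key names are hypothetical.
    if name in whitelist.get(f"blocked_{kind}s", []):
        return False
    trusted = whitelist.get(f"trusted_{kind}s", [])
    return "*" in trusted or name in trusted


def log_blocked(log_path: Path, name: str, kind: str, reason: str) -> None:
    """Append one JSON object per line (JSONL) describing a blocked item."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    entry = {"ts": time.time(), "kind": kind, "name": name, "reason": reason}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def may_auto_inject(name: str, kind: str, rel_path: str, root: str,
                    whitelist: Optional[dict], log_path: Path) -> bool:
    """Run the path check, then the whitelist check; log and return False on failure."""
    if not is_safe_skill_path(rel_path, root):
        reason = "path outside allowed prefixes"
    elif not is_whitelisted(name, kind, whitelist):
        reason = "not in whitelist"
    else:
        return True
    log_blocked(log_path, name, kind, reason)
    return False
```

Note that `os.path.normpath` is purely lexical and does not resolve symlinks, so a symlink inside an allowed directory that points elsewhere would still pass this check; `os.path.realpath` would be needed to catch that case.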

Files Changed

File | Change
.claude/hooks/session_start.py | +139 lines: integrity verification functions + main() integration
.claude/hooks/skill_router.py | +92 lines: whitelist/path security functions + main() integration
.claude/SKILL-WHITELIST.json | NEW: whitelist with all 40 current trusted skills

Security Properties

  • Warn-only: Both defenses warn but don't block functionality
  • No new dependencies: Uses only Python stdlib (hashlib, json, os, pathlib)
  • No exec/eval/os.system: Zero dynamic code execution
  • Backward compatible: Existing installations work without whitelist file

OWASP/MITRE Mapping

Finding | OWASP LLM | MITRE ATLAS | CVSS
M-08 | LLM02 (Insecure Output Handling) | AML.T0051 (Prompt Injection) | 5.3
M-09 | LLM02 (Insecure Output Handling) | AML.T0051 (Prompt Injection) | 5.3

🤖 Generated with Claude Code

aquilatrindade and others added 6 commits February 27, 2026 19:25
All previous history squashed for security hygiene.
Repository fully sanitized: no residual sensitive data.
M-08: Add SHA-256 integrity verification for personality files
injected into LLM context via session_start.py. Creates baseline
manifest on first run and warns on hash mismatch (no auto-update).

M-09: Add whitelist-based skill/sub-agent injection control in
skill_router.py. Prevents unauthorized SKILL.md files from being
auto-injected, using path traversal prevention (normpath + prefix check)
and an explicit trusted-skills whitelist. Logs blocked attempts.

Security: warn-only mode (graceful degradation, no blocking).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>