fix: harden prompt injection defenses (M-08, M-09) by riaworks · Pull Request #13 · thiagofinch/mega-brain

riaworks · 2026-03-01T23:11:20Z

Summary

Hardens prompt injection defenses in mega-brain's hook system. Part of the Security Remediation Plan (PR 4 of 7).

M-08: Personality File Integrity Verification (session_start.py)

- SHA-256 hash verification for personality files injected into the LLM context
- Creates a baseline integrity manifest on first run (.claude/jarvis/INTEGRITY-MANIFEST.json)
- On subsequent runs, compares current file hashes against the stored baseline
- Warns but does NOT block on hash mismatch (graceful degradation)
- Does NOT auto-update the manifest when changes are detected (preserves its security purpose)
- Files monitored: JARVIS-DNA-PERSONALITY.md, JARVIS-SOUL.md, JARVIS-BOOT-SEQUENCE.md, JARVIS-MEMORY.md
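
The baseline-then-compare flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in session_start.py: the function names `sha256_of` and `verify_integrity`, the manifest layout (a flat file-name-to-digest JSON object), and the warning strings are all assumptions made for the example.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_integrity(files: list[Path], manifest_path: Path) -> list[str]:
    """Return warnings for files whose hash differs from the baseline.

    First run (no manifest yet): write the baseline and return no warnings.
    Later runs: compare only. The manifest is never rewritten here, so a
    tampered file keeps producing a warning until someone re-baselines by hand.
    """
    # Hypothetical manifest layout: {"<file name>": "<hex digest>"}
    current = {p.name: sha256_of(p) for p in files if p.is_file()}

    if not manifest_path.exists():
        manifest_path.parent.mkdir(parents=True, exist_ok=True)
        manifest_path.write_text(json.dumps(current, indent=2, sort_keys=True))
        return []

    baseline = json.loads(manifest_path.read_text())
    warnings = []
    for name, digest in current.items():
        if name not in baseline:
            warnings.append(f"{name}: not in baseline manifest")
        elif baseline[name] != digest:
            warnings.append(f"{name}: hash mismatch")
    return warnings
```

In a warn-only setup, a caller would print or log the returned warnings and continue loading the files. Note that a baseline created on first run only proves the files have not changed since that run; it cannot tell whether they were already tampered with before it.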

M-09: Skill/Sub-Agent Whitelist for Auto-Injection (skill_router.py)

- Path traversal prevention via os.path.normpath() + allowed-prefix validation
- Explicit whitelist (.claude/SKILL-WHITELIST.json) controls which skills can be auto-injected
- Blocked skills/sub-agents are logged to logs/skill-security.jsonl
- Graceful degradation: if no whitelist file exists, all skills in valid paths are trusted (backward compatible)
- Whitelist supports trusted_skills, trusted_subagents, and blocked lists, plus a wildcard (*)
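
The path check and whitelist lookup above might look something like the sketch below. It is illustrative only and not taken from skill_router.py: the helper names, the `.claude/skills` prefix used in the usage example, the `blocked` key name, and the rule that a blocked entry overrides the wildcard are all assumptions.

```python
from __future__ import annotations

import json
import os


def is_within_allowed_prefix(path: str, allowed_prefixes: list[str]) -> bool:
    """Normalize the path, then require it to sit under an allowed directory."""
    normalized = os.path.normpath(path)  # collapses "a/../b" and "./" segments
    for prefix in allowed_prefixes:
        base = os.path.normpath(prefix)
        # Compare against base + separator so "skills-evil" does not match "skills".
        if normalized == base or normalized.startswith(base + os.sep):
            return True
    return False


def load_whitelist(whitelist_path: str) -> dict | None:
    """Return the parsed whitelist, or None if the file is absent."""
    if not os.path.isfile(whitelist_path):
        return None
    with open(whitelist_path, encoding="utf-8") as fh:
        return json.load(fh)


def is_skill_trusted(name: str, whitelist: dict | None) -> bool:
    """Apply the whitelist; with no whitelist file, trust everything."""
    if whitelist is None:
        return True  # backward-compatible fallback
    if name in whitelist.get("blocked", []):  # assumed: blocked wins over "*"
        return False
    trusted = whitelist.get("trusted_skills", [])
    return "*" in trusted or name in trusted
```

For example, with `allowed = [os.path.join(".claude", "skills")]`, a path such as `.claude/skills/../../etc/passwd` normalizes to `etc/passwd` and is rejected. Keep in mind that os.path.normpath() works purely on the string and does not resolve symlinks, so a symlink inside an allowed directory can still point elsewhere.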

Files Changed

| File | Change |
| --- | --- |
| `.claude/hooks/session_start.py` | +139 lines: integrity verification functions + main() integration |
| `.claude/hooks/skill_router.py` | +92 lines: whitelist/path security functions + main() integration |
| `.claude/SKILL-WHITELIST.json` | NEW: whitelist with all 40 current trusted skills |

Security Properties

- Warn-only: Both defenses warn but don't block functionality
- No new dependencies: Uses only Python stdlib (hashlib, json, os, pathlib)
- No exec/eval/os.system: Zero dynamic code execution
- Backward compatible: Existing installations work without whitelist file

OWASP/MITRE Mapping

| Finding | OWASP LLM | MITRE ATLAS | CVSS |
| --- | --- | --- | --- |
| M-08 | LLM02 (Insecure Output Handling) | AML.T0051 (Prompt Injection) | 5.3 |
| M-09 | LLM02 (Insecure Output Handling) | AML.T0051 (Prompt Injection) | 5.3 |

🤖 Generated with Claude Code

M-08: Add SHA-256 integrity verification for personality files injected into LLM context via session_start.py. Creates baseline manifest on first run and warns on hash mismatch (no auto-update).

M-09: Add whitelist-based skill/sub-agent injection control in skill_router.py. Prevents unauthorized SKILL.md files from being auto-injected via path traversal prevention (normpath + prefix check) and explicit trusted skills whitelist. Logs blocked attempts.

Security: warn-only mode (graceful degradation, no blocking).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

aquilatrindade and others added 6 commits February 27, 2026 19:25

chore: clean baseline - single commit repository

8bdfb71

All previous history squashed for security hygiene. Repository fully sanitized - no residual sensitive data.

Update README.md

06e182b

Update README.md

d8ad9a5

Update README.md

122195e

Update README.md

335d34a

riaworks requested a review from thiagofinch as a code owner March 1, 2026 23:11

thiagofinch closed this Mar 12, 2026

thiagofinch force-pushed the main branch from c922303 to 7de112b March 12, 2026 17:25

riaworks wants to merge 6 commits into thiagofinch:main from riaworks:fix/prompt-injection-defenses
