Detect malicious AI agent skills before they compromise your system.
A cross-platform security audit skill that scans third-party AI agent skills, plugins, and tool definitions for security vulnerabilities. Uses AI semantic analysis with 61 detection patterns across 9 risk categories aligned with the OWASP Agentic AI Top 10. Zero dependencies -- works on any platform that supports AI agent skills.
The AI agent skill ecosystem has a security problem. Third-party skills execute with your agent's full privileges, and marketplaces have limited vetting:
- Snyk ToxicSkills -- 36.82% of MCP servers on ClawHub have at least one vulnerability
- SlowMist -- 472+ malicious MCP servers identified with real-world credential theft and data exfiltration
- Koi Security ClawHavoc -- 341 malicious servers found in 2,857 scanned
Skills can read your SSH keys, exfiltrate environment variables, install persistent backdoors, and inject prompts that override your agent's safety guardrails. A single malicious skill compromises everything.
61 detection patterns across 9 categories:
| ID | Category | Severity | OWASP ASI | Patterns |
|---|---|---|---|---|
| PI | Prompt Injection | CRITICAL | ASI01 | PI-001 to PI-008 |
| DE | Data Exfiltration | CRITICAL | ASI02 | DE-001 to DE-009 |
| CE | Malicious Command Execution | CRITICAL | ASI02, ASI05 | CE-001 to CE-010 |
| OB | Obfuscated/Hidden Code | WARNING | -- | OB-001 to OB-007 |
| PA | Privilege Over-Request | WARNING | ASI03 | PA-001 to PA-005 |
| SC | Supply Chain Risks | WARNING | ASI04 | SC-001 to SC-006 |
| MP | Memory/Context Poisoning | WARNING | ASI06 | MP-001 to MP-005 |
| TE | Human Trust Exploitation | WARNING | ASI09 | TE-001 to TE-006 |
| BM | Behavioral Manipulation | INFO | ASI10 | BM-001 to BM-005 |
Every pattern includes malicious examples, explanations of the danger, and false-positive guidance. See skills/skills-security-audit/references/security-rules.md for the full ruleset.
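As a quick consistency check, the ID ranges in the table can be expanded to confirm they add up to 61 patterns. This is only an illustrative sketch: the per-category counts are copied from the table above, and the names `PATTERN_COUNTS` and `expand_pattern_ids` are hypothetical, not part of the skill.

```python
# Per-category pattern counts, copied from the category table above.
PATTERN_COUNTS = {
    "PI": 8,   # Prompt Injection
    "DE": 9,   # Data Exfiltration
    "CE": 10,  # Malicious Command Execution
    "OB": 7,   # Obfuscated/Hidden Code
    "PA": 5,   # Privilege Over-Request
    "SC": 6,   # Supply Chain Risks
    "MP": 5,   # Memory/Context Poisoning
    "TE": 6,   # Human Trust Exploitation
    "BM": 5,   # Behavioral Manipulation
}


def expand_pattern_ids(counts):
    """Expand counts into IDs of the form PI-001, PI-002, and so on."""
    return [
        f"{prefix}-{number:03d}"
        for prefix, count in counts.items()
        for number in range(1, count + 1)
    ]


if __name__ == "__main__":
    ids = expand_pattern_ids(PATTERN_COUNTS)
    print(len(ids), "patterns across", len(PATTERN_COUNTS), "categories")
```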
# Add the marketplace and install
/plugin marketplace add https://github.com/agentnode-dev/skills-security-audit.git
/plugin install skills-security-audit@agentnode-dev

Then restart Claude Code. The skill will appear in your available skills and trigger automatically when relevant.
Clone the repo and point your AI agent to the directory:
git clone https://github.com/agentnode-dev/skills-security-audit.git

Then tell your agent:
Load the skill at /path/to/skills-security-audit/skills/skills-security-audit/SKILL.md and audit the skill at /path/to/suspicious-skill/
Copy the contents of skills/skills-security-audit/SKILL.md into your AI agent's system prompt or conversation, then ask it to audit a skill. This works with any AI agent — no installation needed.
Once loaded, ask your agent:
Audit the skill at /path/to/suspicious-skill/
Or scan all installed skills:
Scan all my installed skills for security issues
This is a pure Skill -- a markdown file that instructs any AI agent how to perform security audits. No code to install, no runtime to configure.
The skill identifies target files (.md, .json, .js, .py, .sh, .ts, .yaml, .yml) from the path you specify, a GitHub URL, or your platform's installed skills directory.
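The skill itself is markdown and the agent performs this step, so no code is involved. Purely as an illustration of the file selection described above, a minimal sketch for a local path might look like the following. The function name `find_target_files` is hypothetical, and the case-insensitive extension match is an assumption rather than documented behavior.

```python
from pathlib import Path

# File extensions listed in the description above.
TARGET_EXTENSIONS = {".md", ".json", ".js", ".py", ".sh", ".ts", ".yaml", ".yml"}


def find_target_files(root):
    """Recursively collect files under `root` whose extension is a target type.

    Assumption: extensions are matched case-insensitively.
    """
    root_path = Path(root)
    return sorted(
        path
        for path in root_path.rglob("*")
        if path.is_file() and path.suffix.lower() in TARGET_EXTENSIONS
    )


if __name__ == "__main__":
    # Scans the current directory as a stand-in for a skill directory.
    for path in find_target_files("."):
        print(path)
```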
The AI agent reads each file and checks its content against all 61 detection patterns using semantic analysis. Unlike regex-based tools, this catches obfuscated, paraphrased, and novel attack patterns because the AI understands intent, not just string matches.
The result is a structured report with severity ratings, evidence citations (file path and line number), and actionable remediation for each finding.
Each finding contributes to the risk score:
| Finding Severity | Points |
|---|---|
| CRITICAL | +2.0 |
| WARNING | +0.8 |
| INFO | +0.2 |
Risk levels (max 10.0):
| Score | Level | Action |
|---|---|---|
| 0.0 -- 2.0 | SAFE | No significant risks found |
| 2.1 -- 5.0 | RISKY | Manual review recommended before use |
| 5.1 -- 8.0 | DANGEROUS | Do not install |
| 8.1 -- 10.0 | MALICIOUS | Confirmed malicious intent -- report to marketplace |
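The two tables above can be read as a simple additive score capped at 10.0. The sketch below is one way to express that arithmetic, assuming findings are summed with no other weighting. The names `assess`, `SEVERITY_TENTHS`, and `LEVEL_UPPER_BOUNDS` are hypothetical. Points are held as integer tenths so that band edges such as 2.0 and 5.0 compare exactly.

```python
# Points per finding, in tenths of a point (CRITICAL +2.0, WARNING +0.8, INFO +0.2).
SEVERITY_TENTHS = {"CRITICAL": 20, "WARNING": 8, "INFO": 2}

# Maximum score of 10.0, in tenths.
MAX_TENTHS = 100

# Upper bound of each band in tenths, taken from the risk level table.
LEVEL_UPPER_BOUNDS = [
    (20, "SAFE"),
    (50, "RISKY"),
    (80, "DANGEROUS"),
    (100, "MALICIOUS"),
]


def assess(severities):
    """Return (score, level) for a list of finding severities."""
    total = min(sum(SEVERITY_TENTHS[s] for s in severities), MAX_TENTHS)
    for upper_bound, level in LEVEL_UPPER_BOUNDS:
        if total <= upper_bound:
            return total / 10, level
    raise AssertionError("unreachable: total is capped at MAX_TENTHS")


if __name__ == "__main__":
    print(assess(["CRITICAL", "WARNING", "INFO"]))  # (3.0, 'RISKY')
    print(assess(["CRITICAL"] * 6))                 # (10.0, 'MALICIOUS')
```

Read literally from these tables, a single CRITICAL finding scores exactly 2.0 and sits at the top of the SAFE band, so the individual findings in the report deserve attention even when the overall level is low.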
This skill is pure markdown — any AI agent that can read files or accept pasted instructions can use it.
Native skill loading (agent reads SKILL.md directly):
- Claude Code (via plugin install or local file)
- Cursor (via `.cursorrules` or project context)
- Windsurf (via project context)
- OpenClaw / ClawHub
Copy-paste (paste SKILL.md content into conversation):
- ChatGPT, Gemini, OpenAI Agents, or any LLM chat interface
This skill's detection patterns are informed by real-world threat intelligence:
- SlowMist -- Analysis of 472+ malicious MCP servers on ClawHub, including two-stage payload loading, file harvesting, and platform relay techniques
- Snyk ToxicSkills -- Research finding 36.82% vulnerability rate across ClawHub MCP servers
- Koi Security ClawHavoc -- Discovery of 341 malicious servers in 2,857 scanned
- OWASP Agentic AI Top 10 -- Framework for categorizing agentic AI security risks (ASI01 through ASI10)
Contributions are welcome. To add a new detection pattern:
- Add the rule to `skills/skills-security-audit/references/security-rules.md` following the existing format (pattern, malicious example, danger explanation, false-positive guidance)
- Update the summary table in both `security-rules.md` and `SKILL.md` (both under `skills/skills-security-audit/`)
- Submit a pull request with a description of the threat the new pattern addresses
MIT