Skill Being Reviewed
Skill name: agent-security
Skill path: skills/ai-security/agent-security/
False Positive Analysis
Benign code that triggers a false positive:
agent_audit:
store_raw_prompts: false
store_chain_of_thought: false
store_prompt_hash: true
store_policy_trace: true
redact_tool_parameters:
- access_token
- customer_email
- secret_value
immutable_sink: cloud-logging-worm
correlation_id_required: true
Why this is a false positive:
The skill asks whether the agent's reasoning and prompt/context are logged. That can over-flag privacy-preserving designs that intentionally avoid raw prompt or chain-of-thought retention. For regulated systems, storing hashes, policy decisions, redacted tool parameters, tool results, user/agent IDs, and immutable correlation IDs can be stronger and safer than storing sensitive prompts verbatim.
Coverage Gaps
Missed variant 1:
const proposed = await agent.planToolCall(userRequest);
const summary = `${proposed.tool}: ${proposed.naturalLanguageSummary}`;
const approved = await approvals.request({ summary });
if (approved) {
const finalArgs = await agent.rewriteArgumentsAfterApproval();
await tools[proposed.tool].run(finalArgs);
}
Why it should be caught:
The skill reviews HITL gate placement and context quality, but does not require the approval decision to be cryptographically or canonically bound to the exact tool name, arguments, resource IDs, risk tier, and nonce that will execute. This creates a TOCTOU approval bypass: a human approves a harmless summary, then the agent changes parameters before execution.
Missed variant 2:
{
"mcpServers": {
"crm": {
"command": "npx",
"args": ["@vendor/crm-mcp-server@latest"],
"env": { "CRM_TOKEN": "${CRM_TOKEN}" }
}
}
}
Why it should be caught:
The skill covers tool breadth and dynamic tool sets, but should explicitly check tool-provider provenance for MCP/plugin servers: pinned package versions, signed manifests, allowed server identities, per-server secret scopes, and review of tool schemas. An unpinned tool server with a broad token can become the agent's effective permission boundary.
Edge Cases
Approval summaries are often generated by the same model being constrained. If the approval UI displays only a model-written summary, the human may approve a different action than the one executed. The safer design is a deterministic approval artifact: canonical JSON of tool, parameters, resource identifiers, risk tier, expiry, nonce, and policy decision hash. Execution should fail unless the exact artifact was approved.
Another edge case is audit minimization. "No raw prompt logs" should not automatically be a finding when the design has immutable hashes, redacted parameters, policy traces, and replayable event IDs.
Remediation Quality
Suggested detection additions:
Grep: "approve|approval|human_in_the_loop|require_approval|risk_tier|nonce|canonical" in **/*.{ts,js,py}
Grep: "mcpServers|ModelContextProtocol|FunctionTool|register_tool|tool_schema" in **/*.{json,ts,js,py,yaml,yml}
Grep: "npx .*@latest|pipx|uvx|command.*mcp|env.*TOKEN" in **/*.{json,yaml,yml}
Comparison to Other Tools
| Tool |
Catches this? |
Notes |
| Semgrep |
Partial |
Can catch unpinned MCP/server packages and obvious approval bypass patterns with custom rules. |
| CodeQL |
Partial |
Can model TOCTOU data flow if queries are written for approval artifacts and tool execution sinks. |
| OPA / Cedar / Zanzibar-style policy |
Partial |
Strong for deterministic policy checks, but only if the exact approved artifact is enforced at execution time. |
Overall Assessment
Strengths: Strong coverage of least privilege, HITL gates, blast radius, auditability, rollback, and multi-agent risk.
Needs improvement: Add approval artifact binding and MCP/plugin server supply-chain checks, and avoid requiring raw prompt/CoT logs where safer audit primitives exist.
Priority recommendations:
- Require approvals to bind to canonical executable parameters, not summaries.
- Treat MCP/plugin/tool servers as privileged dependencies with pinned versions, provenance, and scoped secrets.
- Document privacy-preserving audit patterns as acceptable alternatives to raw prompt/CoT retention.
Bounty Info
Skill Being Reviewed
Skill name: agent-security
Skill path:
skills/ai-security/agent-security/False Positive Analysis
Benign code that triggers a false positive:
Why this is a false positive:
The skill asks whether the agent's reasoning and prompt/context are logged. That can over-flag privacy-preserving designs that intentionally avoid raw prompt or chain-of-thought retention. For regulated systems, storing hashes, policy decisions, redacted tool parameters, tool results, user/agent IDs, and immutable correlation IDs can be stronger and safer than storing sensitive prompts verbatim.
Coverage Gaps
Missed variant 1:
Why it should be caught:
The skill reviews HITL gate placement and context quality, but does not require the approval decision to be cryptographically or canonically bound to the exact tool name, arguments, resource IDs, risk tier, and nonce that will execute. This creates a TOCTOU approval bypass: a human approves a harmless summary, then the agent changes parameters before execution.
Missed variant 2:
{ "mcpServers": { "crm": { "command": "npx", "args": ["@vendor/crm-mcp-server@latest"], "env": { "CRM_TOKEN": "${CRM_TOKEN}" } } } }Why it should be caught:
The skill covers tool breadth and dynamic tool sets, but should explicitly check tool-provider provenance for MCP/plugin servers: pinned package versions, signed manifests, allowed server identities, per-server secret scopes, and review of tool schemas. An unpinned tool server with a broad token can become the agent's effective permission boundary.
Edge Cases
Approval summaries are often generated by the same model being constrained. If the approval UI displays only a model-written summary, the human may approve a different action than the one executed. The safer design is a deterministic approval artifact: canonical JSON of tool, parameters, resource identifiers, risk tier, expiry, nonce, and policy decision hash. Execution should fail unless the exact artifact was approved.
Another edge case is audit minimization. "No raw prompt logs" should not automatically be a finding when the design has immutable hashes, redacted parameters, policy traces, and replayable event IDs.
Remediation Quality
Suggested detection additions:
Comparison to Other Tools
Overall Assessment
Strengths: Strong coverage of least privilege, HITL gates, blast radius, auditability, rollback, and multi-agent risk.
Needs improvement: Add approval artifact binding and MCP/plugin server supply-chain checks, and avoid requiring raw prompt/CoT logs where safer audit primitives exist.
Priority recommendations:
Bounty Info