blog: MCP Security — Why Your AI Agent Tool Calls Need a Firewall by aymenhmaidiwastaken · Pull Request #899 · microsoft/agent-governance-toolkit

aymenhmaidiwastaken · 2026-04-07T21:23:52Z

Closes #848

Drafted the MCP security blog post covering the threat landscape around AI agent tool calls — tool poisoning, rug-pull attacks, cross-server data leakage, and over-permissioned tools with concrete attack scenarios.

Includes six practical recommendations: tool allowlisting, definition fingerprinting, argument boundary enforcement, human-in-the-loop for sensitive ops, runtime monitoring, and trust domain isolation.

Happy to revise based on feedback!

…Calls Addresses microsoft#848. Covers MCP threat surface (tool poisoning, rug-pull attacks, cross-server data leakage, over-permissioned tools), real attack scenarios, and practical defenses including tool allowlisting, definition fingerprinting, argument boundary enforcement, and runtime monitoring.

github-actions · 2026-04-07T21:24:45Z

Welcome to the Agent Governance Toolkit! Thanks for your first pull request.
Please ensure tests pass, code follows style (ruff check), and you have signed the CLA.
See our Contributing Guide.

github-actions · 2026-04-07T21:24:57Z

🤖 AI Agent: contributor-guide

Hi @aymenhmaidiwastaken! 👋

Welcome to the Agent Governance Toolkit community, and thank you for contributing your time and expertise! 🎉 Your blog post draft is incredibly thoughtful and well-researched — it's clear you've put a lot of effort into breaking down complex security concepts into actionable advice. Let's dive into the review!

🌟 What You Did Well

Clarity and Structure: Your blog post is exceptionally well-organized. The "Threat Surface" and "Practical Recommendations" sections are easy to follow and provide a logical flow from problem to solution.
Actionable Advice: The six recommendations are practical, detailed, and immediately useful for readers. The inclusion of YAML examples and JSON schemas is a great touch to make the concepts tangible.
Real-World Scenarios: The attack scenarios you described are both realistic and compelling. They help illustrate the risks in a way that will resonate with practitioners.
Community Alignment: Your post aligns perfectly with the goals of this repository — promoting secure and responsible agent governance. The tie-ins to OWASP and the MCP Trust Guide are excellent.

🛠 Suggestions for Improvement

Here are a few areas where we can refine your contribution to align with project conventions and ensure maximum impact:

1. File Placement

Blog posts in this repository are typically placed under packages/{name}/docs/blog/. You've done this correctly by placing the file in packages/agent-mesh/docs/blog/. ✅
However, could you also add a test case to ensure the blog post renders correctly in our documentation pipeline? Tests for this package should go in packages/agent-mesh/tests/. You can create a simple test to verify the file's presence and formatting.
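A presence-and-format check of the kind requested here might look like the sketch below. The helper name `check_blog_post` and the two rules it enforces (the post opens with a level-1 heading, and code fences are balanced) are assumptions made for illustration; the documentation pipeline's actual rendering rules are not stated in this thread.

```python
from pathlib import Path

FENCE = "`" * 3  # Markdown code-fence marker


def check_blog_post(path: Path) -> list[str]:
    """Return a list of problems found in a Markdown post (empty if none)."""
    if not path.is_file():
        return [f"missing file: {path}"]
    text = path.read_text(encoding="utf-8")
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return ["file is empty"]
    problems = []
    # Assumed convention: the first non-blank line is a level-1 heading.
    if not lines[0].startswith("# "):
        problems.append("first non-blank line is not a '# ' heading")
    # An odd number of fence lines means a code block was never closed.
    fence_count = sum(1 for line in lines if line.lstrip().startswith(FENCE))
    if fence_count % 2:
        problems.append("unbalanced code fence")
    return problems
```

In a pytest module under packages/agent-mesh/tests/, a test could then assert that `check_blog_post(...)` returns an empty list for the post's path under packages/agent-mesh/docs/blog/.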

2. Linting

We use ruff for linting with a focus on E, F, and W error codes. While your blog post is Markdown and won't be linted directly, make sure any Python code snippets (like the MCP Security Scanner link) adhere to PEP 8 standards. If you include runnable Python examples in the future, running ruff locally will help catch issues early.

3. Commit Message

We follow the Conventional Commits standard for commit messages. Your commit message should start with a prefix like docs: to indicate the type of change. For example:
```
docs: add MCP security blog post on tool call firewalls
```
This helps maintainers quickly understand the purpose of your changes and ensures consistent commit history.

4. Security-Sensitive Content

Since this blog post discusses security-sensitive topics, it will receive extra scrutiny. You've done a great job referencing OWASP and providing concrete examples, but it would be helpful to link directly to the OWASP Top 10 for LLMs for readers who want to dive deeper.

5. Cross-Referencing Internal Resources

You’ve already linked to the MCP Trust Guide and the MCP Security Scanner. Great job! To make this even more robust, consider adding a link to our CONTRIBUTING.md file for readers who might want to contribute to the toolkit after reading your post.

🔗 Helpful Resources

Here are some resources to help you refine your contribution:

CONTRIBUTING.md: Guidelines for contributing to this repository.
QUICKSTART.md: A quick guide to getting started with the Agent Governance Toolkit.
OWASP Top 10 for LLMs: A great resource for understanding AI-specific security risks.

✅ Next Steps

1. Address the feedback above:
   - Add a test case for the blog post in packages/agent-mesh/tests/.
   - Ensure your commit message follows the docs: prefix convention.
   - Optionally, add a link to the OWASP Top 10 for LLMs.
2. Push your changes to this branch. Once updated, our CI/CD pipeline will automatically re-run checks.
3. Let us know if you have any questions or need clarification on any of the feedback!

Once you've made these updates, we'll review your PR again and work towards merging it. Thank you for helping us make the Agent Governance Toolkit even better! 🚀

Looking forward to your updates! 😊

github-actions

🤖 AI Agent: code-reviewer

Feedback on Pull Request: blog: MCP Security — Why Your AI Agent Tool Calls Need a Firewall

🔴 CRITICAL

Tool Description Injection Vulnerability
The blog correctly highlights the risk of tool poisoning via description injection but does not explicitly recommend sanitizing tool descriptions before they are consumed by the agent. This is a critical omission because malicious descriptions can bypass LLM safeguards.
Actionable Recommendation: Add explicit guidance to sanitize tool descriptions for hidden instructions or malicious payloads before they are presented to the agent. This could include stripping non-visible characters, detecting prompt injection patterns, and validating descriptions against an allowlist of permitted patterns.
Cross-Server Data Leakage
While the blog mentions the risk of cross-server data leakage, it does not provide concrete implementation details for tracking data provenance across tool calls. Without this, the recommendation for isolating MCP server trust domains lacks actionable guidance.
Actionable Recommendation: Include technical details on how to implement data provenance tracking, such as tagging data with metadata about its origin and enforcing policies based on these tags.
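One way to picture the provenance tagging described above is the minimal sketch below. The `Tagged` wrapper, the `ALLOWED_FLOWS` table, and the server names are all hypothetical; neither the blog post nor the toolkit is known to use this design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A value together with the MCP server it was read from."""
    value: str
    origin: str


# Hypothetical policy: (origin server, destination server) pairs that may exchange data.
ALLOWED_FLOWS = {
    ("crm-server", "crm-server"),
    ("public-docs", "crm-server"),
}


def check_flow(args: dict[str, Tagged], destination: str) -> list[str]:
    """Return the argument names whose origin may not be sent to `destination`."""
    return [
        name
        for name, tagged in args.items()
        if (tagged.origin, destination) not in ALLOWED_FLOWS
    ]
```

A governance layer would call `check_flow` before forwarding a tool call and refuse (or escalate) when the returned list is non-empty. The hard part in practice is keeping the tag attached while the model paraphrases or combines values, which this sketch does not attempt.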

🟡 WARNING

Backward Compatibility of Tool Fingerprinting
The recommendation to fingerprint tool definitions and block tools with changed definitions could lead to breaking changes in production environments. If an MCP server updates a tool description or schema for legitimate reasons (e.g., bug fixes or feature enhancements), agents may fail to function unless the fingerprints are updated.
Actionable Recommendation: Suggest implementing a staged approval process for fingerprint changes, where updates are flagged but not immediately blocked. This allows operators to review and approve legitimate changes without disrupting production.
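The staged approval idea can be sketched as a small registry that queues changed fingerprints for review instead of hard-failing. The class and method names below are illustrative assumptions, not part of the toolkit.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    PENDING_REVIEW = "pending_review"


class FingerprintRegistry:
    """Tracks approved tool fingerprints and queues changes for operator review."""

    def __init__(self) -> None:
        self.approved: dict[str, str] = {}
        self.pending: dict[str, str] = {}

    def check(self, tool: str, fingerprint: str) -> Decision:
        if self.approved.get(tool) == fingerprint:
            return Decision.ALLOW
        # Unknown or changed definition: record it for review rather than
        # silently accepting it or permanently blocking the tool.
        self.pending[tool] = fingerprint
        return Decision.PENDING_REVIEW

    def approve(self, tool: str) -> None:
        """Operator action: promote the pending fingerprint to approved."""
        self.approved[tool] = self.pending.pop(tool)
```

What the caller does on `PENDING_REVIEW` (hold the call, or let it through with an alert) is a policy choice that depends on how sensitive the tool is.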

💡 SUGGESTIONS

Expand Human-in-the-Loop Guidance
The blog mentions human approval for sensitive operations but does not specify how this could be implemented in practice.
Actionable Recommendation: Provide examples of how to integrate human-in-the-loop mechanisms, such as using a webhook to trigger approval workflows in tools like Slack or Microsoft Teams.
Runtime Monitoring Details
The recommendation for runtime monitoring is high-level and does not specify what tools or frameworks could be used to implement anomaly detection.
Actionable Recommendation: Suggest specific technologies or libraries (e.g., OpenTelemetry for tracing, Elasticsearch for log analysis) that can be used to implement runtime monitoring.
OWASP Agentic Top 10 Mapping
While the blog references ASI01 (Prompt Injection), it could benefit from mapping the other threats (rug-pull attacks, data leakage, over-permissioned tools) to relevant OWASP Agentic Top 10 categories.
Actionable Recommendation: Expand the OWASP mapping to include ASI02 (Supply Chain Vulnerabilities) for rug-pull attacks and ASI03 (Data Leakage) for cross-server data leakage.
Tool Allowlist Implementation
The YAML example for tool allowlisting is helpful but lacks details on how this policy would be enforced programmatically.
Actionable Recommendation: Provide a code snippet or pseudocode demonstrating how the allowlist can be integrated into the agent's runtime logic.
Clarify "Excessive Data Volume" Detection
The blog mentions scanning arguments for excessive data volume but does not define thresholds or criteria for what constitutes "excessive."
Actionable Recommendation: Add guidance on setting thresholds based on tool schema expectations, such as maximum string lengths or array sizes.
Link to MCP Trust Guide and Security Scanner
The blog links to the MCP Trust Guide and Security Scanner but does not summarize their functionality or relevance to the recommendations.
Actionable Recommendation: Briefly describe what these resources provide and how they can help implement the defenses outlined in the blog.
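For the "Excessive Data Volume" suggestion above, one option is to take thresholds from the tool's own input schema. The sketch below assumes the schema is JSON Schema and checks only the standard `maxLength` and `maxItems` keywords; the function name is hypothetical.

```python
def oversized_arguments(schema: dict, args: dict) -> list[str]:
    """Return names of arguments that exceed limits declared in a JSON Schema.

    Only `maxLength` (strings) and `maxItems` (arrays) are checked, and only
    for top-level properties.
    """
    unlimited = float("inf")
    violations = []
    for name, spec in schema.get("properties", {}).items():
        value = args.get(name)
        if isinstance(value, str) and len(value) > spec.get("maxLength", unlimited):
            violations.append(name)
        elif isinstance(value, list) and len(value) > spec.get("maxItems", unlimited):
            violations.append(name)
    return violations
```

Arguments with no declared limit pass this check, so a deployment would likely add a default cap for schemas that omit these keywords.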

General Observations

The blog is well-written and provides a clear overview of the MCP threat landscape. It effectively communicates the urgency of securing tool calls and offers practical recommendations.
The inclusion of real-world attack scenarios is excellent and helps illustrate the risks.
The blog aligns well with the goals of the repository and contributes valuable insights to the community.

Final Recommendation

Merge the pull request after addressing the critical issues and warnings. Consider incorporating the suggestions to further enhance the blog's utility and actionable guidance.

github-actions · 2026-04-07T21:24:59Z

🤖 AI Agent: security-scanner

This pull request adds a blog post discussing the security challenges of the Model Context Protocol (MCP) and provides practical recommendations for mitigating its risks. While the content is primarily documentation, it is highly relevant to the security of the microsoft/agent-governance-toolkit repository, as it addresses critical vulnerabilities in AI agent governance and tool invocation.

Findings

1. Prompt Injection Defense Bypass

Risk: The blog highlights a significant vulnerability: tool descriptions in MCP can be manipulated to include hidden instructions that the LLM will execute without user awareness. This is a form of prompt injection that bypasses traditional user-facing guardrails.
Rating: 🔴 CRITICAL
Attack Vector: An attacker controlling an MCP server could embed malicious instructions in a tool's description. The LLM would execute these instructions as part of its reasoning process, potentially leading to data exfiltration, unauthorized actions, or other malicious outcomes.
Suggested Fix: Implement a governance layer that validates tool descriptions against a strict allowlist or schema. Descriptions should be sanitized to remove any hidden instructions or suspicious content before being processed by the LLM.
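A minimal sketch of the sanitization step suggested here, assuming a pattern-based approach: strip invisible format characters, then flag phrases that look like injected instructions. The pattern list is illustrative only and would miss many real attacks.

```python
import re
import unicodedata

# Illustrative patterns only; a real deployment would maintain a broader set.
SUSPICIOUS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"do not (tell|inform|mention)", re.I),
    re.compile(r"<\s*(important|system)\s*>", re.I),
]


def sanitize_description(text: str) -> tuple[str, list[str]]:
    """Strip invisible characters and report which suspicious patterns match."""
    # Unicode category "Cf" covers zero-width and bidirectional control characters.
    cleaned = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    findings = [p.pattern for p in SUSPICIOUS if p.search(cleaned)]
    return cleaned, findings
```

A non-empty `findings` list would be a reason to reject the tool or send it for review. Removing all "Cf" characters also removes zero-width joiners, which is usually acceptable for tool descriptions but would alter some emoji sequences.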

2. Policy Engine Circumvention

Risk: The blog describes "rug-pull attacks," where an MCP server can modify tool definitions after they have been approved, effectively circumventing any pre-approval process.
Rating: 🔴 CRITICAL
Attack Vector: A server could initially provide a benign tool definition, gain approval, and later modify it to include malicious behavior. This undermines the policy engine's ability to enforce security controls.
Suggested Fix: Introduce fingerprinting of tool definitions (hashing descriptions, schemas, and metadata) and implement drift detection. Any changes to approved tools should trigger alerts and require re-approval.
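The fingerprinting and drift detection suggested here could be sketched as follows. It assumes tool definitions arrive as dictionaries with `name`, `description`, and `inputSchema` fields, and that hashing a canonical JSON form with SHA-256 is acceptable; the function names are hypothetical.

```python
import hashlib
import json


def fingerprint(tool: dict) -> str:
    """SHA-256 over a canonical JSON form of the fields the model sees."""
    relevant = {
        "name": tool.get("name"),
        "description": tool.get("description"),
        "inputSchema": tool.get("inputSchema"),
    }
    # Sorted keys and fixed separators make the hash independent of key order.
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_drift(approved: dict[str, str], current_tools: list[dict]) -> list[str]:
    """Return names of tools whose fingerprint differs from the approved one."""
    return [
        tool["name"]
        for tool in current_tools
        if approved.get(tool["name"]) != fingerprint(tool)
    ]
```

Running `detect_drift` each time the server's tool list is fetched would surface a rug-pull as soon as the changed definition is served, including tools that were never approved in the first place.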

3. Trust Chain Weaknesses

Risk: The blog identifies cross-server data leakage as a critical issue. Data from one MCP server can be inadvertently passed to another, violating trust boundaries.
Rating: 🔴 CRITICAL
Attack Vector: An agent could read sensitive data from one server and pass it as an argument to a tool on another server, leading to unauthorized data sharing or exfiltration.
Suggested Fix: Enforce trust domain isolation by tracking data provenance and implementing policies that explicitly control which data can flow between servers.

4. Credential Exposure

Risk: The blog warns about the risk of sensitive data, such as credentials or PII, being included in tool arguments and inadvertently exposed to untrusted servers.
Rating: 🟠 HIGH
Attack Vector: If an agent encounters sensitive data in its context, it might include this data in tool arguments, especially if the tool schema or description suggests doing so.
Suggested Fix: Implement runtime argument validation to scan for sensitive data patterns (e.g., API keys, PII) and block or sanitize such arguments before they are sent to the server.
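As a rough sketch of the argument scanning suggested here, the function below matches string arguments against a few regular expressions. The pattern set is an assumption chosen for illustration and is far from complete.

```python
import re

# Illustrative patterns; extend for the secrets and PII relevant to your environment.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def scan_arguments(args: dict) -> dict[str, list[str]]:
    """Map each top-level string argument to the sensitive patterns it matches."""
    hits = {}
    for name, value in args.items():
        if not isinstance(value, str):
            continue
        matched = [label for label, pat in SECRET_PATTERNS.items() if pat.search(value)]
        if matched:
            hits[name] = matched
    return hits
```

Only top-level string values are inspected; nested objects and arrays would need a recursive walk. Whether a hit blocks the call or redacts the value is left to policy.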

5. Sandbox Escape

Risk: The blog does not explicitly discuss sandbox escape, but the mention of tools like execute_command implies a risk if such tools are exposed to agents without proper scoping.
Rating: 🟠 HIGH
Attack Vector: Over-permissioned tools like execute_command could allow an agent to execute arbitrary commands, potentially leading to a sandbox escape or system compromise.
Suggested Fix: Enforce least-privilege access by scoping tool availability per agent role and task. Sensitive tools should require explicit approval or be disabled entirely for most agents.
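Role-scoped tool availability with an approval gate might be enforced along these lines. The role names, most tool names, and the `authorize` function are hypothetical; only `execute_command` comes from the finding above.

```python
# Hypothetical role-to-tool mapping.
ROLE_ALLOWLIST: dict[str, frozenset[str]] = {
    "support-agent": frozenset({"search_tickets", "read_ticket"}),
    "ops-agent": frozenset({"read_logs", "execute_command"}),
}

# Tools that need explicit human approval even when the role allows them.
REQUIRES_APPROVAL = frozenset({"execute_command"})


def authorize(role: str, tool: str, approved_by_human: bool = False) -> bool:
    """Deny by default; allow only listed tools and gate the sensitive ones."""
    if tool not in ROLE_ALLOWLIST.get(role, frozenset()):
        return False
    if tool in REQUIRES_APPROVAL and not approved_by_human:
        return False
    return True
```

Unknown roles and unlisted tools fall through to a denial, so adding a new tool to a server never widens an agent's access until someone edits the allowlist.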

6. Supply Chain Risks

Risk: The blog's discussion of mutable tool definitions highlights a supply chain risk analogous to dependency confusion or typosquatting.
Rating: 🔴 CRITICAL
Attack Vector: An attacker could compromise an MCP server or publish a malicious server that mimics a legitimate one, tricking agents into using malicious tools.
Suggested Fix: Use server authentication (e.g., SPIFFE/SVID) and certificate pinning to ensure that agents only connect to trusted MCP servers. Additionally, maintain a registry of approved servers and tools.

Recommendations for the Blog Post

The blog post is well-written and provides a comprehensive overview of MCP-related security risks. However, it could be improved by:

Adding Examples of Defensive Implementations:
Include code snippets or pseudocode for implementing the recommended defenses, such as tool allowlisting, fingerprinting, and argument validation.
Highlighting Dependency Management:
Emphasize the importance of securing the MCP server supply chain, including verifying server authenticity and monitoring for typosquatting.
Expanding on Runtime Monitoring:
Provide more details on how to implement anomaly detection for tool call logs, including specific metrics or patterns to monitor.
Linking to Relevant Toolkit Features:
If the agent-governance-toolkit already includes features for MCP security (e.g., the MCP Security Scanner mentioned in the post), explicitly link to those features and explain how they address the described risks.

Final Assessment

This blog post is a valuable addition to the repository's documentation, as it raises awareness of critical security issues in MCP and provides actionable recommendations. However, given the severity of the risks discussed, it is essential to ensure that the blog aligns with the toolkit's existing features and provides clear guidance for implementing the proposed defenses.

imran-siddique

Thanks for writing about MCP security @aymenhmaidiwastaken! Great content. Two items before we can merge:

Sign the CLA — the license/cla check is still pending. Follow the bot instructions.
Publish the blog externally — per issue #848, the deliverable is a published post on Dev.to/Medium/Hashnode. Please publish the article there, then update the COMMUNITY.md link to point to the published URL instead of the in-repo path.

The content quality is excellent — looking forward to getting this merged once published!

aymenhmaidiwastaken · 2026-04-08T09:43:04Z

@microsoft-github-policy-service agree

aymenhmaidiwastaken · 2026-04-08T09:43:15Z

Thanks for the review @imran-siddique! Really appreciate the feedback.

I'll work on both items:

CLA — just signed it above
Publishing externally — I'll publish the article on Dev.to and update the COMMUNITY.md link to point there instead of the in-repo path. Will push the update once it's live.

Also, the AI code reviewer raised some solid points — I'll incorporate the critical ones (sanitizing tool descriptions, data provenance tracking details) and the OWASP Agentic Top 10 mapping before publishing. Should make the article stronger.

Will update the PR shortly!

imran-siddique · 2026-04-08T15:56:46Z

Great, thanks @aymenhmaidiwastaken! Take your time with the publishing. Once the blog is live and CLA is signed, ping us and we'll merge right away.

github-actions bot added the documentation label (Improvements or additions to documentation) on Apr 7, 2026

github-actions bot added the agent-mesh (agent-mesh package) and size/M (Medium PR, < 200 lines) labels on Apr 7, 2026

github-actions bot reviewed Apr 7, 2026

imran-siddique requested changes Apr 8, 2026

2 participants