feat(agent-os): add MCP security primitives with OWASP coverage #822

Closed
jackbatzner wants to merge 1 commit into microsoft:main from jackbatzner:jb/python-mcp-core

Conversation

@jackbatzner
Contributor

Description

Adds core MCP (Model Context Protocol) security primitives to the Python agent-os package, implementing coverage for 11 of 12 sections of the OWASP MCP Security Cheat Sheet.

MCP Security Primitives:

  • MCPGateway — centralized policy enforcement with tool-call approval, deny-by-default, audit logging
  • MCPSessionAuthenticator — session lifecycle management with token rotation and expiry
  • MCPMessageSigner — HMAC-SHA256 message signing for integrity verification
  • MCPResponseScanner — output scanning for injection, exfiltration, and credential leaks
  • MCPSecurityScanner — input validation and threat classification
  • MCPSlidingRateLimiter — per-client sliding-window rate limiting with LRU eviction
  • CredentialRedactor — pattern-based credential redaction including full PEM blocks
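
As an illustration of the HMAC-SHA256 signing idea behind MCPMessageSigner, here is a minimal sketch using only the standard library. The function names, the canonical-JSON encoding, and the demo key are assumptions made for this example; they are not taken from the PR's implementation.

```python
import hashlib
import hmac
import json


def sign_message(key: bytes, payload: dict) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of payload."""
    # sort_keys and fixed separators keep the byte encoding stable across calls.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify_message(key: bytes, payload: dict, signature: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign_message(key, payload)
    return hmac.compare_digest(expected, signature)


# Hypothetical demo values; a real key would come from a secret store.
key = b"demo-key-not-for-production"
message = {"tool": "read_file", "args": {"path": "notes.txt"}}
tag = sign_message(key, message)
```

Verifying the unmodified message with the same key succeeds, while changing any field of the payload makes verification fail.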

Enterprise-Grade Patterns:

  • Persistence seams — protocol abstractions (mcp_protocols.py) for sessions, nonces, rate limits, and audit with in-memory defaults, swappable to Redis/DB
  • Clock/nonce injection — no hardcoded time.time() or uuid.uuid4(); all injected via constructor for deterministic testing
  • Fail-closed enforcement — all security gates deny on error, no silent pass-throughs
  • Redaction-safe audit — credential redactor runs on audit entries before storage
  • Thread safety: threading.Lock on mutable state in rate limiter and session authenticator
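
The clock-injection and thread-safety patterns can be sketched with a small sliding-window limiter. This is an illustrative sketch, not the PR's MCPSlidingRateLimiter: the class name, constructor parameters, and FakeClock helper are hypothetical, and LRU eviction is left out for brevity.

```python
import threading
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Per-client sliding window; the clock is injected instead of calling time.time()."""

    def __init__(self, max_calls: int, window_seconds: float, clock: Callable[[], float]) -> None:
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock
        self._lock = threading.Lock()  # guards the mutable per-client state
        self._calls = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self._clock()
        with self._lock:
            calls = self._calls[client_id]
            # Drop timestamps that have fallen out of the window.
            while calls and now - calls[0] >= self._window:
                calls.popleft()
            if len(calls) >= self._max_calls:
                return False
            calls.append(now)
            return True


class FakeClock:
    """Test double: time only moves when the test changes .now."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
limiter = SlidingWindowLimiter(max_calls=2, window_seconds=10.0, clock=clock)
results = [limiter.allow("agent-1") for _ in range(3)]  # third call exceeds the limit
clock.now = 10.0  # advance past the window without sleeping
after_window = limiter.allow("agent-1")
```

Because the clock is a plain callable, the test advances time by assignment, which keeps the window boundary deterministic.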

Tests: 2938 passed, 86 skipped

OWASP MCP Security Cheat Sheet Coverage:
§1 Tool Poisoning ✅ | §2 Rug Pull ✅ | §3 Tool Shadowing ✅ | §4 Indirect Prompt Injection ✅ | §5 Server Compromise ✅ | §6 Credential Theft ✅ | §7 Resource Exhaustion ✅ | §8 Logging & Monitoring ✅ | §9 Privilege Escalation ✅ | §10 Consent Phishing ✅ | §11 Consent UI (N/A — server-side SDK) | §12 Standards Compliance ✅

Part 1 of 3 — This PR contains the core security code and tests. See also:

  • Part 2: Standalone agent-mcp-governance package (targets this branch)
  • Part 3: Documentation and examples (targets Part 2)

Type of Change

  • New feature (non-breaking change that adds functionality)
  • Security fix

Package(s) Affected

  • agent-os-kernel

Checklist

  • My code follows the project style guidelines (ruff check)
  • I have added tests that prove my fix/feature works
  • All new and existing tests pass (pytest)
  • I have updated documentation as needed
  • I have signed the Microsoft CLA

Related Issues

Supersedes #774 (split for easier review)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

github-actions bot commented Apr 6, 2026

Welcome to the Agent Governance Toolkit! Thanks for your first pull request.
Please ensure tests pass, code follows style (ruff check), and you have signed the CLA.
See our Contributing Guide.

@github-actions github-actions bot added the tests and size/XL (Extra large PR, 500+ lines) labels on Apr 6, 2026
@github-actions

github-actions bot commented Apr 6, 2026

🤖 AI Agent: breaking-change-detector — Summary

🔍 API Compatibility Report

Summary

This pull request introduces several new features and components to the agent-os package, including MCP security primitives and OpenTelemetry metrics support. The changes are primarily additive, with no evidence of removed or modified existing public APIs. Therefore, no breaking changes have been identified.

Findings

Severity | Package | Change | Impact
🔵 | agent-os | Added multiple new classes and functions | New functionality, not breaking

Details

🔵 New Public API

  1. New Classes:

    • MCPGateway
    • MCPSessionAuthenticator
    • MCPMessageSigner
    • MCPResponseScanner
    • MCPSecurityScanner
    • MCPSlidingRateLimiter
    • CredentialRedactor
    • MCPMetrics
    • NoOpMCPMetrics
  2. New Functions:

    • Methods within the above classes, such as CredentialRedactor.redact, MCPMetrics.record_decision, etc.
  3. New Protocols:

    • MCPMetricsRecorder (for metrics emission).
  4. New Exports in __init__.py:

    • All new classes and functions have been added to the agent-os package's __init__.py file, making them publicly accessible.

Migration Guide

No migration steps are necessary as no breaking changes were identified. Downstream users can adopt the new functionality without modifying existing code.

Conclusion

No breaking changes detected. This pull request is safe to merge from an API compatibility perspective.

@github-actions

github-actions bot commented Apr 6, 2026

🤖 AI Agent: docs-sync-checker — Issues Found

📝 Documentation Sync Report

Issues Found

  1. MCPMetricsRecorder in packages/agent-os/src/agent_os/_mcp_metrics.py — missing docstring for public methods:
    • record_decision
    • record_threats_detected
    • record_rate_limit_hit
    • record_scan
  2. CredentialRedactor in packages/agent-os/src/agent_os/credential_redactor.py — missing docstring for public methods:
    • redact_mapping
    • redact_dictionary
    • redact_data_structure
    • contains_credentials
    • detect_credential_types
    • find_matches
  3. ⚠️ packages/agent-os/README.md — no mention of the new MCP security primitives or their usage.
  4. ⚠️ CHANGELOG.md — no entry for the addition of MCP security primitives or OWASP coverage.

Suggestions

  • 💡 Add detailed docstrings for the methods in MCPMetricsRecorder and CredentialRedactor. For example:
    def record_decision(
        self,
        *,
        allowed: bool,
        agent_id: str,
        tool_name: str,
        stage: str,
    ) -> None:
        """
        Record an allow or deny decision made by the MCP gateway.
    
        Args:
            allowed (bool): Whether the decision was to allow (True) or deny (False) the request.
            agent_id (str): The identifier of the agent making the request.
            tool_name (str): The name of the tool being accessed.
            stage (str): The stage of the decision-making process.
        """
  • 💡 Update the README.md to include a new section on "MCP Security Primitives" with descriptions of the new classes and their use cases.
  • 💡 Add an entry to the CHANGELOG.md summarizing the new MCP security primitives and their alignment with the OWASP MCP Security Cheat Sheet.

Additional Notes

  • The new classes and methods are well-structured and include type annotations, which is excellent.
  • The CredentialRedactor class has a class-level docstring, but individual methods lack detailed explanations.
  • The PR mentions that documentation and examples will be added in a future part (Part 3). However, the current changes already introduce new public APIs that should be documented immediately to maintain sync.

Action Required

Please address the missing docstrings, update the README, and add a CHANGELOG entry to ensure documentation is in sync with the new functionality.

@github-actions

github-actions bot left a comment

🤖 AI Agent: code-reviewer

Review of feat(agent-os): add MCP security primitives with OWASP coverage

This PR introduces a significant set of security primitives for the agent-os package, implementing a robust framework for MCP (Model Context Protocol) governance. The changes are extensive and touch on multiple aspects of the system, including policy enforcement, session management, message signing, input/output scanning, rate limiting, and credential redaction. Below is a detailed review of the PR, focusing on the specified areas of concern:


🔴 CRITICAL

  1. Credential Redaction and Logging

    • The CredentialRedactor logs the number of redactions applied (logger.info("Credential redaction applied to %s value(s)", redaction_count)). While this is useful for debugging, it could inadvertently leak sensitive information in logs if the redacted content is logged elsewhere. Ensure that logging is sanitized and does not expose sensitive data.
    • Action: Add a warning in the documentation and code comments to ensure that sensitive data is not logged elsewhere in the application.
  2. HMAC-SHA256 Key Management

    • The MCPMessageSigner uses HMAC-SHA256 for message signing. However, the PR does not include details about how the signing keys are generated, stored, or rotated. Improper key management can lead to severe security vulnerabilities.
    • Action: Ensure that signing keys are securely generated, stored, and rotated. Consider integrating with a secure key management service (e.g., Azure Key Vault).
  3. Fail-Closed Mechanism

    • While the PR mentions that the system is designed to "fail-closed," there is no explicit test coverage to verify this behavior. For example, the _evaluate method in MCPGateway has a try-except block to handle unexpected errors, but the test suite does not seem to include cases that simulate such errors.
    • Action: Add test cases to explicitly verify that the system fails closed under various failure scenarios, such as exceptions during policy evaluation or rate-limiting checks.
  4. Thread Safety

    • The PR mentions the use of threading.Lock for thread safety in the rate limiter and session authenticator. However, there is no evidence of tests that validate the thread safety of these components under concurrent execution.
    • Action: Add stress tests to simulate concurrent access to shared resources (e.g., rate limit counters, session stores) and verify that the locks are functioning as intended.
  5. Nonce Management

    • The PR introduces MCPNonceStore for nonce management but does not provide details on how nonce uniqueness is enforced across distributed systems. This could lead to replay attacks if the same nonce is reused.
    • Action: Ensure that nonce generation and storage are designed to prevent collisions, even in distributed environments. Consider using a distributed key-value store like Redis for nonce storage.
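
A fail-closed test of the kind requested in item 3 might look like the sketch below. It does not use the real MCPGateway._evaluate; evaluate_fail_closed, Decision, and broken_check are hypothetical stand-ins that show the shape of such a test.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def evaluate_fail_closed(check: Callable[[str], bool], tool_name: str) -> Decision:
    """Run a policy check and deny if it raises for any reason."""
    try:
        allowed = check(tool_name)
    except Exception as exc:  # deliberately broad: any failure means deny
        # Keep the exception type in the reason so the failure is not masked.
        return Decision(False, f"policy error: {type(exc).__name__}")
    if allowed is True:
        return Decision(True, "allowed by policy")
    # Anything other than an explicit True is treated as a denial.
    return Decision(False, "denied by policy")


def broken_check(tool_name: str) -> bool:
    """Simulates a policy backend that fails at evaluation time."""
    raise RuntimeError("policy backend unavailable")


def test_denies_when_policy_check_raises() -> None:
    decision = evaluate_fail_closed(broken_check, "read_file")
    assert decision.allowed is False
    assert "RuntimeError" in decision.reason
```

The same pattern extends to the rate-limit and session checks: inject a dependency that raises and assert that the result is a denial.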

🟡 WARNING

  1. Backward Compatibility

    • The PR adds a significant number of new classes and methods to the agent-os package. While these additions are non-breaking, they increase the surface area of the public API.
    • Action: Clearly document all new public APIs and their intended usage. Consider marking new APIs as "experimental" if they are subject to change in future releases.
  2. OpenTelemetry Integration

    • The MCPMetrics class introduces optional OpenTelemetry integration. However, the PR does not include any tests to verify the behavior when OpenTelemetry is unavailable or misconfigured.
    • Action: Add test cases to validate the behavior of MCPMetrics in environments where OpenTelemetry is not installed or fails to initialize.
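
One way to make the OpenTelemetry fallback testable is to route construction through a small factory. The sketch below assumes a module-level HAS_OTEL flag and hypothetical NoOpMetrics/OtelMetrics classes; it is not the PR's MCPMetrics code, and the meter and counter names are invented for the example.

```python
try:
    from opentelemetry import metrics as otel_metrics
    HAS_OTEL = True
except ImportError:  # OpenTelemetry is an optional dependency
    otel_metrics = None
    HAS_OTEL = False


class NoOpMetrics:
    """Fallback recorder: accepts the same calls and does nothing."""

    def record_decision(self, *, allowed: bool, tool_name: str) -> None:
        return None


class OtelMetrics:
    """Recorder backed by an OpenTelemetry counter (hypothetical names)."""

    def __init__(self) -> None:
        meter = otel_metrics.get_meter("mcp.sketch")
        self._decisions = meter.create_counter("mcp_decisions")

    def record_decision(self, *, allowed: bool, tool_name: str) -> None:
        self._decisions.add(1, {"allowed": allowed, "tool_name": tool_name})


def make_metrics(has_otel: bool = HAS_OTEL):
    """Return a working recorder; fall back to the no-op on any setup failure."""
    if not has_otel:
        return NoOpMetrics()
    try:
        return OtelMetrics()
    except Exception:
        return NoOpMetrics()
```

Passing has_otel=False lets a test exercise the unavailable path without uninstalling the package.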

💡 SUGGESTIONS

  1. Input Validation

    • The CredentialRedactor class includes a PATTERNS attribute with predefined regex patterns for detecting credentials. While these patterns are comprehensive, they may not cover all possible formats of sensitive data.
    • Suggestion: Allow users to extend or override the default patterns via configuration or constructor arguments.
  2. Performance Optimization

    • The CredentialRedactor applies all patterns sequentially, which could be inefficient for large inputs. While the current implementation is acceptable, consider optimizing the pattern matching process using a single combined regex or a more efficient algorithm.
    • Suggestion: Evaluate the performance of the redaction process with large inputs and optimize if necessary.
  3. Documentation

    • The PR introduces several new classes and methods, but the accompanying documentation is minimal. For example, the MCPGateway class has a docstring, but it does not explain all the configuration options in detail.
    • Suggestion: Expand the documentation to include examples and detailed explanations of each new class and method. This will help users understand how to integrate these features into their applications.
  4. Test Coverage

    • While the PR mentions that 2938 tests passed, it also notes that 86 tests were skipped. It's unclear why these tests were skipped.
    • Suggestion: Investigate the skipped tests and ensure they are either fixed or explicitly marked as expected to be skipped with a clear explanation.
  5. Error Handling

    • The _evaluate method in MCPGateway catches all exceptions and logs an error message. While this is a good fail-safe mechanism, it may mask underlying issues.
    • Suggestion: Log the specific exception message and type to aid in debugging.
  6. Standards Compliance

    • The PR claims compliance with 11 of the 12 sections of the OWASP MCP Security Cheat Sheet. However, there is no evidence of a systematic mapping between the implemented features and the OWASP guidelines.
    • Suggestion: Include a detailed mapping in the documentation to demonstrate how each section of the OWASP MCP Security Cheat Sheet is addressed.
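
Suggestion 1 (user-extensible patterns) could be approached as in the following sketch. ExtensibleRedactor, DEFAULT_PATTERNS, and both regexes are hypothetical and far simpler than the PR's CredentialRedactor.PATTERNS; they only illustrate merging caller-supplied patterns with defaults.

```python
import re
from typing import Mapping, Optional, Pattern

# Hypothetical default set; the real class ships its own patterns.
DEFAULT_PATTERNS = {
    "bearer_token": re.compile(r"Bearer\s+[A-Za-z0-9._\-]{16,}"),
}


class ExtensibleRedactor:
    def __init__(self, extra_patterns: Optional[Mapping[str, Pattern[str]]] = None) -> None:
        # Copy the defaults so per-instance additions never mutate shared state.
        self._patterns = dict(DEFAULT_PATTERNS)
        self._patterns.update(extra_patterns or {})

    def redact(self, text: str) -> str:
        for name, pattern in self._patterns.items():
            text = pattern.sub(f"[REDACTED:{name}]", text)
        return text


redactor = ExtensibleRedactor({"internal_key": re.compile(r"\bacme_[0-9a-f]{12}\b")})
cleaned = redactor.redact("Authorization: Bearer abcdefghijklmnop1234 key=acme_0123456789ab")
```

A caller-supplied name that matches a default replaces that default, which also covers the "override" half of the suggestion.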

Summary

This PR is a significant step forward in enhancing the security posture of the agent-os package. However, there are several critical areas that need to be addressed to ensure the robustness and reliability of the new features. The recommendations above should be implemented to mitigate potential security risks, ensure backward compatibility, and improve the overall quality of the codebase.

  • Critical Issues: 5
  • Warnings: 2
  • Suggestions: 6

Please address the critical issues and warnings before merging this PR. The suggestions can be addressed in subsequent iterations if necessary.

@github-actions

github-actions bot commented Apr 6, 2026

🤖 AI Agent: security-scanner — Security Review of PR: `feat(agent-os): add MCP security primitives with OWASP coverage`

This PR introduces a significant update to the agent-os package by adding core MCP (Model Context Protocol) security primitives. These primitives aim to enhance the security of the agent governance toolkit by addressing 11 of the 12 sections of the OWASP MCP Security Cheat Sheet. Below is a detailed security review of the changes, categorized by potential vulnerabilities.


Findings

1. Prompt Injection Defense Bypass

Risk: 🔴 CRITICAL
Issue: The CredentialRedactor class is designed to redact sensitive information from strings and nested data structures. However, the regex patterns used for detecting sensitive information (e.g., API keys, tokens, private keys) are not exhaustive and may fail to detect obfuscated or slightly altered credentials (e.g., base64-encoded secrets, secrets split across multiple lines). Additionally, the redaction process does not sanitize inputs for prompt injection attacks, which could allow malicious actors to bypass redaction and inject harmful content into downstream systems.
Attack Vector: An attacker could craft inputs that evade the regex patterns (e.g., by using slight variations or encoding) and inject malicious payloads into the system. These payloads could then propagate to downstream systems or logs, potentially leading to unauthorized access or data leakage.
Recommendation:

  • Expand the regex patterns to cover more variations of sensitive data (e.g., base64-encoded secrets, multiline secrets).
  • Integrate a prompt injection detection mechanism into the CredentialRedactor class to identify and neutralize potential injection attacks.
  • Consider using a library designed for sensitive data detection and redaction, such as truffleHog, for more comprehensive coverage.
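
The base64 case in the first recommendation can be illustrated with a decode-and-rescan heuristic. This is a rough sketch under stated assumptions: the SECRET pattern is a made-up key format, and the candidate regex will miss base64 that is split across lines or uses the URL-safe alphabet.

```python
import base64
import binascii
import re

SECRET = re.compile(r"sk-[A-Za-z0-9]{20,}")  # hypothetical key format
B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")


def contains_secret(text: str) -> bool:
    """Check the raw text, then any base64-looking runs after decoding them."""
    if SECRET.search(text):
        return True
    for candidate in B64_CANDIDATE.findall(text):
        try:
            decoded = base64.b64decode(candidate, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError, ValueError):
            continue  # not valid base64 text; ignore this run
        if SECRET.search(decoded):
            return True
    return False


plain_key = "sk-" + "a" * 24
encoded_key = base64.b64encode(plain_key.encode("utf-8")).decode("ascii")
```

Decoding every long alphanumeric run has a cost, so a real scanner would likely cap input size and the number of candidates it inspects.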

2. Policy Engine Circumvention

Risk: 🔴 CRITICAL
Issue: The MCPGateway class implements a policy enforcement mechanism but relies on a GovernancePolicy object for defining constraints and thresholds. If the GovernancePolicy object is misconfigured or compromised, the entire policy enforcement mechanism could be bypassed. Additionally, the _evaluate method in MCPGateway does not include a final "catch-all" check to ensure that unhandled cases are denied by default.
Attack Vector: An attacker could exploit a misconfigured or compromised GovernancePolicy to bypass critical security checks, allowing unauthorized tool usage or data exfiltration.
Recommendation:

  • Add a final "catch-all" check in the _evaluate method to ensure that any unhandled cases are denied by default.
  • Implement stricter validation and sanity checks for the GovernancePolicy object to ensure it is correctly configured and not tampered with.
  • Consider adding a mechanism to validate the integrity of the GovernancePolicy object (e.g., using cryptographic signatures).
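
The "catch-all" recommendation amounts to ending the evaluation with an unconditional deny. A minimal sketch follows; Verdict and evaluate are hypothetical names, and the real GovernancePolicy has more fields than an allow list and a block list.

```python
from enum import Enum
from typing import FrozenSet


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"


def evaluate(tool_name: str, allowed_tools: FrozenSet[str], blocked_tools: FrozenSet[str]) -> Verdict:
    """Deny-by-default evaluation: only an explicit allow rule returns ALLOW."""
    if tool_name in blocked_tools:
        return Verdict.DENY  # an explicit block wins over an allow
    if tool_name in allowed_tools:
        return Verdict.ALLOW
    # Catch-all: anything no rule handled is denied.
    return Verdict.DENY


allowed = frozenset({"read_file", "search"})
blocked = frozenset({"search"})
```

With this shape, an empty or misconfigured policy denies everything rather than allowing everything.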

3. Trust Chain Weaknesses

Risk: 🟠 HIGH
Issue: The MCPMessageSigner class uses HMAC-SHA256 for message signing, which is generally secure. However, the implementation does not specify how the HMAC key is managed, stored, or rotated. If the key is compromised, the integrity of the signed messages could be at risk.
Attack Vector: If an attacker gains access to the HMAC key, they could forge valid signatures and bypass integrity checks, potentially leading to unauthorized actions or data tampering.
Recommendation:

  • Implement secure key management practices, such as using a hardware security module (HSM) or a secure key management service (e.g., AWS KMS, Azure Key Vault).
  • Add support for key rotation to minimize the impact of a compromised key.
  • Document the key management strategy in the project documentation.
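
Key rotation for an HMAC signer is commonly handled by tagging each signature with a key ID and keeping retired keys available for verification only. The sketch below assumes that approach; RotatingSigner and its key IDs are hypothetical and are not part of MCPMessageSigner.

```python
import hashlib
import hmac
from typing import Dict, Tuple


class RotatingSigner:
    """Signs with the active key; verifies against any key still in the ring."""

    def __init__(self, keys: Dict[str, bytes], active_key_id: str) -> None:
        if active_key_id not in keys:
            raise ValueError("active key id must be present in keys")
        self._keys = dict(keys)
        self._active = active_key_id

    def sign(self, body: bytes) -> Tuple[str, str]:
        tag = hmac.new(self._keys[self._active], body, hashlib.sha256).hexdigest()
        return self._active, tag

    def verify(self, key_id: str, body: bytes, tag: str) -> bool:
        key = self._keys.get(key_id)
        if key is None:
            return False  # unknown or retired key: fail closed
        expected = hmac.new(key, body, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, tag)


# Demo keys are placeholders; real keys would come from a key management service.
before = RotatingSigner({"k1": b"old-secret"}, "k1")
key_id, tag = before.sign(b"payload")
rotated = RotatingSigner({"k1": b"old-secret", "k2": b"new-secret"}, "k2")
retired = RotatingSigner({"k2": b"new-secret"}, "k2")
```

During the overlap period `rotated` still accepts signatures made with k1; once k1 is dropped from the ring, as in `retired`, those signatures stop verifying.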

4. Credential Exposure

Risk: 🔴 CRITICAL
Issue: The CredentialRedactor class logs the number of redacted credentials using logger.info. While the actual credentials are not logged, the logging of redaction counts could still provide attackers with information about the presence of sensitive data.
Attack Vector: An attacker monitoring the logs could infer the presence of sensitive data based on the redaction count, potentially aiding in targeted attacks.
Recommendation:

  • Avoid logging redaction counts or make this behavior configurable.
  • If logging is necessary, ensure that logs are stored securely and access to them is restricted.

5. Sandbox Escape

Risk: 🔵 LOW
Issue: No explicit sandboxing mechanisms are mentioned in the PR. While this is not a direct vulnerability, the lack of sandboxing increases the risk of malicious code execution.
Attack Vector: If an attacker manages to inject malicious code into the system, the absence of sandboxing could allow the code to access sensitive resources or compromise the host system.
Recommendation:

  • Consider integrating a sandboxing mechanism (e.g., containerization, seccomp, or AppArmor) to isolate the execution environment.
  • Document the sandboxing strategy in the project documentation.

6. Deserialization Attacks

Risk: 🟠 HIGH
Issue: The CredentialRedactor and MCPGateway classes process user-provided input (e.g., tool parameters, audit logs) without explicitly validating or sanitizing it. If these inputs are deserialized without proper checks, they could be exploited for deserialization attacks.
Attack Vector: An attacker could craft malicious input that, when deserialized, executes arbitrary code or causes a denial of service.
Recommendation:

  • Use safe deserialization libraries (e.g., json.loads with strict schema validation).
  • Avoid using pickle or other unsafe deserialization methods.
  • Add input validation and sanitization steps before processing user-provided data.
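
A sketch of the "parse with json, then validate the shape" recommendation is shown below. The size cap, the flat-object rule, and the parse_tool_params name are assumptions chosen for the example, not constraints taken from the PR.

```python
import json

MAX_BYTES = 64 * 1024  # hypothetical cap on the raw payload size


def parse_tool_params(raw: str) -> dict:
    """Parse tool parameters from JSON and reject anything but a flat object."""
    if len(raw.encode("utf-8")) > MAX_BYTES:
        raise ValueError("payload too large")
    # json.loads only builds plain Python data; json.JSONDecodeError is a ValueError.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("params must be a JSON object")
    for key, value in data.items():
        if not isinstance(value, (str, int, float, bool)):
            raise ValueError(f"unsupported value type for {key!r}")
    return data


params = parse_tool_params('{"path": "a.txt", "limit": 5}')
```

Nested objects, arrays, and null values are rejected here on purpose; a real schema would spell out which of those each tool accepts.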

7. Race Conditions

Risk: 🟡 MEDIUM
Issue: The MCPGateway class uses a dictionary (_agent_call_counts) for per-agent call counters without any synchronization mechanism. This could lead to race conditions in multi-threaded environments.
Attack Vector: An attacker could exploit race conditions to bypass rate-limiting checks or cause inconsistent state in the call counters.
Recommendation:

  • Guard shared state with a threading.Lock (e.g., a collections.defaultdict whose reads and updates all happen while the lock is held); defaultdict alone is not thread-safe for read-modify-write updates.
  • Add unit tests to simulate concurrent access and verify thread safety.
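
A lock-guarded counter together with a concurrent check might look like this sketch. CallCounter is a hypothetical stand-in for the gateway's _agent_call_counts handling, not the code under review.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class CallCounter:
    """Per-agent call counter whose read-modify-write happens under one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts = defaultdict(int)

    def increment(self, agent_id: str) -> int:
        with self._lock:
            self._counts[agent_id] += 1
            return self._counts[agent_id]

    def get(self, agent_id: str) -> int:
        with self._lock:
            return self._counts.get(agent_id, 0)


counter = CallCounter()
with ThreadPoolExecutor(max_workers=8) as pool:
    # 1000 increments spread over 8 worker threads.
    returned = list(pool.map(lambda _: counter.increment("agent-1"), range(1000)))
```

Because each increment returns the value it produced while holding the lock, the returned values should be exactly 1 through 1000 with no duplicates, which is a stronger check than the final total alone.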

8. Supply Chain

Risk: 🟠 HIGH
Issue: The PR introduces a dependency on OpenTelemetry (opentelemetry), but there is no mention of how this dependency is managed or verified.
Attack Vector: An attacker could exploit dependency confusion or typosquatting to introduce malicious code into the project.
Recommendation:

  • Use a dependency management tool (e.g., pip-tools, poetry) to lock dependencies to specific versions.
  • Regularly audit dependencies for known vulnerabilities using tools like safety or dependabot.
  • Verify the integrity of new dependencies before adding them to the project.

Summary of Findings

Category | Risk | Description | Recommendation
Prompt Injection Defense Bypass | 🔴 CRITICAL | Regex patterns in CredentialRedactor may fail to detect obfuscated credentials or injections. | Expand regex patterns, integrate prompt injection detection, or use a specialized library.
Policy Engine Circumvention | 🔴 CRITICAL | Lack of a "catch-all" check in _evaluate and potential for misconfigured GovernancePolicy. | Add a catch-all check, validate GovernancePolicy, and consider integrity checks.
Trust Chain Weaknesses | 🟠 HIGH | No details on HMAC key management or rotation in MCPMessageSigner. | Implement secure key management and key rotation.
Credential Exposure | 🔴 CRITICAL | Logging redaction counts could leak information about sensitive data. | Avoid logging redaction counts or secure log storage.
Sandbox Escape | 🔵 LOW | No explicit sandboxing mechanisms are mentioned. | Consider integrating sandboxing mechanisms.
Deserialization Attacks | 🟠 HIGH | Lack of input validation and potential unsafe deserialization. | Use safe deserialization libraries and validate inputs.
Race Conditions | 🟡 MEDIUM | Potential race conditions in _agent_call_counts dictionary. | Use thread-safe data structures and add synchronization.
Supply Chain | 🟠 HIGH | New dependency (opentelemetry) not verified for integrity or vulnerabilities. | Use dependency management tools and audit dependencies.

General Recommendations

  1. Perform a comprehensive security review of the GovernancePolicy object to ensure it cannot be misconfigured or tampered with.
  2. Add unit tests for edge cases, including malformed inputs, concurrent access, and policy misconfigurations.
  3. Document all security mechanisms, including key management, sandboxing, and dependency management, in the project documentation.

This PR introduces critical security features but also has several areas that require immediate attention to ensure the robustness of the MCP security primitives.

@github-actions

github-actions bot commented Apr 6, 2026

🤖 AI Agent: test-generator — `packages/agent-os/src/agent_os/__init__.py`

🧪 Test Coverage Analysis

packages/agent-os/src/agent_os/__init__.py

  • ✅ Existing coverage: This file primarily handles imports and exposes public APIs. Coverage is indirectly provided by tests for the underlying modules.
  • ❌ Missing coverage: None specific to this file, as it does not contain logic beyond imports and definitions.
  • 💡 Suggested test cases:
    • None required for this file.

packages/agent-os/src/agent_os/_mcp_metrics.py

  • ✅ Existing coverage: No explicit test file exists for this module. However, if other modules indirectly use MCPMetrics or NoOpMCPMetrics, some coverage might exist.
  • ❌ Missing coverage:
    • Direct tests for MCPMetrics and NoOpMCPMetrics classes.
    • Testing the behavior when OpenTelemetry is unavailable (_HAS_OTEL = False).
    • Testing the record_* methods for various scenarios, including invalid inputs and edge cases.
  • 💡 Suggested test cases:
    1. test_no_op_metrics — Verify that NoOpMCPMetrics methods do not raise errors and perform no operations.
    2. test_metrics_with_opentelemetry_available — Mock OpenTelemetry availability and test that metrics are recorded correctly.
    3. test_metrics_with_opentelemetry_unavailable — Simulate OpenTelemetry being unavailable and ensure fallback to NoOpMCPMetrics.
    4. test_record_decision_invalid_inputs — Test record_decision with invalid inputs (e.g., missing required fields) and ensure no exceptions are raised.
    5. test_record_threats_detected_edge_cases — Test record_threats_detected with count=0 and negative values to ensure no metrics are recorded.

packages/agent-os/src/agent_os/credential_redactor.py

  • ✅ Existing coverage: No explicit test file exists for this module. However, other modules might indirectly test some functionality.
  • ❌ Missing coverage:
    • Direct tests for CredentialRedactor and its methods.
    • Edge cases for regex patterns (e.g., malformed inputs, edge-case strings).
    • Nested data structures for redact_data_structure.
  • 💡 Suggested test cases:
    1. test_redact_single_pattern — Test redaction of a single credential type (e.g., OpenAI API key) in a string.
    2. test_redact_multiple_patterns — Test redaction of multiple credential types in a single string.
    3. test_redact_nested_structures — Test redaction in deeply nested dictionaries, lists, and tuples.
    4. test_contains_credentials — Verify detection of credential presence in various inputs.
    5. test_find_matches_edge_cases — Test find_matches with edge-case strings (e.g., empty strings, strings with partial matches).
    6. test_redact_empty_inputs — Ensure empty inputs (e.g., None, empty strings, empty dictionaries) are handled gracefully.

packages/agent-os/src/agent_os/mcp_gateway.py

  • ✅ Existing coverage: Tests likely exist for the MCPGateway class, as it is a core component. However, specific edge cases might not be covered.
  • ❌ Missing coverage:
    • Edge cases for intercept_tool_call, such as invalid inputs or unexpected exceptions.
    • Behavior when approval_callback is None.
    • Handling of built-in dangerous patterns and custom patterns.
    • Rate-limiting logic and its edge cases.
  • 💡 Suggested test cases:
    1. test_intercept_tool_call_invalid_inputs — Test intercept_tool_call with invalid inputs (e.g., missing agent_id or tool_name).
    2. test_intercept_tool_call_approval_callback_none — Verify behavior when approval_callback is not provided.
    3. test_builtin_sanitization — Test that built-in dangerous patterns are correctly applied when enable_builtin_sanitization is True.
    4. test_rate_limiting — Test rate-limiting behavior, including edge cases (e.g., hitting the exact limit, exceeding the limit).
    5. test_audit_logging — Verify that audit entries are correctly recorded for various scenarios, including allowed and denied tool calls.

packages/agent-os/src/agent_os/mcp_message_signer.py

  • ✅ Existing coverage: No explicit test file exists for this module. Some coverage might exist if other modules use MCPMessageSigner.
  • ❌ Missing coverage:
    • Direct tests for MCPMessageSigner and MCPSignedEnvelope.
    • Edge cases for message signing and verification (e.g., invalid keys, tampered messages).
  • 💡 Suggested test cases:
    1. test_message_signing — Verify that MCPMessageSigner correctly signs messages.
    2. test_message_verification_success — Test successful verification of a correctly signed message.
    3. test_message_verification_failure — Test verification failure for tampered messages or invalid keys.
    4. test_message_signing_with_invalid_key — Test behavior when an invalid key is provided for signing.

packages/agent-os/src/agent_os/mcp_protocols.py

  • ✅ Existing coverage: No explicit test file exists for this module. Some coverage might exist if other modules use the protocol abstractions.
  • ❌ Missing coverage:
    • Direct tests for InMemoryAuditSink, InMemoryNonceStore, InMemoryRateLimitStore, and InMemorySessionStore.
    • Edge cases for persistence seams (e.g., missing data, concurrent access).
  • 💡 Suggested test cases:
    1. test_in_memory_audit_sink — Test basic functionality of InMemoryAuditSink, including adding and retrieving audit entries.
    2. test_in_memory_nonce_store — Test nonce generation and validation, including edge cases (e.g., duplicate nonces).
    3. test_in_memory_rate_limit_store — Test rate-limiting behavior, including edge cases (e.g., hitting the limit, resetting limits).
    4. test_in_memory_session_store — Test session lifecycle management, including token rotation and expiry.

packages/agent-os/src/agent_os/mcp_response_scanner.py

  • ✅ Existing coverage: No explicit test file exists for this module. Some coverage might exist if other modules use MCPResponseScanner.
  • ❌ Missing coverage:
    • Direct tests for MCPResponseScanner and MCPResponseScanResult.
    • Edge cases for output scanning (e.g., malformed inputs, edge-case strings).
  • 💡 Suggested test cases:
    1. test_response_scanning_single_threat — Test detection of a single threat in a response.
    2. test_response_scanning_multiple_threats — Test detection of multiple threats in a single response.
    3. test_response_scanning_no_threats — Test behavior when no threats are present in the response.
    4. test_response_scanning_malformed_inputs — Test behavior with malformed or unexpected inputs.

packages/agent-os/src/agent_os/mcp_security.py

  • ✅ Existing coverage: No explicit test file exists for this module. Some coverage might exist if other modules use MCPSecurityScanner.
  • ❌ Missing coverage:
    • Direct tests for MCPSecurityScanner and MCPThreatType.
    • Edge cases for input validation and threat classification.
  • 💡 Suggested test cases:
    1. test_security_scanner_single_threat — Test detection of a single threat in input data.
    2. test_security_scanner_multiple_threats — Test detection of multiple threats in a single input.
    3. test_security_scanner_no_threats — Test behavior when no threats are present in the input.
    4. test_security_scanner_invalid_inputs — Test behavior with invalid or malformed inputs.

packages/agent-os/src/agent_os/mcp_session_auth.py

  • ✅ Existing coverage: No explicit test file exists for this module. Some coverage might exist if other modules use MCPSessionAuthenticator.
  • ❌ Missing coverage:
    • Direct tests for MCPSessionAuthenticator and MCPSession.
    • Edge cases for session lifecycle management (e.g., expired tokens, invalid tokens).
  • 💡 Suggested test cases:
    1. test_session_creation — Test creation of a new session.
    2. test_session_expiry — Test behavior when a session token expires.
    3. test_session_invalid_token — Test behavior when an invalid token is provided.
    4. test_session_token_rotation — Test token rotation functionality.

packages/agent-os/src/agent_os/mcp_sliding_rate_limiter.py

  • ✅ Existing coverage: No explicit test file exists for this module. Some coverage might exist if other modules use MCPSlidingRateLimiter.
  • ❌ Missing coverage:
    • Direct tests for MCPSlidingRateLimiter.
    • Edge cases for rate limiting (e.g., boundary conditions, race conditions).
  • 💡 Suggested test cases:
    1. test_rate_limiter_within_limit — Test behavior when requests are within the rate limit.
    2. test_rate_limiter_exceeding_limit — Test behavior when requests exceed the rate limit.
    3. test_rate_limiter_reset — Test rate limit reset behavior after a time window.
    4. test_rate_limiter_concurrent_access — Test for race conditions with concurrent requests.

packages/agent-os/src/agent_os/policies/async_evaluator.py

  • ✅ Existing coverage: No explicit test file exists for this module. Some coverage might exist if other modules use async_evaluator.
  • ❌ Missing coverage:
    • Direct tests for async_evaluator.
    • Edge cases for policy evaluation (e.g., conflicting policies, boundary conditions).
  • 💡 Suggested test cases:
    1. test_async_policy_evaluation — Test evaluation of a single policy asynchronously.
    2. test_conflicting_policies — Test behavior when multiple policies conflict.
    3. test_policy_evaluation_timeout — Test behavior when policy evaluation exceeds a timeout.
    4. test_policy_evaluation_invalid_inputs — Test behavior with invalid or malformed inputs.

Summary

  • Files with missing coverage: _mcp_metrics.py, credential_redactor.py, mcp_gateway.py, mcp_message_signer.py, mcp_protocols.py, mcp_response_scanner.py, mcp_security.py, mcp_session_auth.py, mcp_sliding_rate_limiter.py, policies/async_evaluator.py.
  • Key focus areas for new tests: Direct tests for each new module, plus edge cases in policy evaluation, concurrency, and input validation.
