feat(demo): add real-time agent governance dashboard with live policy monitoring by vnscka · Pull Request #750 · microsoft/agent-governance-toolkit

vnscka · 2026-04-03T16:11:07Z

🧾 Description

This PR introduces a real-time governance dashboard demo for visualizing agent interactions, policy decisions, trust relationships, and violations using a simulated event pipeline. Closes #723.

✨ Key Features

Interactive Streamlit dashboard with live updates
Simulated event stream, including:
- Policy decisions (allow / deny / escalate)
- Trust score changes
Live policy feed
- Newest-first ordering
- Latest event visually highlighted
Trust heatmap
- Clearly labeled source/target relationships
Violation alerts
- Severity indicators
- Expandable JSON drill-down for inspection
Policy coverage legend
- 🟢 Allow | 🟠 Escalate | 🔴 Deny
Agent timeline
- Fully filter-aware visualization
Sidebar controls
- Live toggle
- Adjustable refresh rate
- Events per tick
- Filters (agent / decision / policy)
Graceful empty states
- Example: “No violations detected”

🚀 Run Instructions

Using Docker

docker compose up --build

Then open in browser:

http://localhost:8501

Run Locally (Without Docker)

pip install -r requirements.txt
streamlit run app.py

📊 Data Model

Uses simulated governance data
No external services or telemetry required
Designed for demo and experimentation purposes

✅ Validation

Tested via local execution using Docker and Streamlit.

Verified:

Real-time updates functioning correctly
Accurate filtering across:
- Policy feed
- Trust heatmap
- Violation alerts
- Agent timeline

🔐 Security / Compliance

No secrets or credentials included
Scope limited strictly to demo functionality
Dependencies are:
- Isolated
- Pinned in requirements.txt

github-actions · 2026-04-03T16:11:27Z

Welcome to the Agent Governance Toolkit! Thanks for your first pull request.
Please ensure tests pass, code follows style (ruff check), and you have signed the CLA.
See our Contributing Guide.

github-actions

🤖 AI Agent: code-reviewer

Code Review for `feat(demo): add real-time agent governance dashboard with live policy monitoring`

Summary

This PR introduces a real-time governance dashboard demo using Streamlit. It visualizes agent interactions, policy decisions, trust relationships, and violations using simulated event data. The dashboard includes features such as a live policy feed, trust heatmap, violation alerts, and an agent activity timeline.

🔴 CRITICAL Issues

Lack of Input Validation in simulator.py
- The append_events and _build_event functions generate data based on random inputs without any validation. This could lead to invalid or unexpected data being processed by the dashboard.
- Recommendation: Add validation checks for the generated data to ensure that it adheres to expected formats and constraints. For example, ensure that trust_score is always between 0 and 100.
Potential for Sandbox Escape
- The dashboard uses Streamlit, which executes Python code. If the details field in the simulated data or any other user-controlled input is not properly sanitized, it could lead to code injection or XSS vulnerabilities.
- Recommendation: Ensure that all user-controlled inputs and outputs are sanitized. Use html.escape or equivalent methods to escape any HTML/JavaScript content.
Thread Safety Concerns
- The st.session_state object is used to store and update shared state across multiple users. However, there is no explicit locking mechanism to prevent race conditions when multiple users interact with the dashboard simultaneously.
- Recommendation: Investigate whether Streamlit's session_state is thread-safe. If not, consider implementing locks or other concurrency control mechanisms to avoid race conditions.
No Authentication or Authorization
- The dashboard does not implement any authentication or authorization mechanisms. This could allow unauthorized users to access sensitive data or manipulate the dashboard.
- Recommendation: Implement authentication and authorization mechanisms, even for a demo. For example, use a third-party package such as streamlit-authenticator or integrate with an external authentication provider.
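
The input-validation recommendation above could be sketched roughly as follows. This is a minimal illustration, not the PR's code: the event field names (decision, trust_score) and the validate_event helper are assumptions. Only the 0 to 100 trust-score range and the allow / deny / escalate decisions come from the review and the PR description.

```python
from typing import Any

# Decision values taken from the PR description; field names are assumed.
ALLOWED_DECISIONS = {"allow", "deny", "escalate"}


def validate_event(event: dict[str, Any]) -> list[str]:
    """Return a list of validation problems; an empty list means the event is valid."""
    problems: list[str] = []

    decision = event.get("decision")
    if decision not in ALLOWED_DECISIONS:
        problems.append(f"unexpected decision: {decision!r}")

    score = event.get("trust_score")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        problems.append(f"trust_score is not numeric: {score!r}")
    elif not 0 <= score <= 100:
        problems.append(f"trust_score out of range: {score!r}")

    return problems
```

Returning a list of problems rather than raising lets the caller decide whether to drop, repair, or log a bad event.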

🟡 Warnings (Potential Breaking Changes)

Hardcoded Values
- The AGENTS, POLICIES, DECISIONS, and VIOLATION_CATEGORIES lists are hardcoded in simulator.py. If these values change in the future, the dashboard will require code changes.
- Recommendation: Externalize these configurations into a JSON or YAML file to make them easier to update without modifying the code.
Backward Compatibility
- The dashboard relies on a simulated data model that may not align with the actual data model used in production. If the production data model changes, the dashboard may break.
- Recommendation: Clearly document the data model and ensure that the simulated data aligns with the production data model. Consider adding tests to validate this alignment.

💡 Suggestions for Improvement

Unit Tests
- There are no unit tests for the new functionality.
- Recommendation: Add unit tests for the simulator.py functions and key components of the Streamlit app. Use pytest and mock the st.session_state object for testing.
Error Handling
- The code lacks error handling for potential issues, such as missing or malformed data in st.session_state.events.
- Recommendation: Add error handling to gracefully handle unexpected scenarios, such as missing data or invalid inputs.
Performance Optimization
- The append_events function appends new events to the st.session_state.events DataFrame and then truncates it to the last 1500 rows. This could become a bottleneck as the number of events grows.
- Recommendation: Consider using a more efficient data structure, such as a deque, for managing the event window.
Code Readability
- The _style_feed and _style_alerts functions use inline styles for DataFrame rendering. This makes the code harder to maintain and update.
- Recommendation: Move the styles to a separate CSS file or define them as constants at the top of the file for better maintainability.
Documentation
- While the README is comprehensive, it does not include information about the simulated data model or how to extend the dashboard for real-world use cases.
- Recommendation: Add a section to the README explaining the data model and how to adapt the dashboard for real-world data sources.
Dependency Management
- The requirements.txt file pins specific versions of dependencies, which is good for reproducibility. However, there is no lock file to ensure consistent dependency resolution.
- Recommendation: Generate a requirements.lock file using a tool like pip-tools to lock dependency versions and their transitive dependencies.
Security Headers
- Streamlit apps do not include security headers by default, which could make the app vulnerable to certain attacks.
- Recommendation: Use a reverse proxy (e.g., Nginx) to add security headers like Content-Security-Policy, X-Content-Type-Options, and X-Frame-Options.
Accessibility
- The dashboard uses custom CSS for styling, but there is no indication that accessibility considerations (e.g., ARIA roles, color contrast) have been taken into account.
- Recommendation: Test the dashboard for accessibility using tools like Lighthouse or axe. Ensure that color contrasts meet WCAG standards and that the app is navigable using a keyboard.
Logging
- The app does not include any logging for debugging or monitoring purposes.
- Recommendation: Add logging for key events, such as data generation, user interactions, and errors. Use a logging library like Python's built-in logging module.
Scalability
- The current implementation is designed for a single-user demo. It may not scale well for multiple concurrent users or large datasets.
- Recommendation: If this dashboard is intended for production use, consider using a more robust backend (e.g., FastAPI or Flask) and a database for data storage.
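
For the performance suggestion, a bounded collections.deque is one way to keep a fixed-size event window. The sketch below assumes events are plain dicts and reuses the 1500-row limit mentioned above; the EventWindow class is hypothetical and not part of the PR.

```python
from collections import deque
from typing import Any, Iterable

MAX_EVENTS = 1500  # window size mentioned in the review


class EventWindow:
    """Keep only the most recent events; older ones are dropped automatically."""

    def __init__(self, max_events: int = MAX_EVENTS) -> None:
        self._events: deque[dict[str, Any]] = deque(maxlen=max_events)

    def append_events(self, new_events: Iterable[dict[str, Any]]) -> None:
        # extend() on a bounded deque discards items from the left as needed.
        self._events.extend(new_events)

    def newest_first(self) -> list[dict[str, Any]]:
        """Return a copy of the window with the latest event first."""
        return list(reversed(self._events))
```

If the charts still need a DataFrame, it could be built on demand from newest_first() instead of being concatenated and truncated on every tick.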

Final Assessment

The PR introduces a useful and visually appealing dashboard for real-time agent governance monitoring.
However, there are critical security issues (e.g., lack of input validation, potential sandbox escape, and missing authentication) that need to be addressed before deployment.
There are also some potential breaking changes due to hardcoded values and lack of alignment with the production data model.
Additional improvements in testing, error handling, and documentation are recommended.

Priority Actions

🔴 CRITICAL: Add input validation for simulated data in simulator.py.
🔴 CRITICAL: Ensure all user-controlled inputs and outputs are sanitized to prevent code injection or XSS.
🔴 CRITICAL: Investigate thread safety of st.session_state and implement concurrency controls if necessary.
🔴 CRITICAL: Add authentication and authorization mechanisms to restrict access to the dashboard.

Suggested Follow-Up

🟡 WARNING: Externalize hardcoded configurations (e.g., agents, policies) into a separate file.
💡 SUGGESTION: Add unit tests for the new functionality.
💡 SUGGESTION: Optimize the append_events function for better performance.
💡 SUGGESTION: Improve accessibility and add security headers.
💡 SUGGESTION: Add logging for better observability.

Let me know if you need further clarification or assistance!

github-actions · 2026-04-03T16:11:43Z

🤖 AI Agent: contributor-guide — Welcome! 👋

Hi @first-time-contributor! Welcome to the microsoft/agent-governance-toolkit project! 🎉 Thank you for taking the time to contribute to our repository. We’re thrilled to have you here and appreciate your effort in submitting this pull request. Your contribution is a fantastic addition to the project, and we’re excited to review it with you.

What You Did Well ✅

Comprehensive Feature Set: The real-time governance dashboard is a fantastic addition to the project. The features you've implemented, such as the live policy feed, trust heatmap, and violation alerts, are well thought out and provide a lot of value for users.
Detailed Documentation: Your PR description and the included README.md file are thorough and well-structured. The step-by-step instructions for running the dashboard locally and via Docker are clear and easy to follow.
Code Organization: The code is well-structured, with clear separation of concerns. For example, the simulator.py file encapsulates the simulation logic nicely.
UI/UX Design: The use of Streamlit's layout features and custom CSS for styling the dashboard is impressive. The design looks clean and user-friendly.
Security Considerations: It's great to see that you've explicitly mentioned that no secrets or credentials are included and that the scope is limited to demo functionality.

Suggestions for Improvement ✍️

Linting with Ruff:
- Our project uses Ruff for linting with rules E, F, and W. Please run Ruff on your code and address any linting issues. You can install Ruff with pip install ruff, then run ruff check . to lint the current directory.
Tests:
- While this is a demo feature, we still encourage adding tests to ensure the functionality works as expected. Tests should be placed in packages/{name}/tests/. For example, you could add tests for the simulator.py logic to validate the event generation and trust score calculations. This will help maintain the reliability of the feature as the project evolves.
Conventional Commits:
- Thank you for using the feat: prefix in your commit message! However, it would be great if you could make the message more concise and aligned with the Conventional Commits standard. For example:
```
feat(dashboard): add real-time governance dashboard with live policy monitoring
```
Security-Sensitive Code:
- While you've noted that this is a demo and doesn't include sensitive data, it's still good practice to add comments or documentation to highlight any potential security considerations. For example, you could add a note in the simulator.py file to clarify that the trust score and event data are simulated and not derived from real-world systems.
Screenshots:
- The screenshots in the README.md are a great idea! However, it seems like the image links are not rendering correctly. You might need to adjust the paths to ensure they work when viewed on GitHub. For example:
```
![Dashboard Overview](screenshots/dashboard-overview.jpeg)
```
  should work if the screenshots folder is in the same directory as the README.md.

Project Conventions 📚

Here are a few conventions we follow in this project:

Linting: We use Ruff for linting with rules E, F, and W. Please ensure your code passes Ruff checks.
Testing: All tests should be placed in the packages/{name}/tests/ directory. This helps us maintain a consistent structure across the repository.
Commit Messages: We follow the Conventional Commits standard. Prefix your commit messages with feat:, fix:, docs:, etc., to indicate the type of change.
Security: Any code that handles sensitive data or security-critical functionality should be reviewed with extra care. Please include comments or documentation to highlight any potential security implications.

You can find more details in our CONTRIBUTING.md and QUICKSTART.md files.

Next Steps 🚦

Run Ruff: Address any linting issues reported by Ruff.
Add Tests: If possible, add tests for the new functionality in packages/{name}/tests/.
Update Commit Message: Ensure your commit message follows the Conventional Commits standard.
Fix README Image Links: Verify that the screenshot links in the README.md render correctly on GitHub.
Respond to Feedback: Let us know if you have any questions or need help with the requested changes.

Once you've made the updates, push the changes to your branch, and this pull request will automatically update. We'll review your changes and provide further feedback if needed.

Thank you again for your contribution! We're here to help if you have any questions. 😊

github-actions · 2026-04-03T16:11:44Z

🤖 AI Agent: security-scanner — Security Review of PR: Real-Time Agent Governance Dashboard

This PR introduces a real-time governance dashboard demo for visualizing agent interactions, policy decisions, trust relationships, and violations. Below is a security analysis of the changes made in this PR, categorized by potential vulnerabilities.

1. Prompt Injection Defense Bypass

Risk: The _sanitize_json_payload function uses bleach.clean and html.escape to sanitize JSON payloads. While this is a good practice, it may not fully mitigate all potential injection attacks, especially if the sanitized data is used in contexts other than HTML (e.g., database queries or command execution).
Severity: 🟡 MEDIUM
Attack Vector: If the sanitized data is used in a context where escaping HTML is insufficient (e.g., SQL queries or shell commands), it could lead to injection vulnerabilities.
Recommendation: Ensure that the sanitized data is only used in HTML contexts. For other contexts (e.g., SQL, shell commands), use context-specific sanitization libraries or parameterized queries.
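
To illustrate why HTML escaping is context-specific, the sketch below handles the same untrusted string two ways: html.escape for HTML output and a parameterized query for SQL. The events table and the in-memory SQLite database are assumptions for illustration only; nothing in this thread indicates the PR uses a database.

```python
import html
import sqlite3

untrusted = "<script>alert(1)</script>'; DROP TABLE events; --"

# HTML context: escape markup characters before rendering.
safe_html = html.escape(untrusted)

# SQL context: never build the query by string formatting; bind parameters instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (details TEXT)")
conn.execute("INSERT INTO events (details) VALUES (?)", (untrusted,))
stored = conn.execute("SELECT details FROM events").fetchone()[0]
conn.close()
```

The stored value is the original string, unchanged, because the parameter binding keeps it as data rather than as part of the SQL statement.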

2. Policy Engine Circumvention

Risk: The dashboard is a demo application and does not directly enforce policies. However, the simulator.py file includes a prompt_injection_guard policy in the POLICIES list. The implementation of this policy is not provided in the PR, and it is unclear how this policy is simulated or enforced.
Severity: 🟠 HIGH
Attack Vector: If the prompt_injection_guard policy is not properly implemented or simulated, it could give a false sense of security to users testing the dashboard.
Recommendation: Ensure that the prompt_injection_guard policy is implemented and tested thoroughly in the simulator. Provide documentation on how this policy is simulated and its limitations in the demo environment.

3. Trust Chain Weaknesses

Risk: The dashboard visualizes trust relationships between agents but does not validate the authenticity of these relationships. This is acceptable for a demo but could lead to incorrect assumptions about the security of the system.
Severity: 🟡 MEDIUM
Attack Vector: If users assume the trust relationships are validated in the demo, they might overlook potential trust chain weaknesses in their production systems.
Recommendation: Clearly document that the trust relationships in the demo are simulated and not validated. Consider adding a disclaimer in the UI.

4. Credential Exposure

Risk: The PR explicitly states that no secrets or credentials are included, and the scope is limited to demo functionality.
Severity: 🔵 LOW
Attack Vector: None identified.
Recommendation: Ensure that no sensitive data is logged or exposed in the future. Regularly audit the codebase for accidental inclusion of secrets.

5. Sandbox Escape

Risk: The dashboard runs in a Docker container with a non-root user (appuser), which is a good practice. However, the container is not explicitly configured with additional security measures like seccomp, AppArmor, or SELinux profiles.
Severity: 🟡 MEDIUM
Attack Vector: If an attacker exploits a vulnerability in the application, they could potentially escape the container and compromise the host system.
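Recommendation: Harden the Docker configuration, for example with --security-opt no-new-privileges and --cap-drop ALL, and keep Docker's default seccomp profile enabled (note that --security-opt seccomp=unconfined disables seccomp filtering rather than enabling it). A minimal base image such as distroless also reduces the attack surface.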

6. Deserialization Attacks

Risk: The application uses pandas to handle data, which is generally safe for structured data. However, there is no indication of untrusted data being deserialized in this PR.
Severity: 🔵 LOW
Attack Vector: None identified.
Recommendation: Ensure that any future deserialization of untrusted data is done using safe libraries and methods.

7. Race Conditions

Risk: The _STATE_LOCK in simulator.py is used to synchronize access to shared state. This is a good practice, but the implementation should be reviewed for potential Time-of-Check to Time-of-Use (TOCTOU) vulnerabilities.
Severity: 🟡 MEDIUM
Attack Vector: If the lock is not used consistently, it could lead to race conditions in the simulated event pipeline.
Recommendation: Audit the usage of _STATE_LOCK to ensure it is consistently applied wherever shared state is accessed or modified.
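
A check-then-act sequence is only safe when the check and the update happen under the same lock. The sketch below shows that pattern. _STATE_LOCK is the name used in the review, while the trust_map shape (agent name to score), the default of 50.0, and the apply_trust_delta helper are assumptions.

```python
import threading

_STATE_LOCK = threading.Lock()
trust_map: dict[str, float] = {"agent-a": 50.0}  # assumed shape


def apply_trust_delta(agent: str, delta: float) -> float:
    """Read, adjust, and clamp a trust score as one atomic step."""
    with _STATE_LOCK:
        # Both the read (check) and the write (use) are inside the lock,
        # so no other thread can change the value in between.
        current = trust_map.get(agent, 50.0)
        updated = max(0.0, min(100.0, current + delta))
        trust_map[agent] = updated
        return updated
```

Any code path that reads the score outside the lock and writes it back later would reintroduce the TOCTOU window, which is what the audit should look for.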

8. Supply Chain

Risk: The requirements.txt file includes pinned dependencies, which is a good practice. However, the dependencies should be checked for known vulnerabilities.
Severity: 🟠 HIGH
Attack Vector: If any of the dependencies have known vulnerabilities, they could be exploited to compromise the application.
Recommendation: Use a dependency scanning tool (e.g., pip-audit, safety) to check for vulnerabilities in the dependencies. Regularly update dependencies to their latest secure versions.

Additional Observations

Logging: The application logs warnings for invalid inputs, which is a good practice. However, ensure that logs do not contain sensitive information.
UI Security: The use of bleach and html.escape for sanitizing user inputs is a good practice for preventing XSS attacks in the dashboard.

Summary of Findings

Category	Severity	Description
Prompt Injection Defense	🟡 MEDIUM	Potential for insufficient sanitization in non-HTML contexts.
Policy Engine Circumvention	🟠 HIGH	Lack of clarity on the implementation of the `prompt_injection_guard` policy.
Trust Chain Weaknesses	🟡 MEDIUM	Simulated trust relationships are not validated.
Credential Exposure	🔵 LOW	No secrets or credentials included in the PR.
Sandbox Escape	🟡 MEDIUM	Docker container lacks explicit security configurations.
Deserialization Attacks	🔵 LOW	No unsafe deserialization identified.
Race Conditions	🟡 MEDIUM	Potential for TOCTOU vulnerabilities in shared state access.
Supply Chain	🟠 HIGH	Dependencies should be scanned for known vulnerabilities.

Recommendations

Prompt Injection Defense: Ensure context-specific sanitization for all user inputs.
Policy Engine: Implement and document the prompt_injection_guard policy in the simulator.
Trust Chain: Add a disclaimer in the UI about the simulated nature of trust relationships.
Sandboxing: Enhance Docker security by adding seccomp, AppArmor, or SELinux profiles.
Dependencies: Perform a security audit of the dependencies listed in requirements.txt and update them as needed.
Race Conditions: Review the usage of _STATE_LOCK to ensure consistent application.

This PR introduces a useful demo tool but requires additional attention to ensure it does not inadvertently introduce security risks or mislead users about its capabilities.

Copilot

Pull request overview

Adds a self-contained Streamlit demo dashboard under demo/governance-dashboard/ to visualize simulated agent governance activity (policy decisions, trust posture, and violations) with a live-refresh UI.

Changes:

Introduces a Streamlit app with live feed, coverage chart, trust heatmap, violation drill-down, and activity timeline.
Adds an in-app simulator that generates governance events and maintains a session-scoped trust map.
Provides Docker Compose packaging plus pinned Python dependencies and screenshots.

Reviewed changes

Copilot reviewed 6 out of 10 changed files in this pull request and generated 2 comments.

File	Description
demo/governance-dashboard/app.py	Streamlit dashboard UI, filtering, charts, and live-refresh loop.
demo/governance-dashboard/simulator.py	Session-state simulator that seeds and appends governance events + trust drift.
demo/governance-dashboard/requirements.txt	Pinned dependencies for the demo.
demo/governance-dashboard/README.md	Run instructions and screenshot links for the demo.
demo/governance-dashboard/Dockerfile	Container build for running the Streamlit app.
demo/governance-dashboard/docker-compose.yml	One-command container launch configuration.
demo/governance-dashboard/screenshots/dashboard-overview.jpeg	Screenshot asset for documentation.
demo/governance-dashboard/screenshots/violation_drilldown.jpeg	Screenshot asset for documentation.
demo/governance-dashboard/screenshots/trustscore_heatmap.jpeg	Screenshot asset for documentation.
demo/governance-dashboard/screenshots/.gitkeep	Keeps screenshots directory tracked.

Copilot · 2026-04-03T16:13:46Z

demo/governance-dashboard/app.py

+st.markdown(
+    """
+    <style>
+    @import url('https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;700&family=IBM+Plex+Mono:wght@400;500&display=swap');
+


The embedded CSS imports Google Fonts from fonts.googleapis.com, which makes the dashboard depend on an external network request at runtime. This contradicts the PR/README claim that the demo needs no external services, and it will also break in offline / restricted environments. Consider removing the @import and using system fonts, or bundling fonts locally within the repo/image.

Copilot · 2026-04-03T16:13:46Z

demo/governance-dashboard/README.md

+```markdown
+![Dashboard Overview](screenshots/dashboard-overview.jpeg)
+![Violation Drill-down](screenshots/violation-drilldown.jpeg)
+![Trust Heatmap](screenshots/trust-heatmap.jpeg)
+```


The screenshot links are inside a fenced ```markdown code block, so they will render as code instead of images. Also, two referenced filenames don't match the actual files in screenshots/ (violation_drilldown.jpeg, trustscore_heatmap.jpeg). Remove the code fence and update the image paths (or rename the files) so the screenshots render correctly.

Suggested change

```markdown

![Dashboard Overview](screenshots/dashboard-overview.jpeg)

![Violation Drill-down](screenshots/violation-drilldown.jpeg)

![Trust Heatmap](screenshots/trust-heatmap.jpeg)

```

![Dashboard Overview](screenshots/dashboard-overview.jpeg)

![Violation Drill-down](screenshots/violation_drilldown.jpeg)

![Trust Heatmap](screenshots/trustscore_heatmap.jpeg)
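
A small script can catch both problems (links trapped in a code fence and filenames that do not exist) before pushing. This is a sketch, not part of the PR: find_image_problems is a hypothetical helper, and the regex only covers simple ![alt](path) image links with relative paths.

```python
import re
from pathlib import Path

IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")
FENCE = "`" * 3  # Markdown code-fence marker


def find_image_problems(readme: Path) -> list[str]:
    """Report image links that sit inside a code fence or point to a missing file."""
    problems: list[str] = []
    in_fence = False
    for line in readme.read_text(encoding="utf-8").splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggle on every fence marker
            continue
        for target in IMAGE_LINK.findall(line):
            if target.startswith(("http://", "https://")):
                continue  # only relative paths are checked
            if in_fence:
                problems.append(f"inside code fence: {target}")
            elif not (readme.parent / target).is_file():
                problems.append(f"missing file: {target}")
    return problems
```

Calling find_image_problems(Path("README.md")) from the demo directory returns an empty list when every image link is outside a fence and resolves to a file.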

vnscka · 2026-04-03T16:26:13Z

@microsoft-github-policy-service agree

imran-siddique

Clean demo. Approving.

imran-siddique

Thanks for the dashboard demo! A few items need addressing per our contribution guidelines:

Blocking:

Missing tests: CONTRIBUTING.md requires tests for all new features. `simulator.py` has pure functions (`_build_event`, `append_events`, `_compute_trust_drift`) that are straightforward to unit test.

Should fix:

Broken screenshot links in README (images inside code fence, filename mismatch `trust-heatmap.jpeg` vs `trustscore_heatmap.jpeg`)
Docker container runs as root — add a non-root USER directive
Missing docstrings on public functions (per contribution guidelines)
`datetime.utcnow()` is deprecated since Python 3.12; use `datetime.now(tz=timezone.utc)`
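
For the datetime item, the timezone-aware replacement looks like this. The utc_timestamp helper name is hypothetical; the datetime calls are standard library.

```python
from datetime import datetime, timezone


def utc_timestamp() -> datetime:
    """Return the current time as a timezone-aware UTC datetime."""
    # datetime.utcnow() returns a naive datetime and is deprecated since Python 3.12.
    return datetime.now(tz=timezone.utc)
```

Unlike the naive value from utcnow(), the result carries tzinfo, so comparisons and isoformat() output are unambiguous.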

See the automated code review for full details.

github-actions

🤖 AI Agent: code-reviewer

Review Summary

This pull request introduces a real-time governance dashboard demo for visualizing agent interactions, policy decisions, trust relationships, and violations using a simulated event pipeline. The dashboard is implemented using Streamlit and includes features such as live updates, filtering, and visualizations like heatmaps and timelines. While the implementation is functional and well-documented, there are several areas for improvement, particularly in terms of security, type safety, and potential breaking changes.

🔴 CRITICAL

Lack of Input Validation for User Inputs
- Issue: The Streamlit app takes user inputs (e.g., filters, refresh intervals) without robust validation. For example:
  - selected_agents, selected_decisions, and selected_policies are directly used in filtering without validation.
  - refresh_seconds and events_per_tick are used directly in logic without bounds checking.
- Risk: This could lead to unexpected behavior or even denial-of-service (DoS) attacks if malicious inputs are provided.
- Recommendation: Add stricter validation for all user inputs. For example:
```
if refresh_seconds < 1 or refresh_seconds > 10:
    st.error("Invalid refresh interval. Please select a value between 1 and 10 seconds.")
```
Trust Map Manipulation
- Issue: The trust_map is stored in st.session_state and updated dynamically. However, there is no locking mechanism to ensure thread safety.
- Risk: In a concurrent environment, this could lead to race conditions and data corruption.
- Recommendation: Use a thread-safe mechanism (e.g., threading.Lock) to ensure atomic updates to the trust_map.
Potential for Sandbox Escape
- Issue: The details field in the violation drill-down is HTML-escaped but not sanitized. If the details field contains malicious JavaScript, it could lead to XSS attacks.
- Risk: This could allow attackers to execute arbitrary JavaScript in the user's browser.
- Recommendation: Use a library like bleach to sanitize HTML content before rendering it in the dashboard.
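
The bounds and filter checks recommended above might be sketched as below. The 1 to 10 second range comes from the review's own example; the events-per-tick bounds, the agent names, and both helper functions are assumptions for illustration.

```python
from typing import Iterable


def clamp(value: int, low: int, high: int) -> int:
    """Force a numeric control value into the inclusive range [low, high]."""
    return max(low, min(high, value))


def keep_known(selected: Iterable[str], allowed: Iterable[str]) -> list[str]:
    """Drop any filter value that is not in the allowed list, preserving order."""
    allowed_set = set(allowed)
    return [item for item in selected if item in allowed_set]


# Hypothetical values standing in for sidebar inputs.
refresh_seconds = clamp(0, 1, 10)      # below range, becomes 1
events_per_tick = clamp(500, 1, 50)    # assumed upper bound of 50
selected_agents = keep_known(["agent-a", "<script>"], ["agent-a", "agent-b"])
```

Clamping keeps the dashboard running with a sane value, whereas st.error (as in the snippet above) surfaces the problem to the user; which behavior fits better is a design choice for the demo.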

🟡 WARNING

Backward Compatibility
- Issue: The new demo/governance-dashboard directory introduces new functionality but does not integrate with the existing packages in the packages/ directory. This could lead to confusion for users who expect the demo to interact with the core library.
- Risk: Users may assume the demo is fully integrated with the main library, leading to potential misuse or misinterpretation of the demo's capabilities.
- Recommendation: Clearly document that this is a standalone demo and does not interact with the core library. Alternatively, consider integrating the demo with the core library for consistency.

💡 SUGGESTIONS

Type Annotations
- Observation: While some functions have type annotations, many do not (e.g., _build_event, _validate_event_payload).
- Recommendation: Add type annotations to all functions to improve code readability and maintainability. For example:
```
def _build_event(ts: datetime) -> dict[str, Any]:
```
Unit Tests
- Observation: There are no unit tests provided for the new functionality, especially for critical components like the simulator.py and the Streamlit app logic.
- Recommendation: Add unit tests for:
  - The _validate_event_payload function to ensure it correctly sanitizes and validates input.
  - The _build_event function to verify the correctness of generated events.
  - The Streamlit app logic, using pytest together with Streamlit's built-in AppTest framework (streamlit.testing.v1).
Dependency Management
- Observation: The requirements.txt file includes pinned versions for dependencies, which is good for reproducibility. However, there is no mechanism to ensure these dependencies are up-to-date.
- Recommendation: Use a tool like pip-tools or dependabot to automate dependency updates and ensure the latest security patches are applied.
Logging
- Observation: The simulator and dashboard lack logging for critical operations (e.g., event generation, trust map updates).
- Recommendation: Add logging to critical sections of the code to aid in debugging and monitoring. For example:
```
import logging
logger = logging.getLogger(__name__)
logger.info("Generated event: %s", event)
```
Documentation
- Observation: While the README is detailed, it does not include information about the architecture or how the simulated data is generated.
- Recommendation: Add a section to the README explaining the architecture and the logic behind the event simulation.
Security Headers
- Observation: The Streamlit app does not include any security headers.
- Recommendation: Configure Streamlit to include security headers such as Content-Security-Policy and X-Content-Type-Options to mitigate potential attacks.
Error Handling
- Observation: There is minimal error handling in the simulator and dashboard code.
- Recommendation: Add error handling for potential issues, such as invalid data in the st.session_state or unexpected exceptions during event generation.
Performance Optimization
- Observation: The append_events function adds new events to the session state but does not limit the size of the event log.
- Recommendation: Implement a rolling window mechanism to limit the size of the event log and prevent memory bloat.

Final Assessment

The PR introduces a valuable feature for visualizing agent governance in real time. However, there are critical security and thread safety issues that must be addressed before merging. Additionally, the lack of unit tests and type annotations makes the codebase harder to maintain and verify. Addressing these issues will significantly improve the quality and reliability of the implementation.

github-actions

🤖 AI Agent: code-reviewer

Review Summary

This PR introduces a new real-time governance dashboard demo using Streamlit. While the implementation is functional and visually appealing, there are several areas that require attention to ensure security, correctness, and maintainability. Below is a detailed review of the PR.

🔴 CRITICAL

Potential XSS Vulnerability in JSON Drill-Down
- The safe_details field in the drill-down JSON view is sanitized using html.escape, but this is not sufficient to prevent all XSS attacks. Attackers could inject malicious scripts into other fields (e.g., agent_source, policy_name, etc.) that are displayed in the JSON output without proper sanitization.
- Recommendation: Use a library like bleach to sanitize all user-generated or simulated input before rendering it in the dashboard.
Lack of Input Validation for User-Provided Filters
- The filters for selected_agents, selected_decisions, and selected_policies are validated in the sidebar, but this validation is not robust. If the session state is tampered with (e.g., via browser dev tools), invalid values could bypass the checks.
- Recommendation: Add server-side validation for all user-provided inputs to ensure they match the expected values.
Trust Map Manipulation
- The trust_map is stored in st.session_state and updated dynamically. However, there is no mechanism to prevent malicious actors from tampering with this state.
- Recommendation: Use a more secure mechanism to store and manage sensitive state, such as a backend service or a secure database. Avoid relying solely on client-side session state for critical data.
Hardcoded Decision Weights
- The DECISION_WEIGHTS array is hardcoded in the simulator, which could lead to unintended behavior if the weights are not properly configured.
- Recommendation: Move the weights to a configuration file or environment variable and validate them during initialization.
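
One way to validate the weights at initialization is sketched below. DECISION_WEIGHTS and DECISIONS are names from the reviews, but the specific weight values, the validate_weights helper, and the use of random.choices are assumptions about how the simulator might sample decisions.

```python
import math
import random

DECISIONS = ["allow", "deny", "escalate"]
DECISION_WEIGHTS = [0.7, 0.2, 0.1]  # illustrative values, not from the PR


def validate_weights(decisions: list[str], weights: list[float]) -> None:
    """Raise ValueError if the weights cannot be used for weighted sampling."""
    if len(decisions) != len(weights):
        raise ValueError("each decision needs exactly one weight")
    if any(w < 0 or not math.isfinite(w) for w in weights):
        raise ValueError("weights must be finite and non-negative")
    if sum(weights) <= 0:
        raise ValueError("at least one weight must be positive")


validate_weights(DECISIONS, DECISION_WEIGHTS)
rng = random.Random(0)  # seeded so the demo is repeatable
sample = rng.choices(DECISIONS, weights=DECISION_WEIGHTS, k=5)
```

The same check applies unchanged if the weights are later loaded from a configuration file or environment variable.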

🟡 WARNING

Backward Compatibility
- The PR introduces a new demo application in the demo/governance-dashboard directory. While this does not directly impact the existing packages under packages/, any future integration with the main library could introduce breaking changes.
- Recommendation: Clearly document that this is a standalone demo and does not affect the main library. If integration is planned, ensure backward compatibility is maintained.
Dependency Pinning
- The requirements.txt file pins specific versions of dependencies (e.g., streamlit==1.40.2, pandas==2.2.3). While this ensures stability for the demo, it may cause compatibility issues with the main library if it uses different versions of the same dependencies.
- Recommendation: Use a dependency management tool like poetry or pip-tools to manage dependencies and ensure compatibility across the monorepo.

💡 SUGGESTIONS

Thread Safety
- The use of st.session_state for storing the trust_map and other stateful data may lead to race conditions in a multi-user environment.
- Recommendation: Investigate whether Streamlit's session state is thread-safe. If not, consider using a thread-safe data structure or a backend service for state management.
Error Handling
- The _validate_event_payload function performs basic validation but does not log or handle invalid inputs beyond defaulting to fallback values.
- Recommendation: Add logging for invalid inputs to aid in debugging and monitoring.
Test Coverage
- The PR mentions that the demo was tested locally, but there are no automated tests included for the dashboard or simulator.
- Recommendation: Add unit tests for the simulator.py module and integration tests for the app.py dashboard. Use pytest and pytest-mock for mocking Streamlit components.
Security Headers
- The Streamlit app does not include any HTTP security headers, which could make it vulnerable to attacks.
- Recommendation: Use a reverse proxy (e.g., Nginx) to add security headers like Content-Security-Policy, X-Content-Type-Options, and Strict-Transport-Security.
Documentation
- The README is well-written but could include a section on the architecture of the simulator and how it generates events.
- Recommendation: Add a detailed explanation of the simulation logic, including how trust scores and decisions are generated.
Scalability
- The dashboard uses in-memory data storage (st.session_state) for events and the trust map. This approach may not scale well for larger datasets or multiple concurrent users.
- Recommendation: Consider integrating a lightweight database (e.g., SQLite) or an in-memory data store (e.g., Redis) for better scalability.
Code Style
- The _style_feed and _style_alerts functions use inline styles, which can be hard to maintain.
- Recommendation: Move these styles to a centralized CSS file or a configuration object for better maintainability.
Hardcoded Values
- Several hardcoded values (e.g., 8501 for the port, 80 for seed_events) are scattered throughout the code.
- Recommendation: Move these values to a configuration file or environment variables.
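
The error-handling suggestion could look roughly like the following. This is a sketch only: `normalize_decision`, the logger name, and the `deny` fallback are assumptions, since the body of `_validate_event_payload` is not shown in this thread. The three decision values come from the PR description.

```python
import logging

logger = logging.getLogger("governance_dashboard")  # logger name is an assumption

ALLOWED_DECISIONS = ("allow", "deny", "escalate")
FALLBACK_DECISION = "deny"  # hypothetical fallback; the PR's actual default is not shown


def normalize_decision(raw: object) -> str:
    """Return a known decision, logging a warning whenever the fallback is used."""
    if isinstance(raw, str) and raw.strip().lower() in ALLOWED_DECISIONS:
        return raw.strip().lower()
    # Log the offending value so silent fallbacks become visible during debugging.
    logger.warning("Invalid decision %r; falling back to %r", raw, FALLBACK_DECISION)
    return FALLBACK_DECISION
```

Using `%r` in the log call keeps unusual values (empty strings, `None`, unexpected types) distinguishable in the log output.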

Conclusion

The PR introduces a valuable demo for visualizing agent governance, but there are several critical security issues and areas for improvement. Addressing these issues will enhance the security, maintainability, and scalability of the dashboard.

github-actions

🤖 AI Agent: code-reviewer

Pull Request Review for `feat(demo): add real-time agent governance dashboard with live policy monitoring`

Summary

This PR introduces a real-time governance dashboard demo using Streamlit. The dashboard visualizes agent interactions, policy decisions, trust relationships, and violations using a simulated event pipeline. It includes multiple features like a live policy feed, trust heatmap, violation alerts, and an agent activity timeline.

🔴 CRITICAL Issues

Lack of Input Validation for User-Provided Data
- Issue: The _sanitize_json_payload function uses bleach.clean and html.escape to sanitize user-provided input. While this is a good start, it does not ensure complete protection against all potential injection attacks, especially if the data is used in other contexts (e.g., database queries, file writes).
- Impact: This could lead to injection vulnerabilities if the sanitized data is used in unsafe ways.
- Recommendation: Use stricter validation and sanitization tailored to the specific context where the data will be used. For example:
  - For JSON payloads, validate the schema using libraries like Pydantic.
  - For database queries, use parameterized queries instead of string interpolation.
  - For HTML rendering, ensure that all user-provided data is sanitized for XSS.
Potential for Denial of Service (DoS)
- Issue: The append_events function in simulator.py generates a configurable number of events per tick. If a user sets an excessively high value for events_per_tick, it could lead to performance degradation or even crash the application.
- Impact: This could be exploited to cause a denial-of-service attack on the dashboard.
- Recommendation: Add a hard limit to the events_per_tick slider in app.py (e.g., max_value=10) and validate the value in the backend to prevent abuse.
Thread Safety Concerns
- Issue: The st.session_state object is used to store and update the event data. However, Streamlit's session_state is not thread-safe, and concurrent access to it could lead to race conditions.
- Impact: This could result in data corruption or unexpected behavior when multiple users interact with the dashboard simultaneously.
- Recommendation: Use a thread-safe data structure or a database to store and manage shared state. For example, consider using a lightweight in-memory database like SQLite or Redis.
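
For the events_per_tick limit, a minimal sketch of a server-side bound is shown below. The helper name `clamp_events_per_tick` and the default of 1 are assumptions; the cap of 10 is the example value suggested above.

```python
MAX_EVENTS_PER_TICK = 10  # upper bound suggested in this review; adjust as needed


def clamp_events_per_tick(value: object, default: int = 1) -> int:
    """Coerce ``value`` to an int in [1, MAX_EVENTS_PER_TICK], or return ``default``."""
    try:
        number = int(value)  # type: ignore[call-overload]
    except (TypeError, ValueError, OverflowError):
        # Non-numeric text, None, NaN, and infinity all land here.
        return default
    return max(1, min(MAX_EVENTS_PER_TICK, number))
```

Applying the clamp where events are generated, and not only on the slider, keeps the bound in place even if the widget configuration changes later.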

🟡 Warnings

Backward Compatibility
- Issue: The dashboard is a standalone demo and does not integrate with the existing packages under packages/. However, if this is intended to be integrated into the main library in the future, it may introduce breaking changes.
- Impact: Future integration could require significant refactoring of the existing codebase.
- Recommendation: Clearly document the scope and purpose of this demo in the repository's README to avoid confusion about its relationship with the main library.
Hardcoded Defaults
- Issue: The DEFAULT_DECISION_WEIGHTS and other constants are hardcoded in simulator.py. While these values can be overridden via environment variables, there is no clear documentation about this feature.
- Impact: Users may not be aware of how to customize the simulation behavior.
- Recommendation: Document the environment variables and their expected format in the README.md file.

💡 Suggestions

Type Annotations
- Observation: While some functions have type annotations, others (e.g., _clamp_trust_score) do not.
- Recommendation: Add type annotations to all functions to improve code readability and maintainability. This will also help with static analysis and type checking.
Unit Tests
- Observation: The PR does not include any unit tests for the new functionality.
- Recommendation: Add unit tests for critical functions, especially those related to data validation (_sanitize_json_payload, _load_decision_weights, etc.) and the simulation logic.
Error Handling
- Observation: The _load_decision_weights function has basic error handling but could be more robust.
- Recommendation: Log errors when invalid weights are encountered, and provide feedback to the user if possible.
Security Headers
- Observation: The Streamlit app does not set any security headers.
- Recommendation: Use a reverse proxy (e.g., Nginx) to add security headers like Content-Security-Policy, X-Content-Type-Options, and X-Frame-Options.
Dependency Management
- Observation: The requirements.txt file pins specific versions of dependencies, which is good for reproducibility. However, there is no mechanism to check for vulnerabilities in these dependencies.
- Recommendation: Integrate a dependency scanning tool like pip-audit or safety into the CI/CD pipeline.
Empty States
- Observation: The dashboard handles empty states gracefully, but the messages could be more descriptive.
- Recommendation: Provide additional context in the empty state messages. For example, "No violations detected for the selected filters. Adjust the filters to view more data."
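
To make the type-annotation suggestion concrete, here is one possible annotated shape for a clamping helper. The signature, the [0.0, 1.0] range, and the NaN handling are assumptions; the PR's `_clamp_trust_score` is not shown in this thread and may differ.

```python
import math


def clamp_trust_score(score: float, lower: float = 0.0, upper: float = 1.0) -> float:
    """Clamp ``score`` into ``[lower, upper]``; NaN maps to ``lower``."""
    if math.isnan(score):
        return lower
    return max(lower, min(upper, score))
```

With the annotations in place, a static checker such as mypy can flag callers that pass a string or `None` by mistake.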

Final Assessment

Overall Quality: The code is well-structured and adheres to good practices in terms of readability and modularity. The use of Streamlit for rapid prototyping is appropriate for a demo.
Security: There are critical issues related to input validation, potential DoS, and thread safety that need to be addressed before merging.
Testing: The PR lacks unit tests, which are essential for ensuring the correctness of the simulation logic and data validation.
Documentation: The documentation is comprehensive but could benefit from additional details about environment variables and empty state handling.

Recommended Actions Before Merging

Address the critical issues related to input validation, DoS, and thread safety.
Add unit tests for key functions and simulation logic.
Document the environment variables and their usage in the README.md file.
Consider integrating a dependency scanning tool into the CI/CD pipeline.

Once these issues are addressed, the PR will be ready for merging. Let me know if you need further clarification or assistance!

github-actions

🤖 AI Agent: code-reviewer

Review of PR: `feat(demo): add real-time agent governance dashboard with live policy monitoring`

🔴 CRITICAL

Lack of Input Validation for User Inputs:
- The _coerce_bounded_int function is used to validate numerical inputs, but it does not log or raise an error when invalid inputs are provided. This could lead to silent failures or unexpected behavior.
- The _validated_selection function does not sanitize inputs beyond checking if they are in the allowed list. This could lead to potential injection attacks if the allowed list is compromised or if the input is manipulated before validation.
Recommendation:
- Add logging for invalid inputs in _coerce_bounded_int and _validated_selection.
- Use stricter validation mechanisms for user inputs, especially when they are used in filtering or database queries.
Potential Cross-Site Scripting (XSS) Vulnerability:
- The _sanitize_json_payload function uses bleach.clean and html.escape, which is good. However, there is no guarantee that all user-provided inputs are sanitized before being displayed in the Streamlit app.
- The st.markdown function is used with unsafe_allow_html=True in multiple places. If any user-provided input is directly passed to these functions without proper sanitization, it could lead to XSS attacks.
Recommendation:
- Audit all instances of st.markdown and ensure no user-provided input is passed without sanitization.
- Consider disabling unsafe_allow_html where possible or use a stricter whitelist of allowed HTML tags and attributes.
Thread Safety Concerns:
- The _STATE_LOCK is used to protect shared state, but its usage is not consistent across the codebase. For example, the append_events function modifies the shared state but does not use the lock.
- The initialize_state function also modifies st.session_state without acquiring the lock.
Recommendation:
- Ensure that all modifications to shared state (st.session_state) are wrapped in _STATE_LOCK to prevent race conditions in concurrent execution scenarios.
Hardcoded Decision Weights:
- The DEFAULT_DECISION_WEIGHTS are hardcoded, and the code falls back to them if the AGD_DECISION_WEIGHTS environment variable is invalid. This could lead to unexpected behavior if the weights are not properly configured.
Recommendation:
- Add stricter validation for AGD_DECISION_WEIGHTS and fail fast if the configuration is invalid. This ensures that the system does not operate with unintended defaults.
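
The locking recommendation can be sketched as follows. A plain dict stands in for st.session_state so the example runs without Streamlit, and `MAX_EVENTS` is a hypothetical retention cap; the PR's real `append_events` signature is not shown here.

```python
import threading
from typing import Any, MutableMapping

_STATE_LOCK = threading.Lock()
MAX_EVENTS = 500  # hypothetical retention cap


def append_events(state: MutableMapping[str, Any], new_events: list[dict]) -> None:
    """Append events and trim to the cap while holding the lock."""
    with _STATE_LOCK:
        events = state.setdefault("events", [])
        events.extend(new_events)
        del events[:-MAX_EVENTS]  # keep only the newest MAX_EVENTS entries


# Four threads append 50 events each to the same state mapping.
state: dict[str, Any] = {}
workers = [
    threading.Thread(target=append_events, args=(state, [{"id": i}] * 50))
    for i in range(4)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
print(len(state["events"]))  # prints 200
```

Keeping the read, extend, and trim steps inside one `with` block is the point: the lock only helps if every code path that mutates the shared list acquires it.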

🟡 WARNING

Backward Compatibility:
- The PR introduces a new feature (governance-dashboard) in the demo directory, which is isolated from the main packages/ directory. However, it is unclear if this feature interacts with the existing packages or APIs.
- If the dashboard is intended to integrate with the main library in the future, ensure that its introduction does not break existing APIs or functionality.
Recommendation:
- Clearly document the scope of this demo and its relationship with the main library. If it is not intended to interact with the main library, explicitly state this in the documentation.

💡 SUGGESTIONS

Dependency Management:
- The requirements.txt file pins specific versions of dependencies. While this ensures reproducibility, it may lead to dependency conflicts with other parts of the monorepo.
- For example, pandas==2.2.3 and numpy==2.1.2 are pinned, but these versions may conflict with other packages in the monorepo.
Recommendation:
- Use a dependency management tool like poetry or pip-tools to manage dependencies across the monorepo.
- Regularly update dependencies to address security vulnerabilities and ensure compatibility.
Testing:
- The PR mentions that the dashboard was tested locally using Docker and Streamlit, but there are no automated tests included for the new functionality.
- The simulator.py file contains logic for generating synthetic data, but there are no unit tests to validate its correctness.
Recommendation:
- Add unit tests for the simulator.py module to ensure the correctness of the simulation logic.
- Consider adding integration tests for the Streamlit dashboard using Streamlit's built-in AppTest framework (streamlit.testing.v1) or a browser automation tool such as selenium.
Documentation:
- The documentation is well-written and provides clear instructions for running the dashboard. However, it does not provide details on how the simulated data is generated or how the decision weights affect the simulation.
Recommendation:
- Expand the documentation to include details on the simulation logic and how to configure the decision weights.
Code Quality:
- The _style_feed and _style_alerts functions use inline styles for styling DataFrame rows. This approach is not scalable and makes it difficult to maintain the code.
Recommendation:
- Consider using a CSS file or a centralized styling mechanism for the Streamlit app to improve maintainability.
Error Handling:
- The _load_decision_weights function logs a warning when the AGD_DECISION_WEIGHTS environment variable is invalid but does not provide detailed information about the error.
Recommendation:
- Include the exception details in the log message to aid debugging.
Performance Optimization:
- The filtered DataFrame is created by applying multiple filters to the main DataFrame. This could be optimized for performance, especially for large datasets.
Recommendation:
- Profile the filtering logic and optimize it if necessary. Consider using vectorized operations or caching intermediate results.
Security of Docker Image:
- The Dockerfile uses the python:3.11.11-slim base image, which is a good choice for a lightweight image. However, there is no explicit step to remove unnecessary files or dependencies after the build process.
Recommendation:
- Add a step to clean up unnecessary files and dependencies in the Dockerfile to reduce the image size and attack surface.
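
As a starting point for the simulator tests, the pytest-style sketch below checks reproducibility and output validity. `simulate_decisions` is a hypothetical stand-in, because the real functions in simulator.py are not shown in this thread; the weights are illustrative values only.

```python
import random

DECISIONS = ("allow", "deny", "escalate")


def simulate_decisions(count: int, weights: tuple[float, float, float], seed: int) -> list[str]:
    """Hypothetical stand-in for the simulator's weighted decision sampling."""
    rng = random.Random(seed)  # a private RNG keeps tests independent of global state
    return rng.choices(DECISIONS, weights=weights, k=count)


def test_same_seed_is_reproducible() -> None:
    first = simulate_decisions(100, (0.7, 0.2, 0.1), seed=42)
    second = simulate_decisions(100, (0.7, 0.2, 0.1), seed=42)
    assert first == second


def test_only_known_decisions_are_emitted() -> None:
    assert set(simulate_decisions(200, (0.7, 0.2, 0.1), seed=7)) <= set(DECISIONS)


def test_zero_weight_decision_never_appears() -> None:
    assert "deny" not in simulate_decisions(200, (0.5, 0.0, 0.5), seed=1)
```

Tests of this kind rely on the simulator accepting a seed or an injected random generator; if it currently uses the module-level `random` functions, that would need a small refactor first.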

Summary

The PR introduces a well-designed and feature-rich governance dashboard for real-time policy monitoring. However, there are critical security issues related to input validation, XSS vulnerabilities, and thread safety that need to be addressed. Additionally, there are opportunities for improvement in dependency management, testing, documentation, code quality, error handling, and Docker image security.

Action Items:

Address the critical security issues related to input validation, XSS vulnerabilities, and thread safety.
Add unit and integration tests for the new functionality.
Improve documentation to include details on simulation logic and decision weights.
Optimize the filtering logic for better performance.
Enhance the Dockerfile to reduce the image size and improve security.

Let me know if you need further clarification or assistance!

github-actions

🤖 AI Agent: code-reviewer

Review Summary

This PR introduces a real-time governance dashboard for visualizing agent interactions, policy decisions, trust relationships, and violations using a simulated event pipeline. While the dashboard is a demo and does not directly interact with the core library, it is still essential to ensure that it adheres to best practices for security, correctness, and maintainability.

🔴 CRITICAL

Potential XSS Vulnerability in JSON Drill-Down
- The _sanitize_json_payload function uses bleach.clean and html.escape to sanitize JSON payloads for the drill-down feature. However, this approach may not be sufficient to prevent all XSS attacks, especially if the payload contains unexpected or malformed data.
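- Recommendation: Serialize the payload with json.dumps and HTML-escape the result (for example with html.escape) before it reaches any HTML context. Note that ensure_ascii=True, the json.dumps default, only escapes non-ASCII characters and does not neutralize <, >, or &. Avoid relying solely on bleach for sanitization.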
Lack of Input Validation for Environment Variables
- The AGD_DECISION_WEIGHTS environment variable is parsed without sufficient validation. If invalid or malicious input is provided, it could lead to unexpected behavior or crashes.
- Recommendation: Add robust validation for AGD_DECISION_WEIGHTS to ensure it is a comma-separated list of floats that sum to 1.0. Log an error and exit gracefully if the validation fails.
Thread Safety Concerns with _STATE_LOCK
- The _STATE_LOCK is used to synchronize access to shared state, but its usage is not consistent across the codebase. For example, the append_events function does not use the lock when modifying st.session_state.events.
- Recommendation: Ensure that all access to shared state (e.g., st.session_state) is properly synchronized using _STATE_LOCK to avoid race conditions in multi-threaded environments.
Improper Handling of User Input
- The _validated_selection function logs warnings for invalid inputs but does not raise errors or provide feedback to the user. This could lead to silent failures.
- Recommendation: Provide user feedback (e.g., a Streamlit warning message) when invalid inputs are detected, instead of just logging warnings.
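
The AGD_DECISION_WEIGHTS validation could be sketched along these lines, assuming the comma-separated format described above. The function names and the placeholder default values are assumptions; the order of the three weights (allow, deny, escalate) is also assumed, not taken from the PR.

```python
import math
import os

DEFAULT_DECISION_WEIGHTS = (0.7, 0.2, 0.1)  # placeholder values, not the PR's actual defaults


def parse_decision_weights(raw: str) -> tuple[float, ...]:
    """Parse comma-separated weights, raising ValueError on any invalid input."""
    parts = [part.strip() for part in raw.split(",")]
    if len(parts) != len(DEFAULT_DECISION_WEIGHTS):
        raise ValueError(f"expected {len(DEFAULT_DECISION_WEIGHTS)} weights, got {len(parts)}")
    weights = tuple(float(part) for part in parts)  # float() raises ValueError on bad text
    if any(not math.isfinite(w) or w < 0 for w in weights):
        raise ValueError("weights must be finite and non-negative")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError(f"weights must sum to 1.0, got {sum(weights)}")
    return weights


def load_decision_weights() -> tuple[float, ...]:
    """Use the defaults only when the variable is unset; invalid values raise."""
    raw = os.environ.get("AGD_DECISION_WEIGHTS")
    if raw is None:
        return DEFAULT_DECISION_WEIGHTS
    return parse_decision_weights(raw)
```

Raising on invalid input lets the caller decide whether to log and exit or to fall back; the distinction between "unset" and "set but invalid" is what keeps a typo from silently running with defaults.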

🟡 WARNING

Backward Compatibility
- While this PR does not directly modify the core library under packages/, it introduces a new demo application. If users expect the demo to work seamlessly with the library, any future changes to the library's API could break the demo.
- Recommendation: Clearly document the version of the library that this demo is compatible with. Consider adding tests to ensure compatibility with future versions of the library.
Hardcoded Defaults
- The DEFAULT_DECISION_WEIGHTS are hardcoded in the simulator.py file. If these values need to be updated, it would require a code change.
- Recommendation: Move these defaults to a configuration file or environment variable to make them easier to update without modifying the code.

💡 SUGGESTIONS

Unit Tests for Simulator
- The simulator.py file contains critical logic for generating simulated governance events. However, there are no unit tests provided for this module.
- Recommendation: Add unit tests for the SimulationConfig class and the append_events function to ensure the correctness of the simulation logic.
Type Annotations
- While some functions have type annotations, others (e.g., append_events, initialize_state) do not.
- Recommendation: Add type annotations to all functions to improve code readability and maintainability.
Logging Levels
- The logging statements in the code use the warning level for issues that may not be critical (e.g., invalid user input). This could lead to log noise.
- Recommendation: Use appropriate logging levels (e.g., info or debug) for non-critical issues.
Error Handling for External Dependencies
- The dashboard relies on external libraries like streamlit, pandas, and plotly. If these libraries are not installed or are incompatible, the application will fail to run.
- Recommendation: Add a try-except block during imports to catch ImportError and provide a user-friendly error message.
Documentation
- The README is well-written but could benefit from additional details about the purpose and scope of the demo.
- Recommendation: Add a section explaining how this demo fits into the overall microsoft/agent-governance-toolkit project and its intended use cases.
Code Duplication
- There is some code duplication in the _style_feed and _style_alerts functions.
- Recommendation: Refactor these functions to reduce duplication and improve maintainability.
Use of st.session_state
- The st.session_state is used to store and manage runtime state. However, this approach may not scale well for more complex applications.
- Recommendation: Consider abstracting the state management logic into a separate class or module for better scalability and testability.
Security Headers in Streamlit
- Streamlit does not enforce security headers like Content-Security-Policy (CSP) by default, which could leave the application vulnerable to certain attacks.
- Recommendation: Configure Streamlit to include security headers, especially if this demo is ever exposed to the public.
Graceful Handling of Empty Data
- While the application handles empty states gracefully in most places, some visualizations (e.g., the heatmap) may still attempt to render with empty data, leading to errors.
- Recommendation: Add checks to ensure that visualizations are only rendered when there is sufficient data.
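
One way to approach the import-guard suggestion is sketched below. The `require` helper is hypothetical; the install command is the one given in the PR's run instructions.

```python
import importlib
from types import ModuleType


def require(module_name: str) -> ModuleType:
    """Import ``module_name`` or exit with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise SystemExit(
            f"Missing dependency '{module_name}'. "
            "Install the demo requirements with: pip install -r requirements.txt"
        ) from exc
```

In app.py this might be used as `st = require("streamlit")` near the top of the file, so a missing package produces a one-line hint instead of a traceback.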

Final Assessment

The PR introduces a valuable demo for visualizing agent governance, but there are critical security concerns and areas for improvement. Addressing these issues will ensure the demo is robust, secure, and maintainable.

Critical Issues: 4
Warnings: 2
Suggestions: 9

Please address the critical issues and consider the warnings and suggestions before merging this PR. Let me know if you need further clarification or assistance!

vnscka · 2026-04-06T04:41:37Z

Thanks @imran-siddique for the requested changes. I’ve implemented the full set of requested updates and revalidated them locally, with 9 tests passing. I added simulator unit tests, strengthened validation/security (input bounds, sanitization, logging), enforced event limits, and improved state safety with locking. I also introduced env-configurable decision weights with a strict mode, updated datetime usage, added docstrings, and switched Docker to a non-root user. The README is updated with the fixes and clearer documentation. Please review when you get a chance.

imran-siddique · 2026-04-06T20:03:06Z

Hi @vnscka — thanks for the dashboard demo! We left a review with a few items to address (mainly adding unit tests for the simulator functions, fixing screenshot links, and adding a non-root Docker USER). Let us know if you need any guidance on the test patterns — happy to help. Looking forward to getting this merged!

imran-siddique · 2026-04-08T23:32:44Z

Review: ESCALATED — Large Feature PR 🚨

Diff reviewed — XL PR (1017 additions, 11 files). Implements real-time governance dashboard with Streamlit, simulated event pipeline, trust heatmap, violation alerts, and agent timeline. Closes #723.

Per repo policy: large feature PRs require maintainer review before merge. This PR is already assigned for review.

Initial observations from diff scan:

Scope matches description — dashboard demo files
Uses streamlit — known/registered package
Dependencies pinned in requirements.txt — good
No hardcoded secrets detected
No external service dependencies (uses simulated data)

Full design review needed before merge.

imran-siddique · 2026-04-09T16:31:58Z

Thank you for this dashboard demo, @vnscka — the real-time event streaming, trust heatmap, and violation alert system are exactly the kind of visualization governance needs.

After reviewing both demo PRs (#805 and #750), we've decided to build a unified governance dashboard that combines the best elements of both contributions: your real-time streaming and heatmap approach with #805's DID verification and Merkle audit trails.

Closing this PR to start fresh with a consolidated demo. Your design patterns — especially the sidebar controls, policy coverage legend, and graceful empty states — will carry forward. Thank you for the contribution!

vnscka · 2026-04-09T18:43:31Z

Thanks for the detailed review and for considering this contribution @imran-siddique.

I completely understand the decision to consolidate both PRs into a unified dashboard; that makes sense from a product and maintenance perspective. I’m glad to hear that parts of this implementation (especially the real-time streaming, heatmap, and UX elements like filters and empty states) will be carried forward.

If helpful, I’d be happy to:

Contribute to the new consolidated PR
Help integrate the event simulation or visualization components
Refactor parts of this work to align with the new architecture

I appreciate the feedback throughout the review process; it was very useful in improving the implementation.

Looking forward to collaborating on the unified version!

feat(demo): add real-time agent governance dashboard with Streamlit

534dbb6

Copilot AI review requested due to automatic review settings April 3, 2026 16:11

github-actions bot added the size/XL Extra large PR (500+ lines) label Apr 3, 2026

github-actions bot reviewed Apr 3, 2026

Copilot started reviewing on behalf of vnscka April 3, 2026 16:11

Copilot AI reviewed Apr 3, 2026

imran-siddique approved these changes Apr 3, 2026

imran-siddique requested changes Apr 5, 2026

imran-siddique self-assigned this Apr 5, 2026

feat(demo): reviewed errors

d589b5d

github-actions bot added the documentation (Improvements or additions to documentation), dependencies (Pull requests that update a dependency file), and tests labels Apr 6, 2026