[FEATURE]: Added an interactive dashboard terminal for testing commands against the safety engine. by Moresh-Mishra · Pull Request #118 · sreerevanth/AgentWatch

Moresh-Mishra · 2026-06-01T18:02:53Z

Summary

This change fixes an issue in the Safety Lab where the threat-path chips could push the page width and produce a horizontal scrollbar. The threat-path now wraps and each chip has a max width so long policy IDs/reasons do not force overflow.

Closes #19

Changes

Minor: Adjusted the threat-path chip CSS classes to keep visual emphasis for matched nodes while allowing multi-line layout.

Validation

- Ran tests: pytest tests/test_api_safety_check.py -q — 2 passed.
- Tested locally: started backend + Next dev server, exercised Safety Lab UI, and verified the threat-path now wraps across multiple lines and the page no longer shows a horizontal scrollbar for long/large threat paths.
- Manual visual check: clicked several preset commands and verified selected node details still render correctly after the layout change.

Tests

Tested locally using the following commands, run from the repo root.

Backend (venv active):
python -m uvicorn agentwatch.api.server:app --host 0.0.0.0 --port 8000

Frontend (in another terminal):
cd frontend
NEXT_PUBLIC_API_URL="http://localhost:8000/api/v1" npm run dev

Run tests:
pytest tests/test_api_safety_check.py -q

Screenshot

Frontend:

Summary by CodeRabbit

New Features
- Safety Lab UI to run commands through a safety check, inspect decisions, threat-path nodes, and view run history
- Safety check API that returns a decision (blocked/requires_approval/allowed), matched policies, reasons, risk scores, and a detailed threat-path
- Demo mode with shareable demo URLs to showcase preset scenarios

Tests
- Added/expanded tests for the safety check endpoint, including validation that command and arguments remain in sync

coderabbitai · 2026-06-01T18:03:09Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 575b7319-9610-4216-918c-301d314270dc

📥 Commits

Reviewing files that changed from the base of the PR and between 5e35ec3 and 7818444.

📒 Files selected for processing (1)

frontend/pages/safety-lab.tsx

🚧 Files skipped from review as they are similar to previous changes (1)

frontend/pages/safety-lab.tsx

📝 Walkthrough

Adds a backend POST /api/v1/safety/check with Pydantic schemas and tests, TypeScript client types and api.checkSafety, and a Next.js Safety Lab page implementing command runs, demo mode, threat-path visualization, and run history.

Changes

Safety Lab Feature Implementation

| Layer / File(s) | Summary |
| --- | --- |
| Backend endpoint, models, and tests: `agentwatch/api/server.py`, `tests/test_api_safety_check.py` | Added `re` import and `RiskScorer` usage, Pydantic models (`SafetyCheckRequest`, `ThreatPathNode`, `SafetyCheckResponse`), implemented `POST /api/v1/safety/check` that normalizes the command, builds `ToolCallData`, simulates `SafetyEngine.check_event` with the current policy, derives `decision` and `threat_path` using configured `RiskScorer` patterns (regex-handling), and added tests asserting blocked/allowed outcomes and that the command is preserved in arguments. |
| Frontend API types and client method: `frontend/lib/api.ts` | Exported `ThreatPathNode` and `SafetyCheckResponse` TypeScript interfaces and added `api.checkSafety` which POSTs command/tool/arguments/affected_resources to `/safety/check` and returns a typed `SafetyCheckResponse`. |
| Frontend page state, run handler, and demo control: `frontend/pages/safety-lab.tsx` (lines 1–96) | Initialized page state (command, running, error, history, selected node). Implemented `runCommand` to call `api.checkSafety`, prepend capped history items, select first matched node, and manage running/error state. Added demo-mode interval cycling presets and demo URL generation. |
| Frontend UI (layout, terminal, results, threat path, history): `frontend/pages/safety-lab.tsx` (lines 97–272) | Rendered header and controls, terminal input with presets and Run button, Block Explanation panel showing decision/risk/reasons, Threat Path panel with selectable nodes and detailed view, and History panel listing prior runs with selection behavior and timestamps. |
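
To make the backend row more concrete, here is a minimal sketch of how a threat path and a decision could be derived from configured patterns. It is not the code in `agentwatch/api/server.py`: the dataclass fields other than `policy_id` and `matched`, the `(policy_id, pattern, score)` tuple shape, the literal-match fallback for invalid regexes, and both thresholds are assumptions made for illustration. Only the three decision values come from the PR summary.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ThreatPathNode:
    # Hypothetical shape; the real Pydantic model may carry different fields.
    policy_id: str
    pattern: str
    matched: bool
    score: float


def build_threat_path(
    command: str, patterns: List[Tuple[str, str, float]]
) -> List[ThreatPathNode]:
    """Evaluate each (policy_id, pattern, score) entry against the command."""
    nodes = []
    for policy_id, pattern, score in patterns:
        try:
            matched = re.search(pattern, command) is not None
        except re.error:
            # Assumption: an invalid regex is treated as a literal substring.
            matched = pattern in command
        nodes.append(
            ThreatPathNode(policy_id, pattern, matched, score if matched else 0.0)
        )
    return nodes


def derive_decision(
    nodes: List[ThreatPathNode], block_at: float = 0.8, approve_at: float = 0.5
) -> str:
    """Map the highest matched score to a decision (thresholds are illustrative)."""
    top = max((n.score for n in nodes), default=0.0)
    if top >= block_at:
        return "blocked"
    if top >= approve_at:
        return "requires_approval"
    return "allowed"
```

In the actual endpoint the blocked/allowed outcome comes from the simulated `SafetyEngine.check_event` call, so the threshold mapping above should be read only as a stand-in for that step.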

Sequence Diagram

sequenceDiagram
  participant User
  participant SafetyLabUI
  participant api.checkSafety
  participant SafetyCheckEndpoint
  participant SafetyEngine
  participant RiskScorer
  User->>SafetyLabUI: Enter command & click Run
  SafetyLabUI->>api.checkSafety: POST command/tool/args
  api.checkSafety->>SafetyCheckEndpoint: POST /api/v1/safety/check
  SafetyCheckEndpoint->>SafetyEngine: Simulate check with current policy (ToolCallData)
  SafetyEngine-->>SafetyCheckEndpoint: Risk level & blocked decision
  SafetyCheckEndpoint->>RiskScorer: Score configured patterns (pattern, regex)
  RiskScorer-->>SafetyCheckEndpoint: Pattern match results (matched, score)
  SafetyCheckEndpoint-->>api.checkSafety: SafetyCheckResponse (decision, threat_path)
  api.checkSafety-->>SafetyLabUI: Typed response
  SafetyLabUI->>SafetyLabUI: Update history & render results
  SafetyLabUI->>User: Display block explanation & threat path

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

sreerevanth/AgentWatch#91: Modifies SafetyEngine and extends SafetyCheckData (blast_radius) which may affect outputs used by the new safety-check endpoint.

Suggested reviewers

sreerevanth

Poem

🐰 I nudge the keys, a curious hop,
Commands parade and warnings pop,
Patterns scored in tidy rows,
Demo runs and history grows,
A Safety Lab from burrow to top.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title directly relates to the main objective: adding an interactive dashboard terminal for testing commands against the safety engine. |
| Linked Issues check | ✅ Passed | The PR implements all four core features from issue `#19`: live command testing, real-time block explanations, threat path visualization, and shareable demo mode. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope: backend endpoint and models, frontend API client, Safety Lab page component, and tests for the new endpoint. No unrelated modifications detected. |

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agentwatch/api/server.py`:
- Around line 589-595: merged_args currently allows request.arguments to
overwrite the normalized command (cmd), causing raw_command vs parsed command
desync; change the merge so that "command": cmd takes precedence (e.g., build
merged_args from request.arguments then set merged_args["command"]=cmd) before
constructing ToolCallData, ensuring merged_args, raw_command, and tool_call
arguments remain consistent; update the code around merged_args, ToolCallData,
request.arguments, and cmd accordingly.

In `@frontend/pages/safety-lab.tsx`:
- Around line 72-82: The interval in the useEffect for demoMode captures a stale
runCommand closure and can start overlapping runs; fix by making the execution
guard stable: create an in-flight ref (e.g., inFlightRef) or memoize runCommand
with useCallback including its deps (such as running) so the interval always
uses the current state, then inside the setInterval handler check
inFlightRef.current (or the stabilized running) and skip starting a new run if
one is in progress, and ensure you set/clear the ref around the async runCommand
call (set true before await and false after) so PRESET_COMMANDS and demoMode
behavior remains unchanged.
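
For the first finding (the `merged_args` precedence in `agentwatch/api/server.py`), the suggested merge order can be sketched as below. This is an illustration under assumptions, not the merged fix: `normalize_command` and `build_merged_args` are hypothetical helper names, and whitespace collapsing stands in for whatever normalization the endpoint really applies.

```python
from typing import Any, Dict, Optional


def normalize_command(raw: str) -> str:
    """Collapse runs of whitespace (stand-in for the endpoint's normalization)."""
    return " ".join(raw.split())


def build_merged_args(
    command: str, arguments: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Copy caller arguments first, then let the normalized command win."""
    cmd = normalize_command(command)
    merged_args = dict(arguments or {})
    # Assigning last means a caller-supplied "command" key cannot override cmd.
    merged_args["command"] = cmd
    return merged_args
```

With this ordering, a request that passes `{"command": "echo safe"}` in its arguments alongside a different top-level command still ends up with the normalized top-level command in `merged_args`, which is the consistency the review comment asks for.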

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 63135c61-a7ba-4d3e-81d9-342960b79671

📥 Commits

Reviewing files that changed from the base of the PR and between a73eb87 and 8d9dae9.

📒 Files selected for processing (4)

agentwatch/api/server.py
frontend/lib/api.ts
frontend/pages/safety-lab.tsx
tests/test_api_safety_check.py

sreerevanth · 2026-06-01T18:11:36Z

@Moresh-Mishra Thanks for the work — the UI looks great and the overflow issue appears resolved.

Before merge, could you please address the two CodeRabbit findings:

Prevent request.arguments from overriding the normalized command value when building merged_args.
Ensure demo mode does not start overlapping safety checks if a previous run is still in progress.

Once those are updated and CI passes, I'll take another look. 🚀

Moresh-Mishra · 2026-06-01T18:13:03Z

Ok will resolve the issues @sreerevanth

Moresh-Mishra · 2026-06-02T05:21:29Z

@sreerevanth I have made the changes CodeRabbit asked for. You can merge them safely. Could you also add labels to this PR? Thanks!

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

frontend/pages/safety-lab.tsx (1)

240-260: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clicking a history entry only updates selectedNode, not the displayed run.

The Block Explanation and Threat Path panels are bound to latest = history[0]?.result (Line 45), so selecting an older run mutates selectedNode while the chips and explanation still reflect the most recent run. The resulting node detail (Lines 222-230) can reference a policy_id that isn't among the rendered chips, which is confusing.

Track the active run explicitly and derive the panels from it.

♻️ Suggested wiring

   const [selectedNode, setSelectedNode] = useState<ThreatPathNode | null>(null)
+  const [selectedRunId, setSelectedRunId] = useState<string | null>(null)

-  const latest = history[0]?.result
+  const activeRun = history.find((r) => r.id === selectedRunId) ?? history[0]
+  const latest = activeRun?.result

                 <button
                   key={item.id}
                   onClick={() => {
+                    setSelectedRunId(item.id)
                     setSelectedNode(item.result.threat_path.find((node) => node.matched) ?? null)
                   }}

Also reset selectedRunId to the new run (or null) inside runCommand so a fresh run takes focus.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/pages/safety-lab.tsx` around lines 240 - 260, The UI only updates
selectedNode when clicking history entries but the Block Explanation and Threat
Path panels are always derived from latest = history[0]?.result, causing
mismatch; introduce and use a tracked active run id (e.g., selectedRunId) and
derive the active result (activeResult = history.find(h => h.id ===
selectedRunId)?.result ?? history[0]?.result) for the Block Explanation, Threat
Path chips and node details instead of latest, update the history button onClick
to set selectedRunId = item.id (and set selectedNode as you already do), and
ensure runCommand resets selectedRunId to null or the new run id so new runs
take focus.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5ef1c915-f5e6-4527-9abd-4468cf092f0d

📥 Commits

Reviewing files that changed from the base of the PR and between 8d9dae9 and 5e35ec3.

📒 Files selected for processing (3)

agentwatch/api/server.py
frontend/pages/safety-lab.tsx
tests/test_api_safety_check.py

🚧 Files skipped from review as they are similar to previous changes (2)

tests/test_api_safety_check.py
agentwatch/api/server.py

feature-playground-dashboard-terminal-added

8d9dae9

coderabbitai Bot reviewed Jun 1, 2026

Comment thread agentwatch/api/server.py Outdated

Comment thread frontend/pages/safety-lab.tsx Outdated

coderabbit-changes

5e35ec3

coderabbitai Bot reviewed Jun 2, 2026

coderabbit-2

7818444

sreerevanth added the NSoC 26', level: intermediate, frontend, and level2 labels on Jun 3, 2026

sreerevanth merged commit 8f8a785 into sreerevanth:main Jun 3, 2026
8 checks passed

sreerevanth merged 3 commits into sreerevanth:main from Moresh-Mishra:feature/playground
