feat(tool_policy): diagnostics RPC and Developer Options panel#2715
Conversation
Expand tool_registry.diagnostics with policy posture, MCP allowlist summary, MCP write audit health, and a best-effort recent-denials buffer. Add a minimal Developer Options UI panel and unit coverage. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a recent-denial registry and records denials on blocked tool calls; extends diagnostics types and ops to be config-driven (posture, MCP allowlists, write-audit health, recent denials); updates RPC handler/tests; adds a settings panel, i18n strings, menu entry, and a route to surface diagnostics. ChangesTool-Policy Diagnostics Feature
Sequence Diagram(s): sequenceDiagram
participant Settings as Browser Settings Panel
participant RPC as handle_diagnostics (server)
participant Ops as tool_registry::ops::diagnostics_for_config
participant Denials as denials::list
participant DB as chunk_store (mcp_writes)
Settings->>RPC: GET diagnostics
RPC->>RPC: load config (timeout)
RPC->>Ops: diagnostics_for_config(&config)
Ops->>Denials: list(MAX_RECENT)
Ops->>DB: SELECT COUNT(*) FROM mcp_writes (last 24h)
Ops-->>RPC: ToolPolicyDiagnostics
RPC-->>Settings: JSON diagnostics
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@app/src/components/settings/panels/__tests__/ToolPolicyDiagnosticsPanel.test.tsx`:
- Line 46: The failing assertion in ToolPolicyDiagnosticsPanel.test.tsx uses an
over-escaped regex for the literal parentheses; update the expect that calls
screen.getByText(...) so the regex escapes the parentheses only once (match the
literal "(24h)" rather than backslash characters), i.e. change the pattern used
in the getByText assertion to escape parentheses correctly for "Recent (24h):".
In `@app/src/components/settings/panels/ToolPolicyDiagnosticsPanel.tsx`:
- Around line 78-224: ToolPolicyDiagnosticsPanel renders many hard-coded user
strings (e.g., "Loading…", "Diagnostics unavailable", section headers like
"Policy posture", labels like "Autonomy:", empty states) that must be localized;
import and use useT() in the ToolPolicyDiagnosticsPanel component and replace
every literal JSX string (status messages, headings, dt/dd labels, empty-state
text, list placeholders like "<unnamed>") with t('...') calls using descriptive
keys (e.g., toolPolicy.loading, toolPolicy.unavailable,
toolPolicy.policyPosture.autonomy, toolPolicy.inventory.totalTools,
toolPolicy.mcpAllowlists.enabledLabel, toolPolicy.emptyUnnamed, etc.), then add
those same keys and English values to app/src/lib/i18n/en.ts in this PR so
translations load correctly.
In `@src/openhuman/agent/harness/session/turn.rs`:
- Around line 1331-1336: The call to
crate::openhuman::tool_registry::denials::record currently forwards the raw
reason into durable telemetry (using call.name, self.tool_policy.name(),
blocked_action, reason); replace this by redacting sensitive content before the
boundary: obtain a sanitized_reason via an existing redaction helper (or add one
e.g., redact_reason(reason) that strips PII/secrets or returns a redaction
token) and pass sanitized_reason to denials::record instead of reason, ensuring
only the pre-redacted string is persisted.
In `@src/openhuman/tool_registry/denials.rs`:
- Around line 20-21: The trimmed+truncated denial text in variable `reason` (set
via `truncate_reason(reason.trim())`) is persisted and later surfaced, so you
must redact sensitive data before storing or including it in diagnostics:
replace or wrap the stored value with a redaction step (e.g., call a
`redact_reason`/`sanitize_reason` function or inline redaction logic)
immediately after truncation and before any persistence or diagnostic creation;
apply the same change to the other occurrences referenced around lines 56–65 so
no raw/unredacted `reason` is ever written to records or logs.
- Around line 71-88: Tests are flaky because they share a global denial buffer;
make them deterministic by resetting the registry at test start: add a
clear/reset helper (e.g., reset_denials() or clear_denials()) in the denials
module and call it at the beginning of record_truncates_and_bounds and
record_ignores_empty_tool so each test starts with an empty global state; keep
using record, list and MAX_DENIALS as-is but ensure tests invoke the new reset
helper before exercising record/list so no cross-test races occur.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 91e32884-03d9-4135-9820-b7a9907aa647
📒 Files selected for processing (10)
app/src/components/settings/panels/DeveloperOptionsPanel.tsxapp/src/components/settings/panels/ToolPolicyDiagnosticsPanel.tsxapp/src/components/settings/panels/__tests__/ToolPolicyDiagnosticsPanel.test.tsxapp/src/pages/Settings.tsxsrc/openhuman/agent/harness/session/turn.rssrc/openhuman/tool_registry/denials.rssrc/openhuman/tool_registry/mod.rssrc/openhuman/tool_registry/ops.rssrc/openhuman/tool_registry/schemas.rssrc/openhuman/tool_registry/types.rs
Localize diagnostics panel strings, redact sensitive denial reasons before persistence, and fix the Vitest assertion for Recent (24h) copy. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
Addressed CodeRabbit + CI feedback in 4e7a457 / follow-up:
Re-running CI — prior Rust core failure ( |
Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src/lib/i18n/en.ts`:
- Around line 516-543: The new i18n keys under devOptions.toolPolicyDiagnostics
(e.g., devOptions.toolPolicyDiagnostics.posture.autonomy, .inventory.totalTools,
.mcpAllowlists.summary, etc.) were added to app/src/lib/i18n/en.ts but not
propagated to the chunked locale files; update the matching en chunk file
(en-*.ts) and every non-English locale chunk (ar, bn, de, es, fr, hi, id, it,
ko, pt, ru, zh-CN) by adding the exact same key names and using the English
strings as placeholders so the UI won’t show missing text for chunked locales —
search for the devOptions.toolPolicyDiagnostics prefix to locate where to insert
the new keys in each chunk file.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d65c2d23-973f-4aaa-a5a0-9e4568113038
📒 Files selected for processing (6)
app/src/components/settings/panels/DeveloperOptionsPanel.tsxapp/src/components/settings/panels/ToolPolicyDiagnosticsPanel.tsxapp/src/components/settings/panels/__tests__/ToolPolicyDiagnosticsPanel.test.tsxapp/src/lib/i18n/en.tssrc/openhuman/tool_registry/denials.rssrc/openhuman/tool_registry/ops.rs
🚧 Files skipped from review as they are similar to previous changes (4)
- app/src/components/settings/panels/tests/ToolPolicyDiagnosticsPanel.test.tsx
- app/src/components/settings/panels/DeveloperOptionsPanel.tsx
- app/src/components/settings/panels/ToolPolicyDiagnosticsPanel.tsx
- src/openhuman/tool_registry/ops.rs
…ed-tool-admission # Conflicts: # app/test/e2e/specs/mega-flow.spec.ts
|
@MrMrVlad this PR has merge conflicts with main — please rebase/resolve before review. |
sanil-23
left a comment
There was a problem hiding this comment.
Approving from shepherd queue (round 2). All CI green or known flakes re-running; pnpm review fix round complete. Happy to merge once required reviewers concur.
# Conflicts: # src/openhuman/agent/harness/session/turn.rs
|
Hi @graycyrus — flagging this for your review when you have time. MrMrVlad's tool-policy diagnostics RPC PR is currently CONFLICTING against main because #2551 (trusted-capability-providers) landed during our review-fix push and rewrote the same tool_registry crate. I'm going to attempt to rebase + resolve on his behalf (maintainer_can_modify is on); once green I'll ping again. The author's existing inline-comment thread on turn.rs:1336 has been addressed and replied to in the d517226 push. |
# Conflicts: # src/openhuman/tool_registry/mod.rs # src/openhuman/tool_registry/ops.rs # src/openhuman/tool_registry/schemas.rs # src/openhuman/tool_registry/types.rs
sanil-23
left a comment
There was a problem hiding this comment.
Re-approving after merging upstream/main to resolve the tool_registry conflict (induced by #2551 trusted-capability-providers landing during the rfix push). Union strategy: kept HEAD's posture/mcp_allowlists/mcp_write_audit/recent_denials AND upstream's CapabilityProviderDiagnostics in ToolPolicyDiagnostics; kept upstream's async diagnostics() + sync diagnostics_for_config() split; HEAD's test assertions migrated into ops_test.rs. cargo check passes.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/tool_registry/ops.rs`:
- Around line 497-498: The extracted test module is being included
unconditionally via `#[path = "ops_test.rs"] mod tests;` so test code will be
compiled into non-test builds; add the test gate by prefixing the module
declaration with `#[cfg(test)]` so it becomes `#[cfg(test)] #[path =
"ops_test.rs"] mod tests;` to ensure `ops_test.rs` is only compiled for test
targets and follows repo conventions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a3842cb4-764a-4f19-97f0-a7210e9641b4
📒 Files selected for processing (23)
app/src/components/settings/panels/DeveloperOptionsPanel.tsxapp/src/lib/i18n/chunks/ar-1.tsapp/src/lib/i18n/chunks/bn-1.tsapp/src/lib/i18n/chunks/de-1.tsapp/src/lib/i18n/chunks/en-1.tsapp/src/lib/i18n/chunks/es-1.tsapp/src/lib/i18n/chunks/fr-1.tsapp/src/lib/i18n/chunks/hi-1.tsapp/src/lib/i18n/chunks/id-1.tsapp/src/lib/i18n/chunks/it-1.tsapp/src/lib/i18n/chunks/ko-1.tsapp/src/lib/i18n/chunks/pl-1.tsapp/src/lib/i18n/chunks/pt-1.tsapp/src/lib/i18n/chunks/ru-1.tsapp/src/lib/i18n/chunks/zh-CN-1.tsapp/src/lib/i18n/en.tsapp/src/pages/Settings.tsxsrc/openhuman/agent/harness/session/turn.rssrc/openhuman/tool_registry/mod.rssrc/openhuman/tool_registry/ops.rssrc/openhuman/tool_registry/ops_test.rssrc/openhuman/tool_registry/schemas.rssrc/openhuman/tool_registry/types.rs
✅ Files skipped from review due to trivial changes (10)
- app/src/components/settings/panels/DeveloperOptionsPanel.tsx
- app/src/lib/i18n/chunks/de-1.ts
- app/src/lib/i18n/chunks/fr-1.ts
- app/src/lib/i18n/chunks/ar-1.ts
- app/src/lib/i18n/chunks/en-1.ts
- app/src/lib/i18n/chunks/zh-CN-1.ts
- app/src/lib/i18n/en.ts
- app/src/lib/i18n/chunks/ko-1.ts
- app/src/lib/i18n/chunks/it-1.ts
- app/src/lib/i18n/chunks/id-1.ts
🚧 Files skipped from review as they are similar to previous changes (7)
- app/src/lib/i18n/chunks/hi-1.ts
- app/src/lib/i18n/chunks/pt-1.ts
- src/openhuman/agent/harness/session/turn.rs
- src/openhuman/tool_registry/types.rs
- app/src/lib/i18n/chunks/bn-1.ts
- app/src/lib/i18n/chunks/ru-1.ts
- app/src/lib/i18n/chunks/pl-1.ts
Address CodeRabbit feedback on PR tinyhumansai#2715 — the extracted test module was included unconditionally via #[path = "ops_test.rs"] mod tests; which compiled test code into non-test builds. Add the missing #[cfg(test)] gate to match repo conventions. Co-Authored-By: Claude <noreply@anthropic.com>
sanil-23
left a comment
There was a problem hiding this comment.
Re-approving on 896cf158 — addressed CodeRabbit's single inline comment (Major / Quick win): gated the extracted ops_test.rs with #[cfg(test)] per repo convention. cargo check + cargo check --tests both clean. CodeRabbit's autofix will re-review on push.
|
Posted the upstream fix for the |
graycyrus
left a comment
There was a problem hiding this comment.
@MrMrVlad heads up — CI is failing on this PR (Rust Core Tests + Quality and Rust Core Coverage), so I'm holding off on a full approval until those are green. I did spot a couple things while skimming though:
generated.rs out-of-scope dead code (src/openhuman/tools/generated.rs) — GeneratedToolAdmissionConfig, GeneratedToolAdmissionReport, and the private is_external_effect() method are all new additions with no callers anywhere in this PR. If the crate compiles with dead_code warnings as errors, these are almost certainly causing the Rust Core Tests + Quality failure. Either gate them behind #[allow(dead_code)] with a tracking comment pointing to the follow-up PR where they'll be wired up, or move them into that follow-up PR entirely. Bundling future scaffolding into a diagnostics PR makes the diff harder to review and CI harder to debug.
mcp_write_audit_health always reports enabled: true (src/openhuman/tool_registry/ops.rs) — Both the Ok and Err arms of mcp_write_audit_health set enabled: true unconditionally. If MCP write audit can actually be disabled via config, this field is always wrong. If it can't be disabled, rename enabled to something like available or add a comment documenting the invariant.
Fix the CI failures first and I'll come back to approve.
…table parser PR tinyhumansai#2853 added `health_snapshot: CORE_RPC_METHODS.healthSnapshot,` to LEGACY_METHOD_ALIASES in app/src/services/rpcMethods.ts. Prettier strips quotes from keys that are valid JS identifiers, so the canonical TS form is unquoted (`health_snapshot:`, not `'health_snapshot':`). But the `frontend_legacy_aliases_match_server_alias_table` test's parser required every key to be quoted via `quoted_value()`, which panicked: expected quoted value in `health_snapshot` This blocked CI on every PR rebasing onto current main (tinyhumansai#2687 + tinyhumansai#2715 confirmed inheriting the failure). Fix the parser, not the source file: 1. Filter `//`-comment lines from the LEGACY_METHOD_ALIASES body before compacting (otherwise the first quoted key picks up the inline comment header). 2. Accept both quoted (`'foo':`) and bare-identifier (`foo:`) keys. Co-Authored-By: Claude <noreply@anthropic.com>
sanil-23
left a comment
There was a problem hiding this comment.
Re-approving after merging upstream/main (incl. #2865 parser fix that clears the inherited test/Rust Core Tests + Quality + Rust Core Coverage failures). 21 legacy_aliases tests pass locally; CR's #[cfg(test)] gate fix from earlier still in place.
|
Posted #2869 with the root-cause fix: #2830's merge inadvertently reverted #2786's SessionExpired classification back to BackendUserError, so the |
sanil-23
left a comment
There was a problem hiding this comment.
Re-approving after merging upstream/main (incl. my #2869 fix that restores the SessionExpired classification for the embedding 401 'Invalid token' wire shape). The inherited test/Rust Core Tests + Quality + Rust Core Coverage failures clear with this merge — verified locally that the 4K5 test passes on the merged head.
… gate The Coverage Gate (diff-cover ≥ 80%) was failing on this PR at 65.6% on ToolPolicyDiagnosticsPanel.tsx because the existing single happy-path test left the error branch + the populated-mcp_allowlists branch + the mcp_write_audit.last_error branch uncovered (lines 67-68, 85, 181, 183-184, 188, 210, 221, 227-228). Add three companion tests that exercise the remaining branches: 1. renders unavailable card when the RPC throws (catch + error render) 2. renders mcp_allowlists per-server rows when populated 3. renders mcp_write_audit last_error when present 4/4 tests pass locally. Should clear the diff-cover gate. Co-Authored-By: Claude <noreply@anthropic.com>
sanil-23
left a comment
There was a problem hiding this comment.
Re-approving on 3442402 — added 3 companion tests for ToolPolicyDiagnosticsPanel covering the previously-uncovered error/mcp_allowlists/mcp_write_audit branches. 4/4 vitest tests pass locally; diff-cover gate should clear (was 65.6%, gap was a single file at ~10 lines).
|
@graycyrus update — the 2 CI failures you flagged on 18:28Z (Rust Core Tests + Quality + Rust Core Coverage) were inherited from main, not this PR's code. Root cause was #2830 (commit Also addressed CodeRabbit's earlier inline comment (added |
Summary
tool_registry.diagnosticswith policy posture, MCP allowlist summary, MCP write-audit health, and recent policy denials (redacted).Problem
When tools are hidden or policy blocks execution, operators today must reconstruct state from logs and config. Issue #2136 asks for a single diagnostics surface showing inventory, policy mode, MCP allowlists, audit health, and recent denials—without exposing secrets.
Solution
ToolPolicyDiagnosticsnow composes registry counts, autonomy posture, per-server MCP allow/deny list sizes, best-effortmcp_writesrow count (24h), and up to 25 recent denials fromtool_registry::denials.turn.rs, append a truncated, redacted denial record.ToolPolicyDiagnosticsPanelcallstool_registry.diagnosticsand sections the payload for quick review.This is the first slice of #2136 (observability + UI). Conformance reporting (runtime contract hash, hidden raw-write tool checks, copyable support bundle) can follow in a separate PR.
Submission Checklist
.github/workflows/coverage.yml; local diff-cover not run pre-submit## Related— N/A: no matrix feature IDs touchedCloses #NNNin the## RelatedsectionImpact
Related
Test plan
cargo test tool_registry --lib(diagnostics + denials ring buffer)pnpm test—ToolPolicyDiagnosticsPanel.test.tsxpnpm format:check,pnpm lint,pnpm compile,pnpm rust:check(pre-push hook)AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
feat/2136-tool-policy-diagnostics3c63b46c1b9c5e8a265b33376932a272d412355aValidation Run
pnpm --filter openhuman-app format:checkpnpm typecheck/pnpm compilecargo test tool_registry --lib, Vitest panel testpnpm rust:checkpnpm rust:checkValidation Blocked
command:N/Aerror:N/Aimpact:N/ABehavior Changes
Parity Contract
tool_registry.list/getunchanged.Duplicate / Superseded PR Handling
Summary by CodeRabbit
New Features
Tests
Documentation