
Toolguard #176

Open

naamaz wants to merge 28 commits into cuga-project:main from naamaz:toolguard

Conversation

naamaz commented Apr 27, 2026

Summary by CodeRabbit

  • New Features

    • Added Tool Guard functionality to enforce tool-usage policies with runtime validation.
    • Added SDK methods to generate guard examples, code, and update tool guards.
    • Enhanced policy persistence and filesystem synchronization for tool guards.
  • Tests

    • Added comprehensive test coverage for tool guard functionality.
  • Chores

    • Added toolguard library dependency.
    • Restricted policy save access to administrators.


coderabbitai Bot commented Apr 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

This PR implements Tool Guard integration, adding policy-backed runtime validation for tool calls. It introduces a ToolGuard model, a runtime executor that validates tool calls against stored policies, manages guard initialization and lifecycle, and provides SDK methods for generating guard examples and code.
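The described flow can be pictured with a few lines of standalone code. All names below (guard_tool_call, PolicyViolation, the GUARDS registry) are illustrative stand-ins for the PR's actual API, not its real signatures:

```python
# Illustrative stand-in for the validate-before-invoke flow; names are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict

class PolicyViolation(Exception):
    """Raised by a guard to block a tool call before it executes."""

def _refund_check(args: dict) -> None:
    # Example of generated guard logic: reject large refunds.
    if args.get("amount", 0) > 100:
        raise PolicyViolation("amount too high")

@dataclass
class ToolGuard:
    description: str
    check: Callable[[dict], None]  # stands in for the embedded policy code

GUARDS: Dict[str, ToolGuard] = {
    "payments.refund": ToolGuard("Refunds above 100 need approval", _refund_check),
}

def guard_tool_call(app_name: str, function_name: str, arguments: dict) -> None:
    """Validate a pending tool call; raise PolicyViolation to block it."""
    guard = GUARDS.get(f"{app_name}.{function_name}")
    if guard is not None:
        guard.check(arguments)

guard_tool_call("payments", "refund", {"amount": 50})  # allowed, returns None
try:
    guard_tool_call("payments", "refund", {"amount": 500})
except PolicyViolation as e:
    print(f"blocked: {e}")  # blocked: amount too high
```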

Changes

  • Dependencies & Package Exports (pyproject.toml, src/cuga/backend/cuga_graph/policy/__init__.py, src/cuga/backend/cuga_graph/policy/tool_guard/__init__.py): Added toolguard>=0.2.17 dependency and exported ToolGuard, ToolGuide, ToolApproval, OutputFormatter from the policy module; created tool_guard package exports.
  • Core Policy Models (src/cuga/backend/cuga_graph/policy/models.py): Added ToolGuard model with description, examples, and policy code; extended ToolGuide to support an optional tool_guards mapping.
  • Policy Serialization & Loading (src/cuga/backend/cuga_graph/policy/filesystem_sync.py, folder_loader.py, utils.py): Updated markdown frontmatter generation to serialize tool_guards; extended policy loaders to parse and hydrate ToolGuard instances; added safe handling for missing triggers and policy_type.
  • Tool Guard Runtime Execution (src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py, tool_invoker.py): Implemented ToolGuardRuntime to load policies, validate tool calls against embedded guard code, and raise violations; created ToolGuardInvoker bridge to CUGA's tool provider.
  • Tool Guard Build-time Manager (src/cuga/backend/cuga_graph/policy/tool_guard/manager.py): Added ToolGuardManager to generate guard examples and code using the external toolguard library; handles spec building, code generation, and domain file persistence.
  • Tool Invocation Integration (src/cuga/backend/tools_env/registry/registry/api_registry.py, api_registry_server.py): Integrated ToolGuardRuntime into ApiRegistry to validate all tool calls pre-execution; added policy storage initialization, lifecycle management, and fail-closed error handling across server startup and reload paths.
  • SDK & Testing (src/cuga/sdk.py, src/cuga/sdk_core/debug_tool_guard_e2e.py, src/cuga/backend/cuga_graph/policy/tests/test_filesystem_sync.py, test_sdk_policies.py, policies-export-2025-12-31.json): Added SDK methods for updating tool guards and generating examples and guard code; created an end-to-end test script; extended filesystem tests for guard serialization; removed an obsolete policy export file.
  • API Authorization (src/cuga/backend/server/main.py): Tightened the policy save endpoint authorization from require_auth to require_manage_access and updated documentation.
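The authorization tightening in the last item can be sketched with a minimal, self-contained example; require_manage_access here is a hypothetical pure-Python decorator standing in for the project's actual dependency, not its real implementation:

```python
from functools import wraps

class Forbidden(Exception):
    pass

def require_manage_access(handler):
    """Reject callers whose user record lacks the 'manage' role."""
    @wraps(handler)
    def wrapper(user, *args, **kwargs):
        if "manage" not in user.get("roles", ()):
            raise Forbidden("policy save requires manage access")
        return handler(user, *args, **kwargs)
    return wrapper

@require_manage_access
def save_policies(user, policies):
    # Guard code inside saved policies runs unsandboxed at runtime,
    # which is why write access is restricted to managers.
    return {"saved": len(policies)}

print(save_policies({"roles": ["manage"]}, [{"type": "tool_guide"}]))  # {'saved': 1}
try:
    save_policies({"roles": ["chat"]}, [])
except Forbidden as e:
    print(f"blocked: {e}")  # blocked: policy save requires manage access
```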

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ApiRegistry
    participant ToolGuardRuntime
    participant PolicyStorage
    participant ToolGuard Runtime Domain
    participant MCP Tool

    Client->>ApiRegistry: call_function(app_name, function_name, arguments)
    ApiRegistry->>ToolGuardRuntime: guard_tool_call(app_name, function_name, arguments)
    
    ToolGuardRuntime->>PolicyStorage: get_policies() [lazy load if needed]
    PolicyStorage-->>ToolGuardRuntime: [ToolGuide policies with tool_guards]
    
    ToolGuardRuntime->>ToolGuard Runtime Domain: load RuntimeDomain for app_name
    ToolGuard Runtime Domain-->>ToolGuardRuntime: domain types & implementations
    
    ToolGuardRuntime->>ToolGuard Runtime Domain: execute guard_* functions with arguments
    
    alt Policy Violation Detected
        ToolGuard Runtime Domain-->>ToolGuardRuntime: PolicyViolationException
        ToolGuardRuntime-->>ApiRegistry: violation_message (error)
        ApiRegistry-->>Client: 403 ToolGuardViolation response
    else Policy Passes
        ToolGuardRuntime-->>ApiRegistry: None (OK)
        ApiRegistry->>MCP Tool: invoke tool
        MCP Tool-->>ApiRegistry: result
        ApiRegistry-->>Client: tool result
    end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • gjt-prog
  • sami-marreed

Poem

🐰 A guard hops in to watch each tool,
With policies enforcing every rule.
Runtime checks keep calls in line,
While compliance examples help guards shine.
The agent's safer now, with fur so fine! 🛡️

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Title check: ❓ Inconclusive. The title 'Toolguard' is too vague and generic; it does not clearly convey the scope or main purpose of the substantial changes made across multiple files. Resolution: consider using a more descriptive title such as 'Add ToolGuard integration for tool-level policy enforcement' to better summarize the main changes.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 95.35%, which is sufficient; the required threshold is 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



coderabbitai Bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cuga/backend/server/main.py (1)

2364-2368: ⚠️ Potential issue | 🟠 Major

Authorization asymmetry: GET exposes policy_code to any authenticated user, but POST is restricted to manage roles.

You've correctly tightened POST /api/config/policies to require_manage_access because policy code is executed unsandboxed at runtime. However, this GET handler still uses require_auth (see line 2299) and now returns tool_guards, whose ToolGuard.policy_code field contains the same executable Python. Any authenticated chat-only user can therefore read the policy code, which:

  • leaks security-sensitive validation logic (an attacker who learns the guard logic can craft inputs to bypass it), and
  • is inconsistent with the manage-only write boundary you just established.

Consider either gating the GET endpoint on require_manage_access, or stripping/redacting the policy_code field from each ToolGuard for non-manage users (similar pattern to how secrets routes mask values).

🛡️ Sketch (redact policy_code for non-manage users)
-            elif policy_dict["type"] == "tool_guide":
+            elif policy_dict["type"] == "tool_guide":
                 frontend_policy["target_tools"] = policy_dict.get("target_tools", [])
                 frontend_policy["target_apps"] = policy_dict.get("target_apps")
                 frontend_policy["guide_content"] = policy_dict.get("guide_content", "")
-                frontend_policy["tool_guards"] = policy_dict.get("tool_guards")
+                tool_guards = policy_dict.get("tool_guards")
+                if tool_guards and not _user_has_manage_role(current_user):
+                    tool_guards = {
+                        name: {**g, "policy_code": "<redacted>"} if isinstance(g, dict) else g
+                        for name, g in tool_guards.items()
+                    }
+                frontend_policy["tool_guards"] = tool_guards
                 frontend_policy["prepend"] = policy_dict.get("prepend", False)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/server/main.py` around lines 2364 - 2368, The GET
/api/config/policies handler currently uses require_auth and returns
ToolGuard.policy_code in frontend_policy["tool_guards"], leaking executable
policy code; change the handler to either call require_manage_access instead of
require_auth or, if you want read access for non-manage users, iterate over each
ToolGuard in the response and redact/mask the policy_code (e.g., replace with
None or "<redacted>") when the current user does not have manage permission;
reference the GET handler that builds frontend_policy["tool_guards"], the
require_auth / require_manage_access decorators, and the ToolGuard.policy_code
field when implementing the check/masking (mirror how secrets routes hide
values).
🧹 Nitpick comments (11)
src/cuga/backend/cuga_graph/policy/tool_guard/tool_invoker.py (2)

41-52: Optional: protect cache initialization against concurrent invocations.

_get_tools is called from invoke, which can be triggered concurrently for different tools. Two concurrent first-time callers will both call tool_provider.get_all_tools() and overwrite _tools_cache. The end state is correct, but you'll do duplicate work. If get_all_tools() is expensive, guard it with an asyncio.Lock.

♻️ Sketch
-    def __init__(self, tool_provider):
+    def __init__(self, tool_provider):
         ...
         self._tools_cache: Optional[Dict[str, Any]] = None
+        self._cache_lock = asyncio.Lock()
         logger.debug("Initialized ToolGuardInvoker with CUGA tool provider")

     async def _get_tools(self) -> Dict[str, Any]:
-        if self._tools_cache is None:
-            tools_list = await self.tool_provider.get_all_tools()
-            self._tools_cache = {tool.name: tool for tool in tools_list}
-            logger.debug(f"Cached {len(self._tools_cache)} tools")
-        return self._tools_cache
+        if self._tools_cache is None:
+            async with self._cache_lock:
+                if self._tools_cache is None:
+                    tools_list = await self.tool_provider.get_all_tools()
+                    self._tools_cache = {tool.name: tool for tool in tools_list}
+                    logger.debug(f"Cached {len(self._tools_cache)} tools")
+        return self._tools_cache
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_invoker.py` around lines
41 - 52, The _get_tools method can race on first initialization; add an
asyncio.Lock (e.g., self._tools_lock initialized in the class __init__) and use
it to guard the cache population: check self._tools_cache first, if None then
async with self._tools_lock and re-check self._tools_cache inside the lock, then
call await self.tool_provider.get_all_tools() and set self._tools_cache; this
prevents duplicate expensive get_all_tools() calls while preserving the existing
behavior of _get_tools and invoke.

108-112: Use conversion flag and consider preserving the original exception type.

Two minor points:

  1. Ruff RUF010: prefer !s conversion flag over explicit str(e) in f-strings.
  2. Wrapping every non-ValueError as RuntimeError masks toolguard-specific exception types (e.g., timeouts, validation errors) that the runtime caller may want to distinguish. Since from e is used, the chain is preserved, but downstream except clauses can no longer match by type.
♻️ Proposed diff
         except Exception as e:
-            logger.error(f"Failed to invoke tool '{toolname}': {e}")
-            raise RuntimeError(
-                f"Tool invocation failed for '{toolname}': {str(e)}"
-            ) from e
+            logger.error(f"Failed to invoke tool '{toolname}': {e!s}")
+            raise RuntimeError(
+                f"Tool invocation failed for '{toolname}': {e!s}"
+            ) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_invoker.py` around lines
108 - 112, In the except block in tool_invoker.py (the "except Exception as e"
handling around the tool invocation that logs with logger and raises a
RuntimeError for toolname), change the f-string to use the !s conversion flag
(f"Failed to invoke tool '{toolname}': {e!s}") and stop unconditionally wrapping
every caught exception as RuntimeError; instead re-raise ValueError or other
toolguard-specific exceptions as-is (or re-raise e) so callers can match by
type, and only wrap into a RuntimeError when you must add context (preserving
the original via "raise RuntimeError(... ) from e" if used).
src/cuga/backend/cuga_graph/policy/folder_loader.py (1)

215-220: Wrap ToolGuard construction with a per-tool error to aid diagnosis.

If a user-authored markdown file has a tool_guards entry that is missing the required description field or contains a typo'd field name, ToolGuard(**guard_config) raises pydantic.ValidationError. That exception bubbles up to the file-level try/except in load_policies_from_folder, which logs "Failed to load <file_path>: ..." but doesn't tell the user which tool_name inside tool_guards was malformed — and a single bad guard fails the entire ToolGuide.

Also: if raw_tool_guards is present but not a dict (e.g., a list, due to a YAML formatting mistake), this silently falls through to tool_guards = None with no warning.

♻️ Proposed diff
-    raw_tool_guards = frontmatter.get('tool_guards')
-    tool_guards = (
-        {tool_name: ToolGuard(**guard_config) for tool_name, guard_config in raw_tool_guards.items()}
-        if isinstance(raw_tool_guards, dict)
-        else None
-    )
+    raw_tool_guards = frontmatter.get('tool_guards')
+    tool_guards = None
+    if isinstance(raw_tool_guards, dict):
+        tool_guards = {}
+        for tool_name, guard_config in raw_tool_guards.items():
+            if not isinstance(guard_config, dict):
+                raise ValueError(
+                    f"ToolGuide {name}: tool_guards['{tool_name}'] must be a mapping, "
+                    f"got {type(guard_config).__name__}"
+                )
+            try:
+                tool_guards[tool_name] = ToolGuard(**guard_config)
+            except Exception as e:
+                raise ValueError(
+                    f"ToolGuide {name}: invalid tool_guards entry for '{tool_name}': {e}"
+                ) from e
+    elif raw_tool_guards is not None:
+        logger.warning(
+            f"ToolGuide {name}: ignoring 'tool_guards' frontmatter — expected dict, "
+            f"got {type(raw_tool_guards).__name__}"
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/policy/folder_loader.py` around lines 215 - 220,
The tool guard parser in folder_loader.py currently calls
ToolGuard(**guard_config) for all entries and will raise a
pydantic.ValidationError that aborts the whole file without identifying which
tool failed; update the block that builds tool_guards (the code using
raw_tool_guards and ToolGuard) to (1) if raw_tool_guards is present but not a
dict, log a warning mentioning the file and raw_tool_guards type and skip it,
and (2) iterate raw_tool_guards.items() and construct each ToolGuard inside a
try/except that catches ValidationError (or Exception), logs an error including
the offending tool_name and the validation error details, and continues building
the rest so one bad tool entry does not break the entire ToolGuard set used by
load_policies_from_folder.
src/cuga/backend/cuga_graph/policy/filesystem_sync.py (2)

92-94: Minor: str(None) fallback can silently write "None" as the policy type.

When both policy.type and policy.policy_type are missing, this falls back to str(policy_type) which yields the literal string "None". That value will be written into the YAML frontmatter and won't round-trip correctly through folder_loader.POLICY_CREATORS (which only knows the canonical type strings). All concrete policy subclasses do set type, so this is unlikely in practice — but consider raising or logging a warning instead of producing a meaningless "None" string.
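♻️ Sketch (fail loudly instead of writing "None")

A minimal standalone sketch of the suggested behavior; the function and class names here are illustrative, not the module's actual ones:

```python
def resolve_policy_type_value(policy) -> str:
    # Prefer .type, fall back to .policy_type; refuse to serialize "None".
    policy_type = getattr(policy, "type", None) or getattr(policy, "policy_type", None)
    if policy_type is None:
        raise ValueError(f"Policy {policy!r} has no type; refusing to write frontmatter")
    # Enum-like types expose .value; plain strings pass through.
    return getattr(policy_type, "value", None) or str(policy_type)

class GuidePolicy:  # stand-in for a concrete policy subclass
    type = "tool_guide"

print(resolve_policy_type_value(GuidePolicy()))  # tool_guide
try:
    resolve_policy_type_value(object())
except ValueError:
    print("raised on missing type")
```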

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/policy/filesystem_sync.py` around lines 92 - 94,
The code currently falls back to str(policy_type) which yields "None" when both
policy.type and policy.policy_type are missing; update the logic around
policy_type and policy_type_value so that if policy_type is None you fail loudly
(e.g., raise a ValueError with a clear message including the policy identity or
log a warning and skip writing) instead of writing the string "None";
specifically change the block that computes policy_type and policy_type_value
(the variables policy_type and policy_type_value in filesystem_sync.py) to
detect None and raise/log, otherwise continue to extract .value or str(...) as
before so invalid/missing types are not written into YAML frontmatter.

137-141: Production deployment note: .cuga/tool_guides/ contains policy_code in plaintext.

guard.model_dump() serializes the full ToolGuard including the unsandboxed Python policy_code. In production, restrict read access to the agent process owner so that chat-only users without require_manage_access cannot inspect validation logic via filesystem access.

Note: .cuga is already in .gitignore, so accidental commits are prevented.
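♻️ Sketch (owner-only file permissions)

One way to act on this note is to create the serialized guard files with owner-only permissions at open time. This is a POSIX-only sketch and the file layout is illustrative:

```python
import os
import stat
import tempfile

def write_private(path: str, content: str) -> None:
    # Open with mode 0o600 so group/other users cannot read policy_code,
    # without relying on a separate chmod after the file already exists.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(content)

with tempfile.TemporaryDirectory() as d:
    guard_file = os.path.join(d, "guard.md")
    write_private(guard_file, "tool_guards: ...")
    mode = stat.S_IMODE(os.stat(guard_file).st_mode)
    print(oct(mode))  # 0o600 on POSIX with a typical umask
```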

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/policy/filesystem_sync.py` around lines 137 -
141, The current serialization of ToolGuard via guard.model_dump() writes
sensitive unsandboxed Python in the policy_code field to disk
(frontmatter['tool_guards']), so change the serialization to exclude or redact
policy_code before writing .cuga/tool_guides; e.g., replace guard.model_dump()
with a safe serialization that either calls
guard.model_dump(exclude={'policy_code'}) or uses a helper like
ToolGuard.to_safe_dict()/guard.safe_dict() that omits or redacts the policy_code
and any other sensitive fields, ensuring only non-executable metadata is written
to frontmatter['tool_guards'] while preserving existing keys (tool_guards,
prepend).
src/cuga/backend/tools_env/registry/registry/api_registry.py (1)

232-232: Simplify redundant unwrap.

Inside the conditional, 'params' in arguments is already true, so the .get('params', arguments) default is dead.

♻️ Cleanup
-        unwrapped_args = arguments.get('params', arguments) if isinstance(arguments, dict) and 'params' in arguments else arguments
+        unwrapped_args = (
+            arguments['params']
+            if isinstance(arguments, dict) and 'params' in arguments
+            else arguments
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/tools_env/registry/registry/api_registry.py` at line 232,
The conditional that computes unwrapped_args uses arguments.get('params',
arguments) redundantly because the branch already ensures 'params' is in
arguments; update the expression to directly access arguments['params'] when
isinstance(arguments, dict) and 'params' in arguments, e.g. replace the current
ternary with arguments['params'] in that branch so unwrapped_args is set
deterministically; locate the assignment to unwrapped_args in api_registry.py to
make this change.
src/cuga/backend/tools_env/registry/registry/api_registry_server.py (1)

143-160: Extract the policy-storage bootstrap into a single helper.

The if settings.policy.enabled: ... PolicyStorage().connect() ... block is duplicated 3 times across _get_or_create_registry, lifespan, and reload_config (and would need to be touched in lockstep if you adopt the fail-closed fix above). A small async def _maybe_connect_policy_storage(fail_closed: bool) -> Optional[PolicyStorage] helper would centralize the lifecycle and policy.

Also applies to: 186-198, 564-576

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/tools_env/registry/registry/api_registry_server.py` around
lines 143 - 160, The policy storage bootstrap sequence (checking
settings.policy.enabled, instantiating PolicyStorage, calling connect(), logging
success/failure and optionally failing closed) is duplicated across
_get_or_create_registry, lifespan, and reload_config; refactor by extracting
that logic into a single async helper named
_maybe_connect_policy_storage(fail_closed: bool) -> Optional[PolicyStorage] that
encapsulates the settings.policy.enabled check, instantiates PolicyStorage,
calls await policy_storage.connect(), logs success with logger.info, logs
failures with logger.error, and either returns None or raises RuntimeError when
fail_closed is True; update callers (in _get_or_create_registry, lifespan,
reload_config and where ApiRegistry(client=manager, policy_storage=...) is
created) to call this helper and pass its return value into ApiRegistry.
src/cuga/backend/cuga_graph/policy/utils.py (1)

471-473: Subtle semantic change: missing triggers no longer errors.

Previously, missing triggers would have raised AttributeError (acting as a hard failure for non-ToolApproval policies). The new condition if policy_triggers is not None and not policy_triggers only flags the explicit empty-list case. This is the intended fix for ToolApproval (which has no triggers field), but it now silently skips the "at least one trigger" check whenever getattr returns None for any policy type. Since Playbook/IntentGuard/ToolGuide/OutputFormatter/CustomPolicy all declare triggers: List[Trigger] = Field(...) (required at construction), this should be fine in practice — but consider explicitly skipping the trigger validation only for ToolApproval to make the intent clearer.

♻️ More explicit alternative
-    policy_triggers = getattr(policy, "triggers", None)
-    if policy_triggers is not None and not policy_triggers:
-        errors.append("At least one trigger is required")
+    if not isinstance(policy, ToolApproval):
+        policy_triggers = getattr(policy, "triggers", None)
+        if not policy_triggers:
+            errors.append("At least one trigger is required")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/policy/utils.py` around lines 471 - 473, The
current check uses policy_triggers = getattr(policy, "triggers", None) and only
errors on an explicit empty list, which unintentionally skips validation for
policies that should require triggers; modify the condition to explicitly skip
the validation only for ToolApproval (or alternately require the attribute
exist) — e.g., use isinstance(policy, ToolApproval) to bypass trigger checks for
ToolApproval, otherwise assert that getattr(policy, "triggers", None) is not
None and not empty and append "At least one trigger is required" when missing or
empty; update the check around policy_triggers, getattr(policy, "triggers",
None), and references to
ToolApproval/Playbook/IntentGuard/ToolGuide/OutputFormatter/CustomPolicy so
intent is explicit.
src/cuga/sdk_core/debug_tool_guard_e2e.py (1)

144-205: Drop redundant f prefixes (Ruff F541).

Several print statements use f"..." without any placeholders. Minor cleanup to silence the lint warnings.

♻️ Quick fix
-        print(f"Creating policy...")
+        print("Creating policy...")
...
-        print(f"Generating examples...")
+        print("Generating examples...")
...
-        print(f"Generating guard code...")
+        print("Generating guard code...")
...
-        print(f"✅ Policy saved successfully")
+        print("✅ Policy saved successfully")

As per the static analysis hints (Ruff F541 at lines 144, 154, 175, 205).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/sdk_core/debug_tool_guard_e2e.py` around lines 144 - 205, Remove the
unnecessary f-string prefixes on the static print calls to fix Ruff F541:
replace print(f"Creating policy..."), print(f"Generating examples..."),
print(f"Generating guard code..."), and print(f"✅ Policy saved successfully")
with regular string literals (print("...")) and leave the other f-strings that
interpolate variables (e.g., print(f"✅ Created policy with ID: {policy_id}"),
print(f"✅ Generated {len(violating_examples)} ..."), print(f"✅ Generated guard
code ({len(guard_code)} characters)")) unchanged.
src/cuga/sdk.py (1)

539-575: Return type annotation inconsistent with None return.

The signature declares -> str, but Line 575 returns None when the policy system is disabled. This pattern matches several other add_* methods in this file, but consider tightening to Optional[str] for accuracy. New callers using strict typing (e.g., mypy/pyright) will see false-positives.

♻️ Proposed annotation
-    ) -> str:
+    ) -> Optional[str]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/sdk.py` around lines 539 - 575, The function update_tool_guard
currently declares a return type of str but returns None when the policy system
is disabled; change the signature to return Optional[str] and update any related
docstring Returns section accordingly; ensure other early-exit paths in
update_tool_guard (and similar methods like the add_* helpers in this module)
consistently return Optional[str] so type checkers (mypy/pyright) won't raise
errors.
src/cuga/backend/cuga_graph/policy/tool_guard/manager.py (1)

18-21: Module-level import of CugaAgent creates a fragile circular dependency.

cuga.sdk re-imports ToolGuardManager from this module (lines 1280, 1375 in src/cuga/sdk.py), and this module imports CugaAgent from cuga.sdk at module-load time. It works today only because sdk.py defers the ToolGuardManager import inside method bodies. If anyone imports cuga.backend.cuga_graph.policy.tool_guard.manager before cuga.sdk finishes loading, the import of CugaAgent will fail.

Consider deferring the CugaAgent import using TYPE_CHECKING for the import and a string-form forward reference for the type annotation to break the cycle.

♻️ Proposed refactor

Update the typing import on line 11 to include TYPE_CHECKING:

-from typing import List, Tuple, Dict, Any, Iterator, Optional
+from typing import List, Tuple, Dict, Any, Iterator, Optional, TYPE_CHECKING

Then update lines 18-21:

-from cuga.sdk import CugaAgent
 from toolguard.buildtime.buildtime import ToolGuardSpec, generate_guard_examples, generate_guards_code
 from toolguard.extra.langchain_to_oas import langchain_tools_to_openapi
 from toolguard.runtime.data_types import ToolGuardSpecItem, ToolGuardsCodeGenerationResult
+
+if TYPE_CHECKING:
+    from cuga.sdk import CugaAgent

And update the __init__ parameter annotation on line 28:

     def __init__(
         self,
-        agent: CugaAgent,
+        agent: "CugaAgent",
     ):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/policy/tool_guard/manager.py` around lines 18 -
21, The module-level import of CugaAgent introduces a circular dependency;
change the imports to add TYPE_CHECKING from typing, move the CugaAgent import
into an if TYPE_CHECKING: block (i.e., if TYPE_CHECKING: from cuga.sdk import
CugaAgent) and remove the top-level from cuga.sdk import CugaAgent, then update
the ToolGuardManager __init__ (or whichever constructor uses CugaAgent)
parameter annotation to use a string forward reference ('CugaAgent') instead of
the direct type so runtime import is deferred.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cuga/backend/cuga_graph/policy/tool_guard/cuga_llm_adapter.py`:
- Around line 178-187: The current response-content handling in cuga_llm_adapter
(the block that checks hasattr(response, 'content') and then joins list items)
is lossy for multimodal/list-of-dicts payloads; update that logic to, when
content is a list, iterate items and for each item if it's a dict extract
text-like fields (e.g., 'text', 'content', or nested {'type':'text','text':...})
prefer the 'text' value, otherwise skip non-text types (images, urls) or convert
primitives to str, then join the extracted text parts with newlines; keep the
existing fallback that casts non-list content to str.

In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py`:
- Line 217: The code uses Path.cwd() to compute domain_dir which couples
ToolGuardManager to the process CWD; change ToolGuardManager to accept an
explicit domain_dir parameter in its __init__ (or read from
settings.policy.toolguard_domain_dir if None) and replace the inline Path.cwd()
/ ".cuga" / "toolguard" / "domain" with that deterministic path; update the
construction sites of ToolGuardManager to pass the configured domain_dir (or
ensure settings.policy.toolguard_domain_dir is set) so both writer and reader
resolve the same domain files and avoid silent RuntimeError.
- Around line 570-576: The current dynamic class creation type("ToolGuardArgs",
(), arguments)() loses any argument keys that are not valid Python identifiers
and also causes collisions when a tool parameter is literally named "args"; to
fix: replace the type()-based approach in tool_guard_runtime.py (where args_obj
is created and passed into self._runtime.guard_toolcall) with a robust wrapper
that (1) preserves the original arguments dict unmodified (e.g., keep a .raw or
.all_args dict on the wrapper) and exposes only safe identifier attributes for
keys that pass str.isidentifier(), and (2) stop injecting into the call under
the literal "args" key—use a non-colliding reserved key such as "__guard_args__"
or "_guard_args" when merging with arguments; also update the guard invocation
site (self._runtime.guard_toolcall and any code consuming args_obj) to read from
the wrapper’s .raw for arbitrary keys and from attributes for identifier-safe
keys, and adjust api_registry.py error handling (the try/except around guard
validation) to not silently swallow guard exceptions—either re-raise or treat
guard failures as closed (deny) so guards don’t fail open.
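
One way to sketch the wrapper described above (the `.raw` attribute and the identifier filter are the suggestion here, not toolguard's own API):

```python
class GuardArgs:
    """Argument wrapper that never loses keys.

    The full argument dict stays available via `.raw`; keys that are
    valid Python identifiers are also exposed as attributes for guard
    code that uses dot access.
    """

    def __init__(self, arguments: dict):
        self.raw = dict(arguments)  # unmodified copy, any key shape
        for key, value in arguments.items():
            if isinstance(key, str) and key.isidentifier() and key != "raw":
                setattr(self, key, value)
```

The call site would then merge under a reserved key, e.g. `{**arguments, "__guard_args__": GuardArgs(arguments)}`, so a tool parameter literally named `args` is never clobbered.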

In `@src/cuga/backend/tools_env/registry/registry/api_registry_server.py`:
- Around line 186-198: The YAML lifespan and reload_config paths currently catch
exceptions when creating PolicyStorage and only log a warning, allowing
policy_storage to be None and disabling ToolGuardRuntime; match the fail-closed
behavior of _get_or_create_registry by changing both spots where PolicyStorage()
is constructed and connect() is awaited (the lifespan block that sets
policy_storage and the reload_config routine) so that on any Exception you log
the failure and then raise a RuntimeError (with the error details) instead of
swallowing it—this ensures ApiRegistry (and its
_initialize_tool_guard_runtime/ToolGuardRuntime behavior) cannot start or reload
with policy.enabled=True and a missing policy storage.

In `@src/cuga/backend/tools_env/registry/registry/api_registry.py`:
- Around line 235-261: The wrapper in api_registry.py around
self.tool_guard_runtime.guard_tool_call currently catches all Exceptions and
lets the tool proceed; change it to honor ToolGuardRuntime.guard_tool_call's
fail-closed contract by treating exceptions from guard_tool_call as policy
violations (i.e., return the same blocking response structure with status
"exception", status_code 403, error_type "ToolGuardViolation" and a message that
includes the exception text and function_name) rather than continuing execution,
or alternatively narrow the except to only specific, well-justified recoverable
exceptions; locate the guard call in api_registry.py (the block invoking
guard_tool_call) and implement the fail-closed behavior consistent with
ToolGuardRuntime.guard_tool_call.

In `@src/cuga/sdk.py`:
- Around line 1410-1411: Remove the unreachable second return that references
undefined variables: inside the function that currently ends with "return
guard_code" followed by "return violating_examples, compliance_examples", delete
the latter unreachable return statement so only "return guard_code" remains;
this removes the dead code and the potential NameError flagged by Ruff F821
while preserving the intended returned value from the function.
- Around line 1358-1364: The docstring example calls policies.update_tool_guard
with the wrong signature; update_tool_guard only accepts (policy_id,
tool_guards: Dict[str, Dict[str, Any]]). Change the example call to pass a
tool_guards mapping instead of separate keyword args — e.g. call
policies.update_tool_guard(policy_id, {"delete_file": {"violating_examples":
violating, "compliance_examples": compliance}}) so the example matches the
update_tool_guard implementation.

---

Outside diff comments:
In `@src/cuga/backend/server/main.py`:
- Around line 2364-2368: The GET /api/config/policies handler currently uses
require_auth and returns ToolGuard.policy_code in
frontend_policy["tool_guards"], leaking executable policy code; change the
handler to either call require_manage_access instead of require_auth or, if you
want read access for non-manage users, iterate over each ToolGuard in the
response and redact/mask the policy_code (e.g., replace with None or
"<redacted>") when the current user does not have manage permission; reference
the GET handler that builds frontend_policy["tool_guards"], the require_auth /
require_manage_access decorators, and the ToolGuard.policy_code field when
implementing the check/masking (mirror how secrets routes hide values).
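
A redaction helper along the lines suggested might look like this (helper and field names are illustrative, not the handler's actual code):

```python
def redact_tool_guards(tool_guards: dict, can_manage: bool) -> dict:
    """Mask executable policy_code for users without manage access,
    mirroring how the secrets routes hide values."""
    if can_manage:
        return tool_guards
    return {
        name: {**guard, "policy_code": "<redacted>"}
        for name, guard in tool_guards.items()
    }
```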

---

Nitpick comments:
In `@src/cuga/backend/cuga_graph/policy/filesystem_sync.py`:
- Around line 92-94: The code currently falls back to str(policy_type) which
yields "None" when both policy.type and policy.policy_type are missing; update
the logic around policy_type and policy_type_value so that if policy_type is
None you fail loudly (e.g., raise a ValueError with a clear message including
the policy identity or log a warning and skip writing) instead of writing the
string "None"; specifically change the block that computes policy_type and
policy_type_value (the variables policy_type and policy_type_value in
filesystem_sync.py) to detect None and raise/log, otherwise continue to extract
.value or str(...) as before so invalid/missing types are not written into YAML
frontmatter.
- Around line 137-141: The current serialization of ToolGuard via
guard.model_dump() writes sensitive unsandboxed Python in the policy_code field
to disk (frontmatter['tool_guards']), so change the serialization to exclude or
redact policy_code before writing .cuga/tool_guides; e.g., replace
guard.model_dump() with a safe serialization that either calls
guard.model_dump(exclude={'policy_code'}) or uses a helper like
ToolGuard.to_safe_dict()/guard.safe_dict() that omits or redacts the policy_code
and any other sensitive fields, ensuring only non-executable metadata is written
to frontmatter['tool_guards'] while preserving existing keys (tool_guards,
prepend).
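
The safe serialization can be as small as a key filter over the dump (shown here on a plain dict so the sketch stays self-contained; on the pydantic model it is equivalent to `guard.model_dump(exclude={'policy_code'})`):

```python
def safe_guard_dict(guard_dump: dict) -> dict:
    """Drop executable fields before writing guard metadata to YAML
    frontmatter, so only non-executable metadata reaches disk."""
    return {k: v for k, v in guard_dump.items() if k != "policy_code"}
```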

In `@src/cuga/backend/cuga_graph/policy/folder_loader.py`:
- Around line 215-220: The tool guard parser in folder_loader.py currently calls
ToolGuard(**guard_config) for all entries and will raise a
pydantic.ValidationError that aborts the whole file without identifying which
tool failed; update the block that builds tool_guards (the code using
raw_tool_guards and ToolGuard) to (1) if raw_tool_guards is present but not a
dict, log a warning mentioning the file and raw_tool_guards type and skip it,
and (2) iterate raw_tool_guards.items() and construct each ToolGuard inside a
try/except that catches ValidationError (or Exception), logs an error including
the offending tool_name and the validation error details, and continues building
the rest so one bad tool entry does not break the entire ToolGuard set used by
load_policies_from_folder.
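
A sketch of the per-entry handling (`make_guard` stands in for `lambda cfg: ToolGuard(**cfg)`; the real code would catch `pydantic.ValidationError` specifically):

```python
import logging

logger = logging.getLogger(__name__)


def build_tool_guards(raw_tool_guards, make_guard, file_path="<file>"):
    """Construct guards one entry at a time so a single bad entry
    does not abort the whole policy file."""
    if not isinstance(raw_tool_guards, dict):
        logger.warning("%s: tool_guards is %s, expected dict; skipping",
                       file_path, type(raw_tool_guards).__name__)
        return {}
    guards = {}
    for tool_name, guard_config in raw_tool_guards.items():
        try:
            guards[tool_name] = make_guard(guard_config)
        except Exception as exc:
            # Name the offending tool so the failure is diagnosable.
            logger.error("%s: invalid tool guard for '%s': %s",
                         file_path, tool_name, exc)
    return guards
```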

In `@src/cuga/backend/cuga_graph/policy/tool_guard/manager.py`:
- Around line 18-21: The module-level import of CugaAgent introduces a circular
dependency; change the imports to add TYPE_CHECKING from typing, move the
CugaAgent import into an if TYPE_CHECKING: block (i.e., if TYPE_CHECKING: from
cuga.sdk import CugaAgent) and remove the top-level from cuga.sdk import
CugaAgent, then update the ToolGuardManager __init__ (or whichever constructor
uses CugaAgent) parameter annotation to use a string forward reference
('CugaAgent') instead of the direct type so runtime import is deferred.
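
The standard pattern, sketched with the names from the comment (the `cuga.sdk` import is only seen by type checkers, so the circular import never happens at runtime):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated by mypy/pyright only; never imported at runtime.
    from cuga.sdk import CugaAgent


class ToolGuardManager:
    def __init__(self, agent: "CugaAgent"):  # string forward reference
        self.agent = agent
```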

In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_invoker.py`:
- Around line 41-52: The _get_tools method can race on first initialization; add
an asyncio.Lock (e.g., self._tools_lock initialized in the class __init__) and
use it to guard the cache population: check self._tools_cache first, if None
then async with self._tools_lock and re-check self._tools_cache inside the lock,
then call await self.tool_provider.get_all_tools() and set self._tools_cache;
this prevents duplicate expensive get_all_tools() calls while preserving the
existing behavior of _get_tools and invoke.
- Around line 108-112: In the except block in tool_invoker.py (the "except
Exception as e" handling around the tool invocation that logs with logger and
raises a RuntimeError for toolname), change the f-string to use the !s
conversion flag (f"Failed to invoke tool '{toolname}': {e!s}") and stop
unconditionally wrapping every caught exception as RuntimeError; instead
re-raise ValueError or other toolguard-specific exceptions as-is (or re-raise e)
so callers can match by type, and only wrap into a RuntimeError when you must
add context (preserving the original via "raise RuntimeError(... ) from e" if
used).

In `@src/cuga/backend/cuga_graph/policy/utils.py`:
- Around line 471-473: The current check uses policy_triggers = getattr(policy,
"triggers", None) and only errors on an explicit empty list, which
unintentionally skips validation for policies that should require triggers;
modify the condition to explicitly skip the validation only for ToolApproval (or
alternately require the attribute exist) — e.g., use isinstance(policy,
ToolApproval) to bypass trigger checks for ToolApproval, otherwise assert that
getattr(policy, "triggers", None) is not None and not empty and append "At least
one trigger is required" when missing or empty; update the check around
policy_triggers, getattr(policy, "triggers", None), and references to
ToolApproval/Playbook/IntentGuard/ToolGuide/OutputFormatter/CustomPolicy so
intent is explicit.
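
Sketched with stand-in policy classes (only the shape of the check is the point; the real models live in the policy package):

```python
class ToolApproval:
    pass


class Playbook:
    pass


def validate_triggers(policy):
    """Require at least one trigger for every policy type except
    ToolApproval, which is intentionally trigger-free."""
    errors = []
    if isinstance(policy, ToolApproval):
        return errors
    triggers = getattr(policy, "triggers", None)
    if not triggers:  # missing attribute, None, or empty list
        errors.append("At least one trigger is required")
    return errors
```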

In `@src/cuga/backend/tools_env/registry/registry/api_registry_server.py`:
- Around line 143-160: The policy storage bootstrap sequence (checking
settings.policy.enabled, instantiating PolicyStorage, calling connect(), logging
success/failure and optionally failing closed) is duplicated across
_get_or_create_registry, lifespan, and reload_config; refactor by extracting
that logic into a single async helper named
_maybe_connect_policy_storage(fail_closed: bool) -> Optional[PolicyStorage] that
encapsulates the settings.policy.enabled check, instantiates PolicyStorage,
calls await policy_storage.connect(), logs success with logger.info, logs
failures with logger.error, and either returns None or raises RuntimeError when
fail_closed is True; update callers (in _get_or_create_registry, lifespan,
reload_config and where ApiRegistry(client=manager, policy_storage=...) is
created) to call this helper and pass its return value into ApiRegistry.

In `@src/cuga/backend/tools_env/registry/registry/api_registry.py`:
- Line 232: The conditional that computes unwrapped_args uses
arguments.get('params', arguments) redundantly because the branch already
ensures 'params' is in arguments; update the expression to directly access
arguments['params'] when isinstance(arguments, dict) and 'params' in arguments,
e.g. replace the current ternary with arguments['params'] in that branch so
unwrapped_args is set deterministically; locate the assignment to unwrapped_args
in api_registry.py to make this change.

In `@src/cuga/sdk_core/debug_tool_guard_e2e.py`:
- Around line 144-205: Remove the unnecessary f-string prefixes on the static
print calls to fix Ruff F541: replace print(f"Creating policy..."),
print(f"Generating examples..."), print(f"Generating guard code..."), and
print(f"✅ Policy saved successfully") with regular string literals
(print("...")) and leave the other f-strings that interpolate variables (e.g.,
print(f"✅ Created policy with ID: {policy_id}"), print(f"✅ Generated
{len(violating_examples)} ..."), print(f"✅ Generated guard code
({len(guard_code)} characters)")) unchanged.

In `@src/cuga/sdk.py`:
- Around line 539-575: The function update_tool_guard currently declares a
return type of str but returns None when the policy system is disabled; change
the signature to return Optional[str] and update any related docstring Returns
section accordingly; ensure other early-exit paths in update_tool_guard (and
similar methods like the add_* helpers in this module) consistently return
Optional[str] so type checkers (mypy/pyright) won't raise errors.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 260888de-4fd5-4c35-9440-ff05c3026138

📥 Commits

Reviewing files that changed from the base of the PR and between a26a8f6 and 49c3d9e.

📒 Files selected for processing (19)
  • pyproject.toml
  • src/cuga/backend/cuga_graph/policy/__init__.py
  • src/cuga/backend/cuga_graph/policy/filesystem_sync.py
  • src/cuga/backend/cuga_graph/policy/folder_loader.py
  • src/cuga/backend/cuga_graph/policy/models.py
  • src/cuga/backend/cuga_graph/policy/tests/test_filesystem_sync.py
  • src/cuga/backend/cuga_graph/policy/tool_guard/__init__.py
  • src/cuga/backend/cuga_graph/policy/tool_guard/cuga_llm_adapter.py
  • src/cuga/backend/cuga_graph/policy/tool_guard/manager.py
  • src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py
  • src/cuga/backend/cuga_graph/policy/tool_guard/tool_invoker.py
  • src/cuga/backend/cuga_graph/policy/utils.py
  • src/cuga/backend/server/main.py
  • src/cuga/backend/tools_env/registry/registry/api_registry.py
  • src/cuga/backend/tools_env/registry/registry/api_registry_server.py
  • src/cuga/sdk.py
  • src/cuga/sdk_core/debug_tool_guard_e2e.py
  • src/cuga/sdk_core/tests/policies-export-2025-12-31.json
  • src/cuga/sdk_core/tests/test_sdk_policies.py
💤 Files with no reviewable changes (1)
  • src/cuga/sdk_core/tests/policies-export-2025-12-31.json

Comment thread src/cuga/backend/cuga_graph/policy/tool_guard/cuga_llm_adapter.py Outdated
Raises:
RuntimeError: If domain directory or files are not found
"""
domain_dir = Path.cwd() / ".cuga" / "toolguard" / "domain"

⚠️ Potential issue | 🟡 Minor

Path.cwd() couples runtime to process working directory.

Domain files are resolved relative to whatever directory the registry process happens to be started from. If ToolGuardManager (the writer) and the registry (the reader) run with different CWDs — common in containerized or systemd setups — domain loading silently raises RuntimeError. Consider plumbing this through settings or accepting an explicit domain_dir parameter so it's deterministic.

♻️ Suggested approach
-    def _load_runtime_domain(self) -> RuntimeDomain:
+    def _load_runtime_domain(self, domain_dir: Optional[Path] = None) -> RuntimeDomain:
         ...
-        domain_dir = Path.cwd() / ".cuga" / "toolguard" / "domain"
+        domain_dir = domain_dir or (Path.cwd() / ".cuga" / "toolguard" / "domain")

Then accept the path through __init__ (or read from settings.policy.toolguard_domain_dir) so callers can override.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py` at line
217, The code uses Path.cwd() to compute domain_dir which couples
ToolGuardManager to the process CWD; change ToolGuardManager to accept an
explicit domain_dir parameter in its __init__ (or read from
settings.policy.toolguard_domain_dir if None) and replace the inline Path.cwd()
/ ".cuga" / "toolguard" / "domain" with that deterministic path; update the
construction sites of ToolGuardManager to pass the configured domain_dir (or
ensure settings.policy.toolguard_domain_dir is set) so both writer and reader
resolve the same domain files and avoid silent RuntimeError.

Comment on lines +570 to +576
try:
args_obj = type("ToolGuardArgs", (), arguments)()
await self._runtime.guard_toolcall(
tool_name=function_name,
args=arguments | {"args": args_obj},
delegate=self.invoker,
)

@coderabbitai coderabbitai Bot Apr 27, 2026


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

toolguard runtime guard_toolcall args parameter expected shape

💡 Result:

The guard_toolcall method in ToolGuard's runtime module (from the AgentToolkit/toolguard Python package) expects three positional arguments:

  1. tool_name (str): The name of the tool being called (e.g., "add_tool", "divide_tool").
  2. args (dict): The tool arguments as a dictionary of parameter values (e.g., {"a": 5, "b": 4} or {"g": 10, "h": 2}).
  3. invoker: An invoker object compatible with the tools, such as ToolFunctionsInvoker, ToolMethodsInvoker, or LangchainToolInvoker. For LangChain tools, args must be nested under an "args" key: {"args": {"g": 5, "h": 2}}.

The method is asynchronous (await toolguard.guard_toolcall(...)) and raises PolicyViolationException if the call violates generated policy guards. It is used after loading guards via load_toolguards(path). Example usage:

  invoker = ToolFunctionsInvoker([divide_tool, add_tool, multiply_tool])
  await toolguard.guard_toolcall("add_tool", {"a": 5, "b": 4}, invoker)

For LangChain:

  invoker = LangchainToolInvoker(tools)
  await toolguard.guard_toolcall("divide_tool", {"args": {"g": 5, "h": 2}}, invoker)


🏁 Script executed:

#!/bin/bash
# Read the actual code at lines 570-576
sed -n '570,576p' src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py

echo "---"

# Also read some context around it (560-580)
sed -n '560,580p' src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py

Repository: cuga-project/cuga-agent

Length of output: 1071


🏁 Script executed:

#!/bin/bash
# Read api_registry.py lines 257-260 to see error handling
sed -n '257,260p' src/cuga/backend/cuga_graph/policy/tool_guard/api_registry.py

echo "---"

# Also read context around it (250-270)
sed -n '250,270p' src/cuga/backend/cuga_graph/policy/tool_guard/api_registry.py

Repository: cuga-project/cuga-agent

Length of output: 299


🏁 Script executed:

#!/bin/bash
# Search for usage of tool arguments to see if non-identifier keys are actually used
rg "user-id|user\.id|user-name" src/ -i --type py -A 2 -B 2 | head -50

Repository: cuga-project/cuga-agent

Length of output: 772


🏁 Script executed:

#!/bin/bash
# Check if there are any test cases or examples showing argument names with special characters
fd -e py -x grep -l "guard_toolcall\|ToolGuardArgs" {} \; | head -10

Repository: cuga-project/cuga-agent

Length of output: 137


🏁 Script executed:

#!/bin/bash
# Find api_registry.py
fd api_registry.py src/

Repository: cuga-project/cuga-agent

Length of output: 298


🏁 Script executed:

#!/bin/bash
# Search for error handling around guard_toolcall to find the wrapper mentioned
rg "guard_toolcall" src/ -B 3 -A 3 --type py | head -80

Repository: cuga-project/cuga-agent

Length of output: 800


🏁 Script executed:

#!/bin/bash
# Verify what happens with type() when dict keys are not valid identifiers
python3 << 'EOF'
# Test 1: Non-identifier keys
test_args = {"user-id": "123", "valid_key": "value"}
try:
    args_obj = type("ToolGuardArgs", (), test_args)()
    print(f"Test 1 passed: {args_obj.__dict__}")
    # Dot syntax (args_obj.user-id) is invalid Python, so guard code
    # cannot reference this key; getattr can still reach the class dict.
    try:
        print(f"Access user-id: {getattr(args_obj, 'user-id')}")
    except AttributeError:
        print("Cannot access user-id as attribute")
except TypeError as e:
    print(f"Test 1 FAILED with TypeError: {e}")

print("---")

# Test 2: "args" collision
test_args2 = {"a": 1, "b": 2, "args": "user_provided"}
args_obj2 = type("ToolGuardArgs", (), test_args2)()
merged = test_args2 | {"args": args_obj2}
print(f"Test 2 - merged dict: {merged}")
print(f"Test 2 - 'args' value type: {type(merged['args'])}")
print(f"Original args value overwritten: {merged['args'] != 'user_provided'}")
EOF

Repository: cuga-project/cuga-agent

Length of output: 305


🏁 Script executed:

#!/bin/bash
# Find api_registry references or wrapper around guard_toolcall
rg "api_registry|257|260" src/ --type py -B 2 -A 2 | head -100

Repository: cuga-project/cuga-agent

Length of output: 7490


🏁 Script executed:

#!/bin/bash
# Search for files that import or use guard_toolcall
rg "from.*tool_guard|import.*guard" src/ --type py | grep -v ".pyc"

Repository: cuga-project/cuga-agent

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Look for exception handling around guard_toolcall that is NOT PolicyViolationException
rg "guard_toolcall" src/ --type py -B 10 | grep -E "except|TypeError|Exception" -A 2 -B 2

Repository: cuga-project/cuga-agent

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Find the correct api_registry.py file
fd api_registry.py src/

Repository: cuga-project/cuga-agent

Length of output: 195


🏁 Script executed:

#!/bin/bash
# Read lines 257-260 from the correct api_registry.py
sed -n '257,260p' src/cuga/backend/tools_env/registry/registry/api_registry.py

echo "---"

# Context around it
sed -n '250,270p' src/cuga/backend/tools_env/registry/registry/api_registry.py

Repository: cuga-project/cuga-agent

Length of output: 1546


🏁 Script executed:

#!/bin/bash
# Get more context around the guard execution to understand the flow
sed -n '240,290p' src/cuga/backend/tools_env/registry/registry/api_registry.py

Repository: cuga-project/cuga-agent

Length of output: 2451


🏁 Script executed:

#!/bin/bash
# Verify that guard_toolcall is called before this exception handler
rg "guard_toolcall" src/cuga/backend/tools_env/registry/registry/api_registry.py -B 5 -A 1

Repository: cuga-project/cuga-agent

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Search for the guard validation method call in api_registry.py
rg "validate_tool_call|guard" src/cuga/backend/tools_env/registry/registry/api_registry.py -B 2 -A 2

Repository: cuga-project/cuga-agent

Length of output: 3762


🏁 Script executed:

#!/bin/bash
# Check the complete guard validation flow in tool_guard_runtime.py
wc -l src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py

Repository: cuga-project/cuga-agent

Length of output: 139


🏁 Script executed:

#!/bin/bash
# Look at the validate_tool_call method to understand the full picture
sed -n '550,600p' src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py

Repository: cuga-project/cuga-agent

Length of output: 1915


Dynamic class creation silently drops non-identifier argument keys and collides with "args" parameter.

At line 571, type("ToolGuardArgs", (), arguments)() uses arguments as the class namespace. While type() does not raise an error on non-identifier keys (e.g., "user-id", "user.name"), such keys cannot be referenced with attribute syntax, so guard code has no way to read them from args_obj. Additionally, line 574's arguments | {"args": args_obj} will overwrite any user parameter literally named "args".

Two issues:

  1. Silent data loss for argument keys that aren't valid Python identifiers: guard code cannot reach them through args_obj attributes.
  2. Parameter collision for any tool with a parameter named args, which gets replaced by the injected args object.

Error handling in api_registry.py lines 257–260 catches all exceptions during guard validation and allows the tool call to proceed without guards, so guard errors fail open rather than closed.

🛡️ Proposed fix — validate keys and avoid collision
-            args_obj = type("ToolGuardArgs", (), arguments)()
-            await self._runtime.guard_toolcall(
-                tool_name=function_name,
-                args=arguments | {"args": args_obj},
-                delegate=self.invoker,
-            )
+            # Build a namespace using only valid Python identifier keys so type()
+            # doesn't create a class with unusable attributes.
+            ns = {k: v for k, v in arguments.items() if isinstance(k, str) and k.isidentifier()}
+            args_obj = type("ToolGuardArgs", (), ns)()
+            merged = {**arguments, "args": args_obj}
+            if "args" in arguments:
+                logger.warning(
+                    f"Tool '{function_name}' has a parameter named 'args' which "
+                    "collides with the ToolGuard injected args object."
+                )
+            await self._runtime.guard_toolcall(
+                tool_name=function_name,
+                args=merged,
+                delegate=self.invoker,
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py` around
lines 570 - 576, The current dynamic class creation type("ToolGuardArgs", (),
arguments)() loses any argument keys that are not valid Python identifiers and
also causes collisions when a tool parameter is literally named "args"; to fix:
replace the type()-based approach in tool_guard_runtime.py (where args_obj is
created and passed into self._runtime.guard_toolcall) with a robust wrapper that
(1) preserves the original arguments dict unmodified (e.g., keep a .raw or
.all_args dict on the wrapper) and exposes only safe identifier attributes for
keys that pass str.isidentifier(), and (2) stop injecting into the call under
the literal "args" key—use a non-colliding reserved key such as "__guard_args__"
or "_guard_args" when merging with arguments; also update the guard invocation
site (self._runtime.guard_toolcall and any code consuming args_obj) to read from
the wrapper’s .raw for arbitrary keys and from attributes for identifier-safe
keys, and adjust api_registry.py error handling (the try/except around guard
validation) to not silently swallow guard exceptions—either re-raise or treat
guard failures as closed (deny) so guards don’t fail open.


Can you address this?


🧩 Analysis chain

@sami-marreed, here's the plan to address all three issues flagged in the review:

Task: Fix arg key handling, "args" collision, and fail-open guard behavior

  • Fix guard_tool_call in tool_guard_runtime.py to filter non-identifier keys and avoid "args" collision
  • Fix fail-open error handling in api_registry.py so guard exceptions block the call

🏁 Script executed:

#!/bin/bash
# Read the full guard_tool_call method to get current context
sed -n '530,620p' src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py

Length of output: 3151


🏁 Script executed:

#!/bin/bash
# Read the guard validation block with context
sed -n '240,290p' src/cuga/backend/tools_env/registry/registry/api_registry.py

Length of output: 2384


@sami-marreed, here are the two concrete fixes:


Fix 1 — src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py

Filter non-identifier keys before building the attribute object, and use "__guard_args__" to avoid colliding with any tool parameter literally named "args":

        try:
            # Build attribute object using only valid Python identifier keys so that
            # non-identifier argument names (e.g. "user-id") are not silently dropped.
            ns = {
                k: v
                for k, v in arguments.items()
                if isinstance(k, str) and k.isidentifier()
            }
            if len(ns) < len(arguments):
                dropped = [k for k in arguments if not (isinstance(k, str) and k.isidentifier())]
                logger.warning(
                    f"Tool '{function_name}': {len(dropped)} argument key(s) are not valid "
                    f"Python identifiers and will not be accessible as attributes: {dropped}"
                )
            args_obj = type("ToolGuardArgs", (), ns)()

            # Use "__guard_args__" instead of "args" to avoid overwriting a legitimate
            # tool parameter that happens to be named "args".
            if "args" in arguments:
                logger.warning(
                    f"Tool '{function_name}' has a parameter named 'args'; it is preserved "
                    "in the arguments dict and the guard args object is injected as '__guard_args__'."
                )
            merged = {**arguments, "__guard_args__": args_obj}

            await self._runtime.guard_toolcall(
                tool_name=function_name,
                args=merged,
                delegate=self.invoker,
            )

Fix 2 — src/cuga/backend/tools_env/registry/registry/api_registry.py

Change the fail-open except block to fail-closed by returning a 403, consistent with how a real violation is handled:

            except Exception as e:
                # Fail closed: an unexpected guard error must not silently allow
                # the tool call to proceed, as that would let a broken/malformed
                # guard bypass policy enforcement entirely.
                logger.error(
                    f"Unexpected error executing tool guard for '{function_name}': {e}",
                    exc_info=True,
                )
                return {
                    "status": "exception",
                    "status_code": 403,
                    "message": (
                        f"Tool guard encountered an internal error for '{function_name}' "
                        "and the call has been blocked as a safety precaution."
                    ),
                    "error_type": "ToolGuardInternalError",
                    "function_name": function_name,
                }

Note on tool_guard_runtime.py's own except Exception block: The runtime's internal handler (lines ~582–590) already returns a blocking error string for unexpected exceptions, so it already fails closed — no change needed there.


🧠 Learnings used
Learnt from: haroldship
Repo: cuga-project/cuga-agent PR: 67
File: src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py:377-397
Timestamp: 2026-03-23T08:37:20.455Z
Learning: In `src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`, when `PromptUtils.find_tools` raises an `OutputParserException` or other exception (i.e., the shortlister LLM returns malformed/empty JSON), the correct behaviour is to return a short user-facing error string to the agent — NOT a fallback tool list. This was an explicit design decision: returning all tools on LLM failure bloats the agent context and masks bugs; the error string lets the agent LLM decide whether to retry or proceed. This is distinct from the soft-filtering fallback (which applies when `filter_criteria` is provided but no tools match the criteria).

Learnt from: offerakrabi
Repo: cuga-project/cuga-agent PR: 61
File: src/cuga/backend/cuga_graph/nodes/cuga_lite/combined_tool_provider.py:232-232
Timestamp: 2026-03-16T09:35:34.380Z
Learning: In `src/cuga/backend/cuga_graph/nodes/cuga_lite/combined_tool_provider.py`, `settings.advanced_features.registry` access in `CombinedToolProvider.initialize()` can fail in CI test environments if Dynaconf validators haven't been applied yet. The validator with `default=True` is declared in `config.py` (line 117) and `validate_all()` is called at module level (line 264). The preferred defensive fix is `getattr(settings.advanced_features, 'registry', True)` consistent with patterns used elsewhere in the codebase (e.g., `helpers.py`).

Learnt from: sami-marreed
Repo: cuga-project/cuga-agent PR: 121
File: src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py:229-247
Timestamp: 2026-04-05T20:51:54.199Z
Learning: In `src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`, `format_task_todos_system_block` intentionally injects todo items verbatim into the system prompt so the agent treats them as authoritative, high-priority task directives it must follow. This is a deliberate design choice — do NOT flag raw todo injection as a prompt-injection risk in future reviews. Todos are generated by the agent's own `create_update_todos` tool, not from raw user input, so the injection risk is considered acceptable by the team.


Comment thread src/cuga/backend/tools_env/registry/registry/api_registry_server.py Outdated
Comment thread src/cuga/backend/tools_env/registry/registry/api_registry.py
Comment thread src/cuga/sdk.py
Comment thread src/cuga/sdk.py Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

♻️ Duplicate comments (2)
src/cuga/sdk.py (1)

1358-1364: ⚠️ Potential issue | 🟡 Minor

Fix the update_tool_guard docstring example.

The example still uses the old target_tool / violating_examples / compliance_examples signature, but update_tool_guard() now accepts a single tool_guards mapping. Copy-pasting this snippet will raise TypeError.

📝 Proposed docstring fix
             # Update policy with examples
             await agent.policies.update_tool_guard(
                 policy_id=policy_id,
-                target_tool="delete_file",
-                violating_examples=violating,
-                compliance_examples=compliance
+                tool_guards={
+                    "delete_file": {
+                        "description": "Guard rules for file deletion",
+                        "violating_examples": violating,
+                        "compliance_examples": compliance,
+                        "policy_code": "",
+                    }
+                }
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/sdk.py` around lines 1358 - 1364, The docstring example for
agent.policies.update_tool_guard is using the old keyword args (target_tool,
violating_examples, compliance_examples) and will raise a TypeError; update the
example to call update_tool_guard with a single tool_guards mapping (e.g. call
agent.policies.update_tool_guard(policy_id=policy_id,
tool_guards={"delete_file": {"violating_examples": violating,
"compliance_examples": compliance}}) so the example matches the new signature
exposed by update_tool_guard.
src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py (1)

570-575: ⚠️ Potential issue | 🟠 Major

Don't clobber a real args parameter.

arguments | {"args": args_obj} overwrites any user-supplied args field before validation, so those tools are guarded against synthetic data instead of the real payload. This is the same collision that was flagged earlier and it still exists here.

Verification: this should print overwritten: True.

#!/bin/bash
python - <<'PY'
from types import SimpleNamespace

arguments = {"args": "user-supplied", "value": 1}
args_obj = SimpleNamespace(**arguments)
merged = arguments | {"args": args_obj}

print("original args:", arguments["args"])
print("merged args type:", type(merged["args"]).__name__)
print("overwritten:", merged["args"] != arguments["args"])
PY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py` around
lines 570 - 575, The call to self._runtime.guard_toolcall currently forces
arguments | {"args": args_obj}, which clobbers any user-supplied "args" key;
change the merge so we do not overwrite an existing "args" field—either pass the
SimpleNamespace under a distinct key (e.g. "_args" or "args_obj") or only inject
"args" when "args" is not already present (check arguments for "args" before
merging); update the call site in the guard_toolcall invocation (the arguments
passed to self._runtime.guard_toolcall) to use the non-colliding key or
conditional merge while keeping function_name, args_obj/SimpleNamespace
creation, and delegate=self.invoker intact.
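A minimal sketch of the conditional merge the comment asks for, reusing the names from the call site (`arguments`, `args_obj`); this is illustrative, not the committed fix:

```python
from types import SimpleNamespace

def merge_guard_args(arguments: dict, args_obj) -> dict:
    # Only inject the synthetic namespace when the caller did not
    # already supply a real "args" field, so guards validate the
    # exact payload the tool will receive.
    if "args" in arguments:
        return arguments
    return arguments | {"args": args_obj}
```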
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py`:
- Around line 497-526: The current _normalize_name is lossy and can map
different tool names to the same identifier; change the normalization strategy
used by _normalize_name (and therefore used by _guard_module_name_for_tool and
_guard_function_name_for_tool) to append a short deterministic disambiguator
derived from the original name (e.g., a truncated hash like sha1/sha256 of the
original name, 6–8 hex chars) after the safe normalized token, ensuring you keep
the readable normalized portion plus the suffix (and still default to "tool" if
empty); update any code that expects the previous plain names (notably
_guard_module_name_for_tool, _guard_function_name_for_tool, and places in
_build_runtime that generate/write module files) so they use the new stable
unique names and avoid overwriting/loading the wrong guard.
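One way to implement the suggested disambiguation, sketched under the assumption that the normalizer only needs to be deterministic and collision-free (the real `_normalize_name` may have extra constraints):

```python
import hashlib
import re

def normalize_name(name: str) -> str:
    # Readable token first, then a short sha256 suffix derived from
    # the *original* name so distinct inputs can never collide.
    token = re.sub(r"[^0-9A-Za-z_]+", "_", name).strip("_") or "tool"
    suffix = hashlib.sha256(name.encode("utf-8")).hexdigest()[:8]
    return f"{token}_{suffix}"
```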

In `@src/cuga/backend/tools_env/registry/registry/api_registry_server.py`:
- Around line 218-219: When shutting down or reloading in YAML mode you must
also tear down the old registry's policy storage: in the same places where you
check database_mode and call await mcp_manager.shutdown() (the shutdown block
and the reload block that overwrites the global registry), detect if "registry"
is present in globals() and if registry.policy_storage exists and await its
teardown (preferably calling await registry.policy_storage.shutdown() and
falling back to await registry.policy_storage.close() if shutdown() is not
implemented), then remove/replace the global registry reference; this ensures
the YAML policy backend connection is closed on shutdown and reload.
- Around line 160-161: The ApiRegistry startup opens policy_storage then calls
await reg.start_servers(), but if start_servers (or subsequent tool
loading/ToolGuard init) raises the opened PolicyStorage is leaked; wrap the
ApiRegistry construction and startup in a try/except/finally: after creating
policy_storage and reg, attempt await reg.start_servers() inside try, on
exception ensure policy_storage.disconnect() (or close) is awaited before
re-raising the exception; apply the same pattern to the YAML-mode startup and
the reload paths so any early exceptions always close the newly opened
PolicyStorage.

In `@src/cuga/backend/tools_env/registry/registry/api_registry.py`:
- Around line 231-233: The call_function() logic is incorrectly unwrapping any
dict containing a 'params' key, which changes the public payload seen by guards
and tools (see ApiRegistry.call_function and the local variable unwrapped_args);
revert this by removing the unconditional unwrapping from
ApiRegistry.call_function and instead perform transport-wrapper normalization at
the request boundary/handler where the wrapper contract is explicit (e.g., the
HTTP/transport deserializer that receives the API payload), or add a strict
discriminator (a specific wrapper key or type tag) before unwrapping so only
true transport wrappers are normalized and real tool payloads with a params
field are left intact.
- Around line 87-110: The current except block swallows ToolGuardRuntime
initialization errors by setting self.tool_guard_runtime = None which lets
call_function() skip all policy validation; instead, make initialization failure
fatal: in the except for the ToolGuardRuntime
import/initialize/get_guarded_tools block, either re-raise the caught exception
or raise a new RuntimeError with context (e.g. "ToolGuardRuntime failed to
initialize") so service startup fails, and remove the fallback to None; ensure
callers (call_function()) rely on the presence of self.tool_guard_runtime rather
than treating None as "no guards" so policy validation cannot be silently
bypassed.

In `@src/cuga/sdk.py`:
- Around line 1230-1235: The exception branch now returns a dict with keys
"loaded", "removed", "errors", and "files", but the two early return paths in
the same function (the branches that short-circuit for disabled policy or
uninitialized filesystem sync) still omit "files"; update those early returns to
return the same stable payload shape — include "files": [] along with "loaded":
0, "removed": 0 and "errors": [] (or existing error value) so every return from
the function uses the same keys and types.
- Around line 592-598: The loop that builds tool_guards_obj accepts any keys
from tool_guards and can register invalid guard names; validate that each
tool_name exists in the target_tools set/list before creating a ToolGuard (e.g.,
compute invalid = set(tool_guards.keys()) - set(target_tools)), and if any
invalid names exist either raise a ValueError listing them or log and skip them;
update the population logic around tool_guards_obj and references to
tool_guards/target_tools/ToolGuard so typos are caught early and only valid tool
names are registered.
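The fail-fast validation described above can be sketched as a standalone helper, including the wildcard case that `ToolGuide.target_tools` allows; the function name is illustrative:

```python
def validate_guard_keys(tool_guards: dict, target_tools: list) -> None:
    # Fail fast on typos: every guarded tool must be a declared target.
    # A "*" wildcard target allows any tool name.
    targets = set(target_tools or [])
    if "*" in targets:
        return
    invalid = set(tool_guards) - targets
    if invalid:
        raise ValueError(f"Unknown tools in tool_guards: {sorted(invalid)}")
```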

---

Duplicate comments:
In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py`:
- Around line 570-575: The call to self._runtime.guard_toolcall currently forces
arguments | {"args": args_obj}, which clobbers any user-supplied "args" key;
change the merge so we do not overwrite an existing "args" field—either pass the
SimpleNamespace under a distinct key (e.g. "_args" or "args_obj") or only inject
"args" when "args" is not already present (check arguments for "args" before
merging); update the call site in the guard_toolcall invocation (the arguments
passed to self._runtime.guard_toolcall) to use the non-colliding key or
conditional merge while keeping function_name, args_obj/SimpleNamespace
creation, and delegate=self.invoker intact.

In `@src/cuga/sdk.py`:
- Around line 1358-1364: The docstring example for
agent.policies.update_tool_guard is using the old keyword args (target_tool,
violating_examples, compliance_examples) and will raise a TypeError; update the
example to call update_tool_guard with a single tool_guards mapping (e.g. call
agent.policies.update_tool_guard(policy_id=policy_id,
tool_guards={"delete_file": {"violating_examples": violating,
"compliance_examples": compliance}}) so the example matches the new signature
exposed by update_tool_guard.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 41305a49-4d15-4e5a-a5bc-293593eae091

📥 Commits

Reviewing files that changed from the base of the PR and between 49c3d9e and 07778fa.

📒 Files selected for processing (4)
  • src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py
  • src/cuga/backend/tools_env/registry/registry/api_registry.py
  • src/cuga/backend/tools_env/registry/registry/api_registry_server.py
  • src/cuga/sdk.py

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (1)
src/cuga/sdk.py (1)

591-599: ⚠️ Potential issue | 🟠 Major

Honor wildcard ToolGuide targets here.

ToolGuide.target_tools explicitly allows ["*"], but Lines 591-599 treat "*" as a literal tool name. That makes update_tool_guard() unusable for wildcard policies, because every real tool key is rejected as invalid.

🛠️ Proposed fix
         # Validate that tool_guards keys are in target_tools
         target_tools_set = set(existing_policy.target_tools or [])
-        invalid_tools = set(tool_guards.keys()) - target_tools_set
+        wildcard = "*" in target_tools_set
+        invalid_tools = set() if wildcard else set(tool_guards.keys()) - target_tools_set
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/sdk.py` around lines 591 - 599, The validation incorrectly treats a
"*" wildcard as a literal tool name; in the block that computes target_tools_set
from existing_policy.target_tools and then computes invalid_tools =
set(tool_guards.keys()) - target_tools_set, change the logic to honor the
wildcard: if existing_policy.target_tools contains "*" (e.g., target_tools_set
== {"*"} or "*" in existing_policy.target_tools) skip the subtraction/validation
or treat all tool_guards keys as allowed so invalid_tools is empty; adjust the
branch that raises ValueError accordingly so update_tool_guard() accepts
tool_guards when the policy target is a wildcard.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py`:
- Around line 81-83: The runtime currently builds a single global domain/runtime
using _load_runtime_domain and _build_runtime and uses it in guard_tool_call,
which ignores app_name and causes cross-app collisions; update the
implementation to maintain a mapping keyed by (app_name, tool_name) (or at
minimum per-app runtimes) instead of single self._runtime and
self._runtime_domain: change tool_to_guards lookups and guard_tool_call to
accept and use app_name when selecting the runtime/guards, add a cache like
self._runtimes_by_app (or self._runtimes_by_app_tool) that calls
_load_runtime_domain/_build_runtime per app (or per (app,tool)) when missing,
and update all places noted (including the blocks around lines 207-228 and
538-585) to read from this keyed mapping so each app uses its own
domain/runtime.
- Around line 123-137: The code currently registers a policy as soon as
policy.policy_code is non-empty which can leave a tool marked "guarded" even if
no async guard function exists; change the registration loop in
tool_guard_runtime (the block iterating policy.tool_guards and appending to
self.tool_to_guards) to validate the policy code by invoking the existing
_process_policy_guard() (or the same validation logic used by
_generate_module_content()) and ensure at least one async def guard_* callable
is present before appending policy to self.tool_to_guards; if validation fails,
do not register the policy for that tool and either log an error and skip or
raise an initialization error to fail-closed (choose consistent behavior used
elsewhere), and mirror this validation in the other similar registration sites
noted (around the other occurrences of the same registration logic).

In `@src/cuga/backend/tools_env/registry/registry/api_registry_server.py`:
- Around line 160-170: When startup aborts after ApiRegistry(client=manager) and
await reg.start_servers() fails, you currently only disconnect policy_storage;
also ensure the MCP manager(s) are shut down: after catching the exception,
await shutdown of the manager and any nested MCP managers (e.g., call await
manager.shutdown() if available, otherwise if manager has attributes mcp_manager
or new_manager call await manager.mcp_manager.shutdown()/await
manager.new_manager.shutdown() or their .disconnect()/.stop() equivalents),
wrapping each call in its own try/except and logging warnings similar to the
policy_storage cleanup; apply the same pattern for the other two failure
handlers referenced (the blocks around lines 213-222 and 611-621).

In `@src/cuga/backend/tools_env/registry/registry/api_registry.py`:
- Around line 239-246: Tool guards are being applied globally because
ToolGuardRuntime only indexes by function_name; update the guard lookup to also
respect the app scope by filtering guards against ToolGuide.target_apps (or an
empty/None meaning all apps) before enforcing them. Specifically, modify
ToolGuardRuntime.guard_tool_call (and any internal guard index/lookup) to
consider the incoming app_name parameter and only evaluate guards whose
ToolGuide.target_apps contains that app_name (or is unset), so a guard for
"delete" on one app does not block "delete" on other apps; ensure existing call
sites (e.g., api_registry.py calling tool_guard_runtime.guard_tool_call) remain
unchanged except for the corrected behavior.

---

Duplicate comments:
In `@src/cuga/sdk.py`:
- Around line 591-599: The validation incorrectly treats a "*" wildcard as a
literal tool name; in the block that computes target_tools_set from
existing_policy.target_tools and then computes invalid_tools =
set(tool_guards.keys()) - target_tools_set, change the logic to honor the
wildcard: if existing_policy.target_tools contains "*" (e.g., target_tools_set
== {"*"} or "*" in existing_policy.target_tools) skip the subtraction/validation
or treat all tool_guards keys as allowed so invalid_tools is empty; adjust the
branch that raises ValueError accordingly so update_tool_guard() accepts
tool_guards when the policy target is a wildcard.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 82ee4859-4bfb-47d0-ba9d-a1e9ca4a8838

📥 Commits

Reviewing files that changed from the base of the PR and between 07778fa and abebaae.

📒 Files selected for processing (4)
  • src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py
  • src/cuga/backend/tools_env/registry/registry/api_registry.py
  • src/cuga/backend/tools_env/registry/registry/api_registry_server.py
  • src/cuga/sdk.py

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (1)
src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py (1)

698-703: ⚠️ Potential issue | 🟠 Major

Preserve a real tool argument named args.

arguments | {"args": args_obj} still overwrites any user-supplied arguments["args"], so guards can validate a different payload than the tool will actually receive.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py` around
lines 698 - 703, The guard currently overwrites any user-supplied
arguments["args"] by passing arguments | {"args": args_obj} to
runtime.guard_toolcall; change this to preserve a real "args" when present
(e.g., if "args" in arguments then pass arguments unchanged, otherwise pass
arguments | {"args": args_obj}) so the guard validates the exact payload the
tool will receive; update the call to runtime.guard_toolcall (using
function_name, arguments, args_obj, self.invoker) accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py`:
- Around line 632-639: The code currently caches None on runtime initialization
failure in _runtimes_by_app (tool_guard_runtime.py), which causes
guard_tool_call() to treat that as "skip validation" and disables enforcement;
instead, change the failure path (both the except block around app runtime
bootstrap and the similar block at 684-691) to not store None but store a clear
failure sentinel (e.g., a unique RuntimeInitFailed object or an exception
instance) or re-raise the error so callers can distinguish failure from "not
initialized", and update guard_tool_call() to check _runtimes_by_app for that
sentinel/exception and treat it as a hard failure (deny or block the tool call)
rather than bypassing enforcement.
- Around line 616-628: Concurrent cold-starts can race because the snippet
checks self._runtimes_by_app, then builds and __enter__()s runtimes without
synchronization, causing duplicate expensive builds and leaked entered runtimes;
serialize initialization per app by introducing a per-app lock map (e.g.,
self._runtime_init_locks_by_app) and acquire the lock before re-checking the
cache and calling self._load_runtime_domain and self._build_runtime (and
__enter__()); ensure you create the lock lazily for the app, use a context
manager to release reliably, and only assign into self._runtimes_by_app and
self._runtime_domains_by_app after the build succeeds so only one built/entered
runtime is kept.

In `@src/cuga/backend/tools_env/registry/registry/api_registry_server.py`:
- Around line 239-256: The current cleanup sequence awaits mgr.shutdown() before
attempting reg.policy_storage.disconnect(), which aborts disconnects on shutdown
exceptions; create a shared async helper (e.g. async def
cleanup_registry_resources(agent_id, mgr, reg)) that performs both actions
independently and catches/logs exceptions for each (try/except around await
mgr.shutdown() and separately around await reg.policy_storage.disconnect(),
ensuring both are attempted regardless of the other failing), then replace the
inline shutdown+disconnect sequences in api_registry_server (the loop using
agent_registries and the other occurrences at the lifespan/reload/cache-clear
paths referenced in the comment) to call this helper for each registry and for
the global registry/mcp_manager cleanup so all resources are best-effort cleaned
up.
- Around line 623-649: After new_registry.start_servers() succeeds, the
subsequent old-state teardown can fail and currently does not roll back the
newly started resources; modify the post-start cleanup so any exception while
tearing down the old registry (e.g. during registry.policy_storage.disconnect()
or mcp_manager.shutdown()) triggers a rollback that cleanly shuts down the newly
created resources: call new_registry.shutdown() (or a stop/teardown method on
ApiRegistry if named differently), await new_manager.shutdown(), and if
policy_storage is not None await policy_storage.disconnect() inside an exception
handler that re-raises the original error; surround the old cleanup block
(registry.policy_storage.disconnect() and mcp_manager.shutdown()) with
try/except and perform this rollback in the except path to ensure
new_manager/new_registry/policy_storage are always torn down on failure.

---

Duplicate comments:
In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py`:
- Around line 698-703: The guard currently overwrites any user-supplied
arguments["args"] by passing arguments | {"args": args_obj} to
runtime.guard_toolcall; change this to preserve a real "args" when present
(e.g., if "args" in arguments then pass arguments unchanged, otherwise pass
arguments | {"args": args_obj}) so the guard validates the exact payload the
tool will receive; update the call to runtime.guard_toolcall (using
function_name, arguments, args_obj, self.invoker) accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 013486f0-7d00-4b57-9b37-728215fa6162

📥 Commits

Reviewing files that changed from the base of the PR and between abebaae and aaa7b21.

📒 Files selected for processing (2)
  • src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py
  • src/cuga/backend/tools_env/registry/registry/api_registry_server.py

Comment on lines +616 to +628
        # Return cached runtime if available
        if app_name in self._runtimes_by_app:
            return self._runtimes_by_app[app_name]

        # Try to load and build runtime for this app
        try:
            logger.info(f"Loading runtime domain for app '{app_name}'...")
            runtime_domain = self._load_runtime_domain(app_name)
            self._runtime_domains_by_app[app_name] = runtime_domain

            logger.info(f"Building runtime for app '{app_name}'...")
            runtime = self._build_runtime(app_name)
            self._runtimes_by_app[app_name] = runtime

🛠️ Refactor suggestion | 🟠 Major

Serialize per-app runtime initialization.

Two concurrent requests for a cold app_name can both miss the cache, both build a runtime, and both call __enter__(). The later assignment wins, so one entered runtime is leaked and the expensive initialization work is duplicated.

Possible fix
+import asyncio
...
 class ToolGuardRuntime:
     def __init__(self, tool_provider, policy_storage: PolicyStorage) -> None:
         ...
         self._runtimes_by_app: Dict[str, Any] = {}
         self._runtime_domains_by_app: Dict[str, RuntimeDomain] = {}
+        self._runtime_init_locks: Dict[str, asyncio.Lock] = {}
         self._initialized = False
...
     async def _get_or_create_runtime_for_app(self, app_name: str):
-        if app_name in self._runtimes_by_app:
-            return self._runtimes_by_app[app_name]
+        if app_name in self._runtimes_by_app:
+            return self._runtimes_by_app[app_name]
+
+        lock = self._runtime_init_locks.setdefault(app_name, asyncio.Lock())
+        async with lock:
+            if app_name in self._runtimes_by_app:
+                return self._runtimes_by_app[app_name]
 
-        # Try to load and build runtime for this app
-        try:
-            logger.info(f"Loading runtime domain for app '{app_name}'...")
-            runtime_domain = self._load_runtime_domain(app_name)
-            self._runtime_domains_by_app[app_name] = runtime_domain
+            # Try to load and build runtime for this app
+            try:
+                logger.info(f"Loading runtime domain for app '{app_name}'...")
+                runtime_domain = self._load_runtime_domain(app_name)
+                self._runtime_domains_by_app[app_name] = runtime_domain
 
-            logger.info(f"Building runtime for app '{app_name}'...")
-            runtime = self._build_runtime(app_name)
-            self._runtimes_by_app[app_name] = runtime
+                logger.info(f"Building runtime for app '{app_name}'...")
+                runtime = self._build_runtime(app_name)
+                self._runtimes_by_app[app_name] = runtime
 
-            logger.info(f"✅ Runtime initialized for app '{app_name}'")
-            return runtime
-        except Exception as e:
-            ...
+                logger.info(f"✅ Runtime initialized for app '{app_name}'")
+                return runtime
+            except Exception as e:
+                ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py` around
lines 616 - 628, Concurrent cold-starts can race because the snippet checks
self._runtimes_by_app, then builds and __enter__()s runtimes without
synchronization, causing duplicate expensive builds and leaked entered runtimes;
serialize initialization per app by introducing a per-app lock map (e.g.,
self._runtime_init_locks_by_app) and acquire the lock before re-checking the
cache and calling self._load_runtime_domain and self._build_runtime (and
__enter__()); ensure you create the lock lazily for the app, use a context
manager to release reliably, and only assign into self._runtimes_by_app and
self._runtime_domains_by_app after the build succeeds so only one built/entered
runtime is kept.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cuga/backend/cuga_graph/policy/tool_guard/manager.py`:
- Around line 224-228: The current except block in the async example-generation
code (the block that logs "❌ Failed to generate examples for tool
'{target_tool}'" and raises RuntimeError) incorrectly catches
asyncio.CancelledError; change the handler in the async generation methods (the
functions containing that logger/error raise — e.g., the tool example generation
functions around the shown block and the similar block at lines 343-347) so that
CancelledError is allowed to propagate: import asyncio and either add an
explicit except asyncio.CancelledError: raise or check isinstance(e,
asyncio.CancelledError) and re-raise before logging/wrapping, then log and wrap
only other exceptions.
- Around line 234-235: The parameter app_name (declared as app_name: str =
"cuga_app") must be constrained before any filesystem-backed generation: add a
small validator helper (e.g., validate_app_name or sanitize_app_name) and call
it at the start of the methods that accept app_name (the function with signature
ending in "-> str" shown and the other method around lines 297-303) to either
reject or normalize unsafe values; specifically disallow path separators ('/',
'\\'), traversal segments ('..'), and any characters outside a safe whitelist
such as /^[A-Za-z0-9_-]+$/, and raise a ValueError (or return the sanitized
identifier) if validation fails so no unsafe path or traversal can be introduced
into persisted artifacts.
- Around line 42-56: Add upfront validation for agent.tool_provider and
agent.cuga_folder in the ToolGuardManager initialization: check that
agent.tool_provider is not None and agent.cuga_folder is not None/empty and
raise a clear ValueError if missing, before wrapping agent._model; then use the
existing Path logic (work_dir_path = Path(agent.cuga_folder) / "toolguard") to
create the directory (mkdir(parents=True, exist_ok=True)) and assign
self.work_dir, ensuring the error messages reference tool_provider and
cuga_folder so failures surface early and clearly.
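The whitelist validator described for `app_name` can be sketched directly from the regex given in the comment; the helper name follows the suggestion there:

```python
import re

_APP_NAME_RE = re.compile(r"^[A-Za-z0-9_-]+$")

def validate_app_name(app_name: str) -> str:
    # Reject path separators, traversal segments, and anything
    # outside the safe whitelist before touching the filesystem.
    if not app_name or not _APP_NAME_RE.match(app_name):
        raise ValueError(f"Unsafe app_name: {app_name!r}")
    return app_name
```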

In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_invoker.py`:
- Around line 108-112: The except block in tool_invoker.py that logs and wraps
all Exceptions (catching Exception as e) unintentionally swallows
asyncio.CancelledError; update the handler in the invoke path (the block
referencing logger and toolname) to detect and re-raise asyncio.CancelledError
immediately (e.g., if isinstance(e, asyncio.CancelledError): raise) before
logging/wrapping, and ensure asyncio is imported so cancellation is preserved
while other exceptions are still logged and wrapped into the RuntimeError.
- Around line 49-50: When populating self._tools_cache from tools_list returned
by self.tool_provider.get_all_tools(), the dict comprehension {tool.name: tool
for tool in tools_list} silently overwrites duplicate tool names; change this to
fail fast by scanning tools_list for duplicate tool.name values before assigning
self._tools_cache (e.g., build a map while checking if a name already exists and
raise a clear exception or ValueError mentioning the duplicate), so that any
collisions across providers are detected and prevented from routing guards to
the wrong tool.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 43b2aec2-6053-45ae-b837-32394f152cad

📥 Commits

Reviewing files that changed from the base of the PR and between aaa7b21 and 27ed451.

📒 Files selected for processing (3)
  • pyproject.toml
  • src/cuga/backend/cuga_graph/policy/tool_guard/manager.py
  • src/cuga/backend/cuga_graph/policy/tool_guard/tool_invoker.py
✅ Files skipped from review due to trivial changes (1)
  • pyproject.toml

self.work_dir = str(work_dir_path)
logger.debug(f"ToolGuard working directory: {self.work_dir}")

async def initialize(

[MED-2] Potential race condition in ToolGuardManager initialization
src/cuga/backend/cuga_graph/policy/tool_guard/manager.py:85

Explanation

The initialize method doesn't check self._initialized before initializing, and has no synchronization mechanism:

async def initialize(self) -> None:
    logger.info("Initializing ToolGuardManager...")
    
    # Get all LangChain tools from the tool provider
    self.langchain_tools = await self.tool_provider.get_all_tools()
    
    # Convert LangChain tools to OpenAPI dict
    self.tools_dict = langchain_tools_to_openapi(self.langchain_tools)
    
    self._initialized = True

If initialize() is called multiple times concurrently, all calls will execute the full initialization logic, potentially causing:

  1. Duplicate/wasted work fetching tools multiple times
  2. Race conditions where self.langchain_tools and self.tools_dict are overwritten mid-initialization
  3. Inconsistent state if one initialization fails while another succeeds
Suggested Fix

Use an async lock to ensure thread-safe initialization:

import asyncio

class ToolGuardManager:
    def __init__(self, agent):
        self._agent = agent
        self._initialized = False
        self._init_lock = asyncio.Lock()
    
    async def initialize(self):
        async with self._init_lock:
            if self._initialized:
                return
            # ... initialization logic
            self._initialized = True
Agent Prompt

In src/cuga/backend/cuga_graph/policy/tool_guard/manager.py around line 85, the initialize method has a potential race condition where multiple concurrent calls could bypass the _initialized check. Review the implementation and add an asyncio.Lock to ensure thread-safe initialization.
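The suggested lock-guarded, idempotent initialization can be made runnable as a small sketch; the `init_count` counter and the `asyncio.sleep(0)` stand-in for fetching tools are additions purely for demonstration:

```python
import asyncio

class GuardedInit:
    """Sketch: idempotent, concurrency-safe async initialization."""

    def __init__(self):
        self._initialized = False
        self._init_lock = asyncio.Lock()
        self.init_count = 0  # demonstration only: how many times the body ran

    async def initialize(self):
        async with self._init_lock:
            if self._initialized:
                return
            await asyncio.sleep(0)  # stand-in for fetching/converting tools
            self.init_count += 1
            self._initialized = True

async def demo():
    g = GuardedInit()
    # Ten concurrent callers; the lock plus the flag ensure the body runs once
    await asyncio.gather(*(g.initialize() for _ in range(10)))
    return g.init_count

# asyncio.run(demo()) → 1
```

Without the lock, every caller that passes the `_initialized` check before the first one finishes would re-run the body; with it, late callers block, see the flag set, and return immediately.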


Comment thread src/cuga/backend/tools_env/registry/registry/api_registry.py
Comment on lines +152 to +157
except Exception as e:
logger.error(f"Failed to connect policy storage: {e}")
# Fail closed: if policy enforcement is enabled but storage fails,
# don't boot the registry unguarded
raise RuntimeError(
f"Policy system is enabled but PolicyStorage.connect() failed: {e}"

[MED-5] Inconsistent error handling for policy storage connection failures
src/cuga/backend/tools_env/registry/registry/api_registry_server.py:155

Explanation

The code has inconsistent error handling for policy storage connection failures. In _get_or_create_registry, it raises a RuntimeError (fail-closed):

except Exception as e:
    logger.error(f"Failed to connect policy storage: {e}")
    raise RuntimeError(
        f"Policy system is enabled but PolicyStorage.connect() failed: {e}"
    ) from e

But in the lifespan function, it only logs a warning and continues (fail-open):

except Exception as e:
    logger.warning(f"Failed to connect policy storage: {e}")

This inconsistency means that in some code paths, policy enforcement can be silently disabled, which is a security risk.

Suggested Fix

Make error handling consistent - fail closed when policy system is enabled:

# In lifespan function
if settings.policy.enabled:
    try:
        from cuga.backend.cuga_graph.policy.storage import PolicyStorage
        policy_storage = PolicyStorage()
        await policy_storage.connect()
        logger.info("✅ Connected policy storage for ToolGuardRuntime")
    except Exception as e:
        logger.error(f"Failed to connect policy storage: {e}")
        # Fail closed: if policy enforcement is enabled but storage fails,
        # don't boot the registry unguarded
        raise RuntimeError(
            f"Policy system is enabled but PolicyStorage.connect() failed: {e}"
        ) from e
Agent Prompt

In src/cuga/backend/tools_env/registry/registry/api_registry_server.py around line 155, the error handling for policy storage connection failures is inconsistent between _get_or_create_registry (fail-closed) and lifespan (fail-open). Review both code paths and ensure consistent fail-closed behavior when the policy system is enabled.
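The shared fail-closed behavior can be factored into one helper so both code paths cannot diverge; `connect_policy_storage` and its `connect` callable are hypothetical stand-ins for the real `PolicyStorage.connect()` call:

```python
import logging

logger = logging.getLogger("registry")

def connect_policy_storage(enabled, connect):
    """Single fail-closed path for policy storage connection (sketch)."""
    if not enabled:
        return None
    try:
        storage = connect()
        logger.info("Connected policy storage")
        return storage
    except Exception as e:
        logger.error(f"Failed to connect policy storage: {e}")
        # Fail closed: never boot the registry unguarded
        raise RuntimeError(
            f"Policy system is enabled but PolicyStorage.connect() failed: {e}"
        ) from e
```

Both `_get_or_create_registry` and `lifespan` would call this helper, so a storage failure with policies enabled always aborts startup.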

"""

def __init__(self, client: MCPManager):
def __init__(self, client: MCPManager, policy_storage=None):

[MED-6] Policy storage lifecycle management should be in ToolGuardRuntime, not registry
src/cuga/backend/tools_env/registry/registry/api_registry_server.py:140

Explanation

The PR adds ~74 lines of policy storage initialization and cleanup code to api_registry_server.py, scattered across multiple functions (_get_or_create_registry, lifespan, reload_config, clear_agent_cache). This makes the registry harder to read and violates the single-responsibility principle.

The registry's job is to manage and execute tools. Policy enforcement is a separate concern that should be encapsulated in the ToolGuardRuntime class, which already exists and is the component that actually uses the policy storage.

Current approach:

# In api_registry_server.py - registry manages policy storage lifecycle
policy_storage = PolicyStorage()
await policy_storage.connect()
registry = ApiRegistry(client=manager, policy_storage=policy_storage)

# Later, in cleanup
if reg.policy_storage is not None:
    await reg.policy_storage.disconnect()

This pattern is repeated in 4+ different places in the file.

Suggested Fix

Move policy storage lifecycle into ToolGuardRuntime:

# In tool_guard_runtime.py
class ToolGuardRuntime:
    def __init__(self, tool_provider, enable_policies: bool = False):
        self.tool_provider = tool_provider
        self.policy_storage = None
        self.enable_policies = enable_policies
    
    async def initialize(self):
        if self.enable_policies:
            from cuga.backend.cuga_graph.policy.storage import PolicyStorage
            self.policy_storage = PolicyStorage()
            await self.policy_storage.connect()
            # ... load policies from storage
    
    async def cleanup(self):
        if self.policy_storage:
            await self.policy_storage.disconnect()
            self.policy_storage = None

Then in registry:

# In api_registry.py - much simpler
class ApiRegistry:
    def __init__(self, client: MCPManager, enable_policies: bool = False):
        self.mcp_client = client
        self.guard_runtime = ToolGuardRuntime(self.tool_provider, enable_policies)
    
    async def start_servers(self):
        await self.mcp_client.start_servers()
        await self.guard_runtime.initialize()
    
    async def cleanup(self):
        await self.guard_runtime.cleanup()

This removes ~50+ lines from api_registry_server.py and keeps all policy-related code in the policy module.

Agent Prompt

In src/cuga/backend/tools_env/registry/registry/api_registry_server.py around line 140, policy storage initialization and cleanup is scattered across multiple functions. Review the architecture and refactor to move policy storage lifecycle management into ToolGuardRuntime class, which is the component that actually uses it. This will simplify the registry code and improve separation of concerns.
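A minimal runnable sketch of the proposed encapsulation; the dict and `None` assignments stand in for the real `PolicyStorage` connect/disconnect calls, and the class names mirror the refactor above:

```python
import asyncio

class PolicyRuntime:
    """Sketch: policy storage lifecycle owned by the runtime, not the registry."""

    def __init__(self, enable_policies: bool = False):
        self.enable_policies = enable_policies
        self.storage = None

    async def initialize(self):
        if self.enable_policies:
            self.storage = {"connected": True}  # stand-in for PolicyStorage.connect()

    async def cleanup(self):
        if self.storage is not None:
            self.storage = None  # stand-in for disconnect()

class Registry:
    """The registry only delegates; it carries no storage plumbing of its own."""

    def __init__(self, enable_policies: bool = False):
        self.runtime = PolicyRuntime(enable_policies)

    async def start(self):
        await self.runtime.initialize()

    async def stop(self):
        await self.runtime.cleanup()

async def demo():
    reg = Registry(enable_policies=True)
    await reg.start()
    connected = reg.runtime.storage is not None
    await reg.stop()
    return connected, reg.runtime.storage is None
```

The registry's start/stop paths stay two lines each, and every policy-specific decision (enabled flag, connect, disconnect) lives in one class.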
