Toolguard #176

Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

This PR implements Tool Guard integration, adding policy-backed runtime validation for tool calls.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant ApiRegistry
    participant ToolGuardRuntime
    participant PolicyStorage
    participant Domain as ToolGuard Runtime Domain
    participant MCP as MCP Tool

    Client->>ApiRegistry: call_function(app_name, function_name, arguments)
    ApiRegistry->>ToolGuardRuntime: guard_tool_call(app_name, function_name, arguments)
    ToolGuardRuntime->>PolicyStorage: get_policies() [lazy load if needed]
    PolicyStorage-->>ToolGuardRuntime: [ToolGuide policies with tool_guards]
    ToolGuardRuntime->>Domain: load RuntimeDomain for app_name
    Domain-->>ToolGuardRuntime: domain types & implementations
    ToolGuardRuntime->>Domain: execute guard_* functions with arguments
    alt Policy Violation Detected
        Domain-->>ToolGuardRuntime: PolicyViolationException
        ToolGuardRuntime-->>ApiRegistry: violation_message (error)
        ApiRegistry-->>Client: 403 ToolGuardViolation response
    else Policy Passes
        ToolGuardRuntime-->>ApiRegistry: None (OK)
        ApiRegistry->>MCP: invoke tool
        MCP-->>ApiRegistry: result
        ApiRegistry-->>Client: tool result
    end
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 inconclusive)
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cuga/backend/server/main.py (1)
2364-2368: ⚠️ Potential issue | 🟠 Major

Authorization asymmetry: GET exposes `policy_code` to any authenticated user, but POST is restricted to manage roles.

You've correctly tightened `POST /api/config/policies` to `require_manage_access` because policy code is executed unsandboxed at runtime. However, this `GET` handler still uses `require_auth` (see line 2299) and now returns `tool_guards`, whose `ToolGuard.policy_code` field contains the same executable Python. Any authenticated chat-only user can therefore read the policy code, which:
- leaks security-sensitive validation logic (an attacker who learns the guard logic can craft inputs to bypass it), and
- is inconsistent with the manage-only write boundary you just established.
Consider either gating the GET endpoint on `require_manage_access`, or stripping/redacting the `policy_code` field from each `ToolGuard` for non-manage users (similar pattern to how secrets routes mask values).

🛡️ Sketch (redact policy_code for non-manage users)
```diff
         elif policy_dict["type"] == "tool_guide":
             frontend_policy["target_tools"] = policy_dict.get("target_tools", [])
             frontend_policy["target_apps"] = policy_dict.get("target_apps")
             frontend_policy["guide_content"] = policy_dict.get("guide_content", "")
-            frontend_policy["tool_guards"] = policy_dict.get("tool_guards")
+            tool_guards = policy_dict.get("tool_guards")
+            if tool_guards and not _user_has_manage_role(current_user):
+                tool_guards = {
+                    name: {**g, "policy_code": "<redacted>"} if isinstance(g, dict) else g
+                    for name, g in tool_guards.items()
+                }
+            frontend_policy["tool_guards"] = tool_guards
             frontend_policy["prepend"] = policy_dict.get("prepend", False)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/server/main.py` around lines 2364 - 2368, The GET /api/config/policies handler currently uses require_auth and returns ToolGuard.policy_code in frontend_policy["tool_guards"], leaking executable policy code; change the handler to either call require_manage_access instead of require_auth or, if you want read access for non-manage users, iterate over each ToolGuard in the response and redact/mask the policy_code (e.g., replace with None or "<redacted>") when the current user does not have manage permission; reference the GET handler that builds frontend_policy["tool_guards"], the require_auth / require_manage_access decorators, and the ToolGuard.policy_code field when implementing the check/masking (mirror how secrets routes hide values).
🧹 Nitpick comments (11)
src/cuga/backend/cuga_graph/policy/tool_guard/tool_invoker.py (2)
41-52: Optional: protect cache initialization against concurrent invocations.

`_get_tools` is called from `invoke`, which can be triggered concurrently for different tools. Two concurrent first-time callers will both call `tool_provider.get_all_tools()` and overwrite `_tools_cache`. The end state is correct, but you'll do duplicate work. If `get_all_tools()` is expensive, guard it with an `asyncio.Lock`.

♻️ Sketch
```diff
     def __init__(self, tool_provider):
         ...
         self._tools_cache: Optional[Dict[str, Any]] = None
+        self._cache_lock = asyncio.Lock()
         logger.debug("Initialized ToolGuardInvoker with CUGA tool provider")

     async def _get_tools(self) -> Dict[str, Any]:
-        if self._tools_cache is None:
-            tools_list = await self.tool_provider.get_all_tools()
-            self._tools_cache = {tool.name: tool for tool in tools_list}
-            logger.debug(f"Cached {len(self._tools_cache)} tools")
-        return self._tools_cache
+        if self._tools_cache is None:
+            async with self._cache_lock:
+                if self._tools_cache is None:
+                    tools_list = await self.tool_provider.get_all_tools()
+                    self._tools_cache = {tool.name: tool for tool in tools_list}
+                    logger.debug(f"Cached {len(self._tools_cache)} tools")
+        return self._tools_cache
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_invoker.py` around lines 41 - 52, The _get_tools method can race on first initialization; add an asyncio.Lock (e.g., self._tools_lock initialized in the class __init__) and use it to guard the cache population: check self._tools_cache first, if None then async with self._tools_lock and re-check self._tools_cache inside the lock, then call await self.tool_provider.get_all_tools() and set self._tools_cache; this prevents duplicate expensive get_all_tools() calls while preserving the existing behavior of _get_tools and invoke.
108-112: Use conversion flag and consider preserving the original exception type.

Two minor points:

- Ruff `RUF010`: prefer the `!s` conversion flag over explicit `str(e)` in f-strings.
- Wrapping every non-`ValueError` as `RuntimeError` masks toolguard-specific exception types (e.g., timeouts, validation errors) that the runtime caller may want to distinguish. Since `from e` is used, the chain is preserved, but downstream `except` clauses can no longer match by type.

♻️ Proposed diff
```diff
     except Exception as e:
-        logger.error(f"Failed to invoke tool '{toolname}': {e}")
-        raise RuntimeError(
-            f"Tool invocation failed for '{toolname}': {str(e)}"
-        ) from e
+        logger.error(f"Failed to invoke tool '{toolname}': {e!s}")
+        raise RuntimeError(
+            f"Tool invocation failed for '{toolname}': {e!s}"
+        ) from e
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_invoker.py` around lines 108 - 112, In the except block in tool_invoker.py (the "except Exception as e" handling around the tool invocation that logs with logger and raises a RuntimeError for toolname), change the f-string to use the !s conversion flag (f"Failed to invoke tool '{toolname}': {e!s}") and stop unconditionally wrapping every caught exception as RuntimeError; instead re-raise ValueError or other toolguard-specific exceptions as-is (or re-raise e) so callers can match by type, and only wrap into a RuntimeError when you must add context (preserving the original via "raise RuntimeError(... ) from e" if used).

src/cuga/backend/cuga_graph/policy/folder_loader.py (1)
215-220: Wrap `ToolGuard` construction with a per-tool error to aid diagnosis.

If a user-authored markdown file has a `tool_guards` entry that is missing the required `description` field or contains a typo'd field name, `ToolGuard(**guard_config)` raises `pydantic.ValidationError`. That exception bubbles up to the file-level `try/except` in `load_policies_from_folder`, which logs `"Failed to load <file_path>: ..."` but doesn't tell the user which `tool_name` inside `tool_guards` was malformed — and a single bad guard fails the entire ToolGuide.

Also: if `raw_tool_guards` is present but not a dict (e.g., a list, due to a YAML formatting mistake), this silently falls through to `tool_guards = None` with no warning.

♻️ Proposed diff
```diff
-        raw_tool_guards = frontmatter.get('tool_guards')
-        tool_guards = (
-            {tool_name: ToolGuard(**guard_config) for tool_name, guard_config in raw_tool_guards.items()}
-            if isinstance(raw_tool_guards, dict)
-            else None
-        )
+        raw_tool_guards = frontmatter.get('tool_guards')
+        tool_guards = None
+        if isinstance(raw_tool_guards, dict):
+            tool_guards = {}
+            for tool_name, guard_config in raw_tool_guards.items():
+                if not isinstance(guard_config, dict):
+                    raise ValueError(
+                        f"ToolGuide {name}: tool_guards['{tool_name}'] must be a mapping, "
+                        f"got {type(guard_config).__name__}"
+                    )
+                try:
+                    tool_guards[tool_name] = ToolGuard(**guard_config)
+                except Exception as e:
+                    raise ValueError(
+                        f"ToolGuide {name}: invalid tool_guards entry for '{tool_name}': {e}"
+                    ) from e
+        elif raw_tool_guards is not None:
+            logger.warning(
+                f"ToolGuide {name}: ignoring 'tool_guards' frontmatter — expected dict, "
+                f"got {type(raw_tool_guards).__name__}"
+            )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/policy/folder_loader.py` around lines 215 - 220, The tool guard parser in folder_loader.py currently calls ToolGuard(**guard_config) for all entries and will raise a pydantic.ValidationError that aborts the whole file without identifying which tool failed; update the block that builds tool_guards (the code using raw_tool_guards and ToolGuard) to (1) if raw_tool_guards is present but not a dict, log a warning mentioning the file and raw_tool_guards type and skip it, and (2) iterate raw_tool_guards.items() and construct each ToolGuard inside a try/except that catches ValidationError (or Exception), logs an error including the offending tool_name and the validation error details, and continues building the rest so one bad tool entry does not break the entire ToolGuard set used by load_policies_from_folder.

src/cuga/backend/cuga_graph/policy/filesystem_sync.py (2)
92-94: Minor: `str(None)` fallback can silently write `"None"` as the policy `type`.

When both `policy.type` and `policy.policy_type` are missing, this falls back to `str(policy_type)` which yields the literal string `"None"`. That value will be written into the YAML frontmatter and won't round-trip correctly through `folder_loader.POLICY_CREATORS` (which only knows the canonical type strings). All concrete policy subclasses do set `type`, so this is unlikely in practice — but consider raising or logging a warning instead of producing a meaningless `"None"` string.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/policy/filesystem_sync.py` around lines 92 - 94, The code currently falls back to str(policy_type) which yields "None" when both policy.type and policy.policy_type are missing; update the logic around policy_type and policy_type_value so that if policy_type is None you fail loudly (e.g., raise a ValueError with a clear message including the policy identity or log a warning and skip writing) instead of writing the string "None"; specifically change the block that computes policy_type and policy_type_value (the variables policy_type and policy_type_value in filesystem_sync.py) to detect None and raise/log, otherwise continue to extract .value or str(...) as before so invalid/missing types are not written into YAML frontmatter.
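A minimal sketch of the fail-loud alternative (the helper name and duck-typed `policy` here are hypothetical stand-ins for the concrete policy models):

```python
# Sketch: refuse to serialize a missing policy type as the literal string "None".
# `resolve_policy_type_value` is a hypothetical helper, not the real code.

def resolve_policy_type_value(policy) -> str:
    """Return the canonical type string for a policy, failing loudly on None."""
    policy_type = getattr(policy, "type", None) or getattr(policy, "policy_type", None)
    if policy_type is None:
        raise ValueError(
            f"Policy {getattr(policy, 'name', '<unnamed>')!r} has no type; "
            "refusing to write 'None' into YAML frontmatter"
        )
    # Enum members expose .value; plain strings pass through via str().
    return getattr(policy_type, "value", None) or str(policy_type)
```

The same shape works if you prefer to log-and-skip instead of raising; the key point is that `"None"` never reaches the frontmatter.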
137-141: Production deployment note: `.cuga/tool_guides/` contains `policy_code` in plaintext.

`guard.model_dump()` serializes the full `ToolGuard` including the unsandboxed Python `policy_code`. In production, restrict read access to the agent process owner so that chat-only users without `require_manage_access` cannot inspect validation logic via filesystem access.

Note: `.cuga` is already in `.gitignore`, so accidental commits are prevented.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/policy/filesystem_sync.py` around lines 137 - 141, The current serialization of ToolGuard via guard.model_dump() writes sensitive unsandboxed Python in the policy_code field to disk (frontmatter['tool_guards']), so change the serialization to exclude or redact policy_code before writing .cuga/tool_guides; e.g., replace guard.model_dump() with a safe serialization that either calls guard.model_dump(exclude={'policy_code'}) or uses a helper like ToolGuard.to_safe_dict()/guard.safe_dict() that omits or redacts the policy_code and any other sensitive fields, ensuring only non-executable metadata is written to frontmatter['tool_guards'] while preserving existing keys (tool_guards, prepend).

src/cuga/backend/tools_env/registry/registry/api_registry.py (1)
232-232: Simplify redundant unwrap.

Inside the conditional, `'params' in arguments` is already true, so the `.get('params', arguments)` default is dead.

♻️ Cleanup
```diff
-    unwrapped_args = arguments.get('params', arguments) if isinstance(arguments, dict) and 'params' in arguments else arguments
+    unwrapped_args = (
+        arguments['params']
+        if isinstance(arguments, dict) and 'params' in arguments
+        else arguments
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/tools_env/registry/registry/api_registry.py` at line 232, The conditional that computes unwrapped_args uses arguments.get('params', arguments) redundantly because the branch already ensures 'params' is in arguments; update the expression to directly access arguments['params'] when isinstance(arguments, dict) and 'params' in arguments, e.g. replace the current ternary with arguments['params'] in that branch so unwrapped_args is set deterministically; locate the assignment to unwrapped_args in api_registry.py to make this change.src/cuga/backend/tools_env/registry/registry/api_registry_server.py (1)
143-160: Extract the policy-storage bootstrap into a single helper.

The `if settings.policy.enabled: ... PolicyStorage().connect() ...` block is duplicated 3 times across `_get_or_create_registry`, `lifespan`, and `reload_config` (and would need to be touched in lockstep if you adopt the fail-closed fix above). A small `async def _maybe_connect_policy_storage(fail_closed: bool) -> Optional[PolicyStorage]` helper would centralize the lifecycle and policy.

Also applies to: 186-198, 564-576
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/tools_env/registry/registry/api_registry_server.py` around lines 143 - 160, The policy storage bootstrap sequence (checking settings.policy.enabled, instantiating PolicyStorage, calling connect(), logging success/failure and optionally failing closed) is duplicated across _get_or_create_registry, lifespan, and reload_config; refactor by extracting that logic into a single async helper named _maybe_connect_policy_storage(fail_closed: bool) -> Optional[PolicyStorage] that encapsulates the settings.policy.enabled check, instantiates PolicyStorage, calls await policy_storage.connect(), logs success with logger.info, logs failures with logger.error, and either returns None or raises RuntimeError when fail_closed is True; update callers (in _get_or_create_registry, lifespan, reload_config and where ApiRegistry(client=manager, policy_storage=...) is created) to call this helper and pass its return value into ApiRegistry.

src/cuga/backend/cuga_graph/policy/utils.py (1)
471-473: Subtle semantic change: missing `triggers` no longer errors.

Previously, missing `triggers` would have raised `AttributeError` (acting as a hard failure for non-ToolApproval policies). The new condition `if policy_triggers is not None and not policy_triggers` only flags the explicit empty-list case. This is the intended fix for `ToolApproval` (which has no `triggers` field), but it now silently skips the "at least one trigger" check whenever `getattr` returns `None` for any policy type. Since `Playbook`/`IntentGuard`/`ToolGuide`/`OutputFormatter`/`CustomPolicy` all declare `triggers: List[Trigger] = Field(...)` (required at construction), this should be fine in practice — but consider explicitly skipping the trigger validation only for `ToolApproval` to make the intent clearer.

♻️ More explicit alternative
```diff
-    policy_triggers = getattr(policy, "triggers", None)
-    if policy_triggers is not None and not policy_triggers:
-        errors.append("At least one trigger is required")
+    if not isinstance(policy, ToolApproval):
+        policy_triggers = getattr(policy, "triggers", None)
+        if not policy_triggers:
+            errors.append("At least one trigger is required")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/policy/utils.py` around lines 471 - 473, The current check uses policy_triggers = getattr(policy, "triggers", None) and only errors on an explicit empty list, which unintentionally skips validation for policies that should require triggers; modify the condition to explicitly skip the validation only for ToolApproval (or alternately require the attribute exist) — e.g., use isinstance(policy, ToolApproval) to bypass trigger checks for ToolApproval, otherwise assert that getattr(policy, "triggers", None) is not None and not empty and append "At least one trigger is required" when missing or empty; update the check around policy_triggers, getattr(policy, "triggers", None), and references to ToolApproval/Playbook/IntentGuard/ToolGuide/OutputFormatter/CustomPolicy so intent is explicit.

src/cuga/sdk_core/debug_tool_guard_e2e.py (1)
144-205: Drop redundant `f` prefixes (Ruff F541).

Several print statements use `f"..."` without any placeholders. Minor cleanup to silence the lint warnings.

♻️ Quick fix
```diff
-    print(f"Creating policy...")
+    print("Creating policy...")
 ...
-    print(f"Generating examples...")
+    print("Generating examples...")
 ...
-    print(f"Generating guard code...")
+    print("Generating guard code...")
 ...
-    print(f"✅ Policy saved successfully")
+    print("✅ Policy saved successfully")
```

As per the static analysis hints (Ruff F541 at lines 144, 154, 175, 205).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/sdk_core/debug_tool_guard_e2e.py` around lines 144 - 205, Remove the unnecessary f-string prefixes on the static print calls to fix Ruff F541: replace print(f"Creating policy..."), print(f"Generating examples..."), print(f"Generating guard code..."), and print(f"✅ Policy saved successfully") with regular string literals (print("...")) and leave the other f-strings that interpolate variables (e.g., print(f"✅ Created policy with ID: {policy_id}"), print(f"✅ Generated {len(violating_examples)} ..."), print(f"✅ Generated guard code ({len(guard_code)} characters)")) unchanged.

src/cuga/sdk.py (1)
539-575: Return type annotation inconsistent with `None` return.

The signature declares `-> str`, but line 575 returns `None` when the policy system is disabled. This pattern matches several other `add_*` methods in this file, but consider tightening to `Optional[str]` for accuracy. New callers using strict typing (e.g., mypy/pyright) will see false-positives.

♻️ Proposed annotation
```diff
-    ) -> str:
+    ) -> Optional[str]:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/sdk.py` around lines 539 - 575, The function update_tool_guard currently declares a return type of str but returns None when the policy system is disabled; change the signature to return Optional[str] and update any related docstring Returns section accordingly; ensure other early-exit paths in update_tool_guard (and similar methods like the add_* helpers in this module) consistently return Optional[str] so type checkers (mypy/pyright) won't raise errors.

src/cuga/backend/cuga_graph/policy/tool_guard/manager.py (1)
18-21: Module-level import of `CugaAgent` creates a fragile circular dependency.

`cuga.sdk` re-imports `ToolGuardManager` from this module (lines 1280, 1375 in `src/cuga/sdk.py`), and this module imports `CugaAgent` from `cuga.sdk` at module-load time. It works today only because `sdk.py` defers the `ToolGuardManager` import inside method bodies. If anyone imports `cuga.backend.cuga_graph.policy.tool_guard.manager` before `cuga.sdk` finishes loading, the import of `CugaAgent` will fail.

Consider deferring the `CugaAgent` import using `TYPE_CHECKING` for the import and a string-form forward reference for the type annotation to break the cycle.

♻️ Proposed refactor
Update the typing import on line 11 to include `TYPE_CHECKING`:

```diff
-from typing import List, Tuple, Dict, Any, Iterator, Optional
+from typing import List, Tuple, Dict, Any, Iterator, Optional, TYPE_CHECKING
```

Then update lines 18-21:

```diff
-from cuga.sdk import CugaAgent
 from toolguard.buildtime.buildtime import ToolGuardSpec, generate_guard_examples, generate_guards_code
 from toolguard.extra.langchain_to_oas import langchain_tools_to_openapi
 from toolguard.runtime.data_types import ToolGuardSpecItem, ToolGuardsCodeGenerationResult
+
+if TYPE_CHECKING:
+    from cuga.sdk import CugaAgent
```

And update the `__init__` parameter annotation on line 28:

```diff
     def __init__(
         self,
-        agent: CugaAgent,
+        agent: "CugaAgent",
     ):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/policy/tool_guard/manager.py` around lines 18 - 21, The module-level import of CugaAgent introduces a circular dependency; change the imports to add TYPE_CHECKING from typing, move the CugaAgent import into an if TYPE_CHECKING: block (i.e., if TYPE_CHECKING: from cuga.sdk import CugaAgent) and remove the top-level from cuga.sdk import CugaAgent, then update the ToolGuardManager __init__ (or whichever constructor uses CugaAgent) parameter annotation to use a string forward reference ('CugaAgent') instead of the direct type so runtime import is deferred.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cuga/backend/cuga_graph/policy/tool_guard/cuga_llm_adapter.py`:
- Around line 178-187: The current response-content handling in cuga_llm_adapter
(the block that checks hasattr(response, 'content') and then joins list items)
is lossy for multimodal/list-of-dicts payloads; update that logic to, when
content is a list, iterate items and for each item if it's a dict extract
text-like fields (e.g., 'text', 'content', or nested {'type':'text','text':...})
prefer the 'text' value, otherwise skip non-text types (images, urls) or convert
primitives to str, then join the extracted text parts with newlines; keep the
existing fallback that casts non-list content to str.
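The extraction described in that prompt could be sketched as follows (the helper name and the exact content shapes are assumptions about typical multimodal payloads, not the adapter's real API):

```python
def extract_text_content(content) -> str:
    """Flatten an LLM response's content into text (sketch; shapes assumed).

    Handles plain strings, lists of strings, and lists of dicts such as
    {"type": "text", "text": "..."}; non-text parts (images, urls) are skipped.
    """
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for item in content:
            if isinstance(item, dict):
                # Prefer a 'text' field, fall back to 'content'; skip image/url parts.
                text = item.get("text") or item.get("content")
                if isinstance(text, str):
                    parts.append(text)
            elif isinstance(item, (str, int, float)):
                parts.append(str(item))
        return "\n".join(parts)
    # Existing fallback: cast anything else to str.
    return str(content)
```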
In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py`:
- Line 217: The code uses Path.cwd() to compute domain_dir which couples
ToolGuardManager to the process CWD; change ToolGuardManager to accept an
explicit domain_dir parameter in its __init__ (or read from
settings.policy.toolguard_domain_dir if None) and replace the inline Path.cwd()
/ ".cuga" / "toolguard" / "domain" with that deterministic path; update the
construction sites of ToolGuardManager to pass the configured domain_dir (or
ensure settings.policy.toolguard_domain_dir is set) so both writer and reader
resolve the same domain files and avoid silent RuntimeError.
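A sketch of the suggested constructor change (the class here is illustrative, not the real `ToolGuardManager`, and the settings fallback is modeled as a plain string parameter):

```python
from pathlib import Path
from typing import Optional


class ToolGuardManagerSketch:
    """Accept an explicit domain_dir instead of silently deriving it from cwd."""

    def __init__(
        self,
        domain_dir: Optional[Path] = None,
        settings_domain_dir: Optional[str] = None,  # stand-in for a configured setting
    ):
        if domain_dir is not None:
            self.domain_dir = Path(domain_dir)
        elif settings_domain_dir:
            self.domain_dir = Path(settings_domain_dir)
        else:
            # Last resort: the old cwd-relative behavior, now explicit.
            self.domain_dir = Path.cwd() / ".cuga" / "toolguard" / "domain"
```

With this shape, writer and reader both receive the same configured path instead of each deriving it from whatever the process CWD happens to be.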
- Around line 570-576: The current dynamic class creation type("ToolGuardArgs",
(), arguments)() loses any argument keys that are not valid Python identifiers
and also causes collisions when a tool parameter is literally named "args"; to
fix: replace the type()-based approach in tool_guard_runtime.py (where args_obj
is created and passed into self._runtime.guard_toolcall) with a robust wrapper
that (1) preserves the original arguments dict unmodified (e.g., keep a .raw or
.all_args dict on the wrapper) and exposes only safe identifier attributes for
keys that pass str.isidentifier(), and (2) stop injecting into the call under
the literal "args" key—use a non-colliding reserved key such as "__guard_args__"
or "_guard_args" when merging with arguments; also update the guard invocation
site (self._runtime.guard_toolcall and any code consuming args_obj) to read from
the wrapper’s .raw for arbitrary keys and from attributes for identifier-safe
keys, and adjust api_registry.py error handling (the try/except around guard
validation) to not silently swallow guard exceptions—either re-raise or treat
guard failures as closed (deny) so guards don’t fail open.
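The wrapper described in that prompt might be sketched like this (class name hypothetical):

```python
class GuardArgs:
    """Argument wrapper for guard calls (sketch; not the real runtime type).

    Keeps the original dict intact on .raw and exposes only keys that are
    valid Python identifiers as attributes, so keys like "my-param" are never
    lost and a tool parameter literally named "args" cannot collide.
    """

    def __init__(self, arguments: dict):
        self.raw = dict(arguments)
        for key, value in arguments.items():
            if isinstance(key, str) and key.isidentifier() and key != "raw":
                setattr(self, key, value)
```

Guard code can then read identifier-safe keys as attributes and fall back to `.raw` for everything else.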
In `@src/cuga/backend/tools_env/registry/registry/api_registry_server.py`:
- Around line 186-198: The YAML lifespan and reload_config paths currently catch
exceptions when creating PolicyStorage and only log a warning, allowing
policy_storage to be None and disabling ToolGuardRuntime; match the fail-closed
behavior of _get_or_create_registry by changing both spots where PolicyStorage()
is constructed and connect() is awaited (the lifespan block that sets
policy_storage and the reload_config routine) so that on any Exception you log
the failure and then raise a RuntimeError (with the error details) instead of
swallowing it—this ensures ApiRegistry (and its
_initialize_tool_guard_runtime/ToolGuardRuntime behavior) cannot start or reload
with policy.enabled=True and a missing policy storage.
In `@src/cuga/backend/tools_env/registry/registry/api_registry.py`:
- Around line 235-261: The wrapper in api_registry.py around
self.tool_guard_runtime.guard_tool_call currently catches all Exceptions and
lets the tool proceed; change it to honor ToolGuardRuntime.guard_tool_call's
fail-closed contract by treating exceptions from guard_tool_call as policy
violations (i.e., return the same blocking response structure with status
"exception", status_code 403, error_type "ToolGuardViolation" and a message that
includes the exception text and function_name) rather than continuing execution,
or alternatively narrow the except to only specific, well-justified recoverable
exceptions; locate the guard call in api_registry.py (the block invoking
guard_tool_call) and implement the fail-closed behavior consistent with
ToolGuardRuntime.guard_tool_call.
In `@src/cuga/sdk.py`:
- Around line 1410-1411: Remove the unreachable second return that references
undefined variables: inside the function that currently ends with "return
guard_code" followed by "return violating_examples, compliance_examples", delete
the latter unreachable return statement so only "return guard_code" remains;
this removes the dead code and the potential NameError flagged by Ruff F821
while preserving the intended returned value from the function.
- Around line 1358-1364: The docstring example calls policies.update_tool_guard
with the wrong signature; update_tool_guard only accepts (policy_id,
tool_guards: Dict[str, Dict[str, Any]]). Change the example call to pass a
tool_guards mapping instead of separate keyword args — e.g. call
policies.update_tool_guard(policy_id, {"delete_file": {"violating_examples":
violating, "compliance_examples": compliance}}) so the example matches the
update_tool_guard implementation.
---
Outside diff comments:
In `@src/cuga/backend/server/main.py`:
- Around line 2364-2368: The GET /api/config/policies handler currently uses
require_auth and returns ToolGuard.policy_code in
frontend_policy["tool_guards"], leaking executable policy code; change the
handler to either call require_manage_access instead of require_auth or, if you
want read access for non-manage users, iterate over each ToolGuard in the
response and redact/mask the policy_code (e.g., replace with None or
"<redacted>") when the current user does not have manage permission; reference
the GET handler that builds frontend_policy["tool_guards"], the require_auth /
require_manage_access decorators, and the ToolGuard.policy_code field when
implementing the check/masking (mirror how secrets routes hide values).
---
Nitpick comments:
In `@src/cuga/backend/cuga_graph/policy/filesystem_sync.py`:
- Around line 92-94: The code currently falls back to str(policy_type) which
yields "None" when both policy.type and policy.policy_type are missing; update
the logic around policy_type and policy_type_value so that if policy_type is
None you fail loudly (e.g., raise a ValueError with a clear message including
the policy identity or log a warning and skip writing) instead of writing the
string "None"; specifically change the block that computes policy_type and
policy_type_value (the variables policy_type and policy_type_value in
filesystem_sync.py) to detect None and raise/log, otherwise continue to extract
.value or str(...) as before so invalid/missing types are not written into YAML
frontmatter.
- Around line 137-141: The current serialization of ToolGuard via
guard.model_dump() writes sensitive unsandboxed Python in the policy_code field
to disk (frontmatter['tool_guards']), so change the serialization to exclude or
redact policy_code before writing .cuga/tool_guides; e.g., replace
guard.model_dump() with a safe serialization that either calls
guard.model_dump(exclude={'policy_code'}) or uses a helper like
ToolGuard.to_safe_dict()/guard.safe_dict() that omits or redacts the policy_code
and any other sensitive fields, ensuring only non-executable metadata is written
to frontmatter['tool_guards'] while preserving existing keys (tool_guards,
prepend).
In `@src/cuga/backend/cuga_graph/policy/folder_loader.py`:
- Around line 215-220: The tool guard parser in folder_loader.py currently calls
ToolGuard(**guard_config) for all entries and will raise a
pydantic.ValidationError that aborts the whole file without identifying which
tool failed; update the block that builds tool_guards (the code using
raw_tool_guards and ToolGuard) to (1) if raw_tool_guards is present but not a
dict, log a warning mentioning the file and raw_tool_guards type and skip it,
and (2) iterate raw_tool_guards.items() and construct each ToolGuard inside a
try/except that catches ValidationError (or Exception), logs an error including
the offending tool_name and the validation error details, and continues building
the rest so one bad tool entry does not break the entire ToolGuard set used by
load_policies_from_folder.
In `@src/cuga/backend/cuga_graph/policy/tool_guard/manager.py`:
- Around line 18-21: The module-level import of CugaAgent introduces a circular
dependency; change the imports to add TYPE_CHECKING from typing, move the
CugaAgent import into an if TYPE_CHECKING: block (i.e., if TYPE_CHECKING: from
cuga.sdk import CugaAgent) and remove the top-level from cuga.sdk import
CugaAgent, then update the ToolGuardManager __init__ (or whichever constructor
uses CugaAgent) parameter annotation to use a string forward reference
('CugaAgent') instead of the direct type so runtime import is deferred.
In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_invoker.py`:
- Around line 41-52: The _get_tools method can race on first initialization; add
an asyncio.Lock (e.g., self._tools_lock initialized in the class __init__) and
use it to guard the cache population: check self._tools_cache first, if None
then async with self._tools_lock and re-check self._tools_cache inside the lock,
then call await self.tool_provider.get_all_tools() and set self._tools_cache;
this prevents duplicate expensive get_all_tools() calls while preserving the
existing behavior of _get_tools and invoke.
- Around line 108-112: In the except block in tool_invoker.py (the "except
Exception as e" handling around the tool invocation that logs with logger and
raises a RuntimeError for toolname), change the f-string to use the !s
conversion flag (f"Failed to invoke tool '{toolname}': {e!s}") and stop
unconditionally wrapping every caught exception as RuntimeError; instead
re-raise ValueError or other toolguard-specific exceptions as-is (or re-raise e)
so callers can match by type, and only wrap into a RuntimeError when you must
add context (preserving the original via "raise RuntimeError(... ) from e" if
used).
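The double-checked cache population from the first tool_invoker item above can be sketched with stand-in classes (`CountingProvider` replaces the real tool provider; only the lock pattern is the point):

```python
import asyncio


class CountingProvider:
    """Stand-in for the real tool provider; counts expensive calls."""

    def __init__(self):
        self.calls = 0

    async def get_all_tools(self):
        self.calls += 1
        await asyncio.sleep(0)  # simulate I/O so concurrent callers overlap
        return ["tool_a", "tool_b"]


class ToolInvoker:
    def __init__(self, tool_provider):
        self.tool_provider = tool_provider
        self._tools_cache = None
        self._tools_lock = asyncio.Lock()  # created once in __init__

    async def _get_tools(self):
        if self._tools_cache is None:          # fast path, no lock held
            async with self._tools_lock:
                if self._tools_cache is None:  # re-check under the lock
                    self._tools_cache = await self.tool_provider.get_all_tools()
        return self._tools_cache


async def demo():
    provider = CountingProvider()
    invoker = ToolInvoker(provider)
    # Ten concurrent first-time callers still trigger only one fetch.
    await asyncio.gather(*(invoker._get_tools() for _ in range(10)))
    return provider.calls


calls = asyncio.run(demo())
print(calls)  # 1
```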
In `@src/cuga/backend/cuga_graph/policy/utils.py`:
- Around line 471-473: The current check uses policy_triggers = getattr(policy,
"triggers", None) and only errors on an explicit empty list, which
unintentionally skips validation for policies that should require triggers;
modify the condition to explicitly skip the validation only for ToolApproval (or
alternately require the attribute exist) — e.g., use isinstance(policy,
ToolApproval) to bypass trigger checks for ToolApproval, otherwise assert that
getattr(policy, "triggers", None) is not None and not empty and append "At least
one trigger is required" when missing or empty; update the check around
policy_triggers, getattr(policy, "triggers", None), and references to
ToolApproval/Playbook/IntentGuard/ToolGuide/OutputFormatter/CustomPolicy so
intent is explicit.
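A minimal sketch of the explicit trigger check (the policy classes here are stand-ins; the real models live in cuga's policy module):

```python
class ToolApproval:
    """Stand-in: the one policy type that legitimately has no triggers."""


class Playbook:
    """Stand-in: like the other policy types, carries a `triggers` list."""

    def __init__(self, triggers=None):
        self.triggers = triggers


def trigger_errors(policy):
    errors = []
    # Bypass the check only for ToolApproval; everything else must declare
    # at least one trigger, so None, missing, or [] are all failures.
    if not isinstance(policy, ToolApproval):
        triggers = getattr(policy, "triggers", None)
        if not triggers:
            errors.append("At least one trigger is required")
    return errors


print(trigger_errors(ToolApproval()))       # []
print(trigger_errors(Playbook([])))         # ['At least one trigger is required']
print(trigger_errors(Playbook(["on_call"])))  # []
```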
In `@src/cuga/backend/tools_env/registry/registry/api_registry_server.py`:
- Around line 143-160: The policy storage bootstrap sequence (checking
settings.policy.enabled, instantiating PolicyStorage, calling connect(), logging
success/failure and optionally failing closed) is duplicated across
_get_or_create_registry, lifespan, and reload_config; refactor by extracting
that logic into a single async helper named
_maybe_connect_policy_storage(fail_closed: bool) -> Optional[PolicyStorage] that
encapsulates the settings.policy.enabled check, instantiates PolicyStorage,
calls await policy_storage.connect(), logs success with logger.info, logs
failures with logger.error, and either returns None or raises RuntimeError when
fail_closed is True; update callers (in _get_or_create_registry, lifespan,
reload_config and where ApiRegistry(client=manager, policy_storage=...) is
created) to call this helper and pass its return value into ApiRegistry.
In `@src/cuga/backend/tools_env/registry/registry/api_registry.py`:
- Line 232: The conditional that computes unwrapped_args uses
arguments.get('params', arguments) redundantly because the branch already
ensures 'params' is in arguments; update the expression to directly access
arguments['params'] when isinstance(arguments, dict) and 'params' in arguments,
e.g. replace the current ternary with arguments['params'] in that branch so
unwrapped_args is set deterministically; locate the assignment to unwrapped_args
in api_registry.py to make this change.
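The deterministic unwrap, sketched standalone (the function name is illustrative; the real logic is inline in `ApiRegistry.call_function`):

```python
def unwrap_args(arguments):
    # The branch guard already guarantees 'params' exists, so direct indexing
    # replaces the redundant arguments.get('params', arguments) fallback.
    if isinstance(arguments, dict) and "params" in arguments:
        return arguments["params"]
    return arguments


print(unwrap_args({"params": {"path": "/tmp/x"}}))  # {'path': '/tmp/x'}
print(unwrap_args({"path": "/tmp/x"}))              # {'path': '/tmp/x'}
```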
In `@src/cuga/sdk_core/debug_tool_guard_e2e.py`:
- Around line 144-205: Remove the unnecessary f-string prefixes on the static
print calls to fix Ruff F541: replace print(f"Creating policy..."),
print(f"Generating examples..."), print(f"Generating guard code..."), and
print(f"✅ Policy saved successfully") with regular string literals
(print("...")) and leave the other f-strings that interpolate variables (e.g.,
print(f"✅ Created policy with ID: {policy_id}"), print(f"✅ Generated
{len(violating_examples)} ..."), print(f"✅ Generated guard code
({len(guard_code)} characters)")) unchanged.
In `@src/cuga/sdk.py`:
- Around line 539-575: The function update_tool_guard currently declares a
return type of str but returns None when the policy system is disabled; change
the signature to return Optional[str] and update any related docstring Returns
section accordingly; ensure other early-exit paths in update_tool_guard (and
similar methods like the add_* helpers in this module) consistently return
Optional[str] so type checkers (mypy/pyright) won't raise errors.
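The corrected signature in isolation (parameters are illustrative; the real method reads the policy flag from settings):

```python
from typing import Optional


def update_tool_guard(policy_enabled: bool, policy_id: str) -> Optional[str]:
    """Sketch of the corrected return annotation.

    Returns:
        The updated policy id, or None when the policy system is disabled.
    """
    if not policy_enabled:
        return None  # early exit now matches the declared Optional[str]
    return policy_id


print(update_tool_guard(False, "p-1"))  # None
print(update_tool_guard(True, "p-1"))   # p-1
```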
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 260888de-4fd5-4c35-9440-ff05c3026138
📒 Files selected for processing (19)
- pyproject.toml
- src/cuga/backend/cuga_graph/policy/__init__.py
- src/cuga/backend/cuga_graph/policy/filesystem_sync.py
- src/cuga/backend/cuga_graph/policy/folder_loader.py
- src/cuga/backend/cuga_graph/policy/models.py
- src/cuga/backend/cuga_graph/policy/tests/test_filesystem_sync.py
- src/cuga/backend/cuga_graph/policy/tool_guard/__init__.py
- src/cuga/backend/cuga_graph/policy/tool_guard/cuga_llm_adapter.py
- src/cuga/backend/cuga_graph/policy/tool_guard/manager.py
- src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py
- src/cuga/backend/cuga_graph/policy/tool_guard/tool_invoker.py
- src/cuga/backend/cuga_graph/policy/utils.py
- src/cuga/backend/server/main.py
- src/cuga/backend/tools_env/registry/registry/api_registry.py
- src/cuga/backend/tools_env/registry/registry/api_registry_server.py
- src/cuga/sdk.py
- src/cuga/sdk_core/debug_tool_guard_e2e.py
- src/cuga/sdk_core/tests/policies-export-2025-12-31.json
- src/cuga/sdk_core/tests/test_sdk_policies.py
💤 Files with no reviewable changes (1)
- src/cuga/sdk_core/tests/policies-export-2025-12-31.json
    Raises:
        RuntimeError: If domain directory or files are not found
    """
    domain_dir = Path.cwd() / ".cuga" / "toolguard" / "domain"
Path.cwd() couples runtime to process working directory.
Domain files are resolved relative to whatever directory the registry process happens to be started from. If ToolGuardManager (the writer) and the registry (the reader) run with different CWDs — common in containerized or systemd setups — domain loading silently raises RuntimeError. Consider plumbing this through settings or accepting an explicit domain_dir parameter so it's deterministic.
♻️ Suggested approach
- def _load_runtime_domain(self) -> RuntimeDomain:
+ def _load_runtime_domain(self, domain_dir: Optional[Path] = None) -> RuntimeDomain:
...
- domain_dir = Path.cwd() / ".cuga" / "toolguard" / "domain"
+ domain_dir = domain_dir or (Path.cwd() / ".cuga" / "toolguard" / "domain")

Then accept the path through __init__ (or read from settings.policy.toolguard_domain_dir) so callers can override.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py` at line
217, The code uses Path.cwd() to compute domain_dir which couples
ToolGuardManager to the process CWD; change ToolGuardManager to accept an
explicit domain_dir parameter in its __init__ (or read from
settings.policy.toolguard_domain_dir if None) and replace the inline Path.cwd()
/ ".cuga" / "toolguard" / "domain" with that deterministic path; update the
construction sites of ToolGuardManager to pass the configured domain_dir (or
ensure settings.policy.toolguard_domain_dir is set) so both writer and reader
resolve the same domain files and avoid silent RuntimeError.
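A hedged sketch of the constructor-injected path suggested above (attribute and parameter names are illustrative):

```python
from pathlib import Path
from typing import Optional


class ToolGuardRuntime:
    def __init__(self, domain_dir: Optional[Path] = None) -> None:
        # An explicit path (or a settings value such as
        # settings.policy.toolguard_domain_dir) decouples domain resolution
        # from the process CWD; the old CWD-relative default stays a fallback.
        self.domain_dir = domain_dir or Path.cwd() / ".cuga" / "toolguard" / "domain"


rt = ToolGuardRuntime(Path("/etc/cuga/toolguard/domain"))
print(rt.domain_dir)
```

With this shape, the writer (ToolGuardManager) and the reader (the registry) can both be handed the same configured directory regardless of where each process is started.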
try:
    args_obj = type("ToolGuardArgs", (), arguments)()
    await self._runtime.guard_toolcall(
        tool_name=function_name,
        args=arguments | {"args": args_obj},
        delegate=self.invoker,
    )
🧩 Analysis chain
🌐 Web query:
toolguard runtime guard_toolcall args parameter expected shape
💡 Result:
The guard_toolcall method in ToolGuard's runtime module (from the AgentToolkit/toolguard Python package) expects three positional arguments:

1. tool_name (str): the name of the tool being called (e.g., "add_tool", "divide_tool").
2. args (dict): the tool arguments as a dictionary of parameter values (e.g., {"a": 5, "b": 4} or {"g": 10, "h": 2}).
3. invoker: an invoker object compatible with the tools, such as ToolFunctionsInvoker, ToolMethodsInvoker, or LangchainToolInvoker.

For LangChain tools, args must be nested under an "args" key: {"args": {"g": 5, "h": 2}}. The method is asynchronous (await toolguard.guard_toolcall(...)) and raises PolicyViolationException if the call violates generated policy guards. It is used after loading guards via load_toolguards(path).

Example usage:

invoker = ToolFunctionsInvoker([divide_tool, add_tool, multiply_tool])
await toolguard.guard_toolcall("add_tool", {"a": 5, "b": 4}, invoker)

For LangChain:

invoker = LangchainToolInvoker(tools)
await toolguard.guard_toolcall("divide_tool", {"args": {"g": 5, "h": 2}}, invoker)
Citations:
- 1: https://github.com/AgentToolkit/toolguard
- 2: https://pypi.org/project/toolguard/0.1.19/
- 3: https://pypi.org/project/toolguard/
🏁 Script executed:
#!/bin/bash
# Read the actual code at lines 570-576
sed -n '570,576p' src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py
echo "---"
# Also read some context around it (560-580)
sed -n '560,580p' src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py
🏁 Script executed:
#!/bin/bash
# Read api_registry.py lines 257-260 to see error handling
sed -n '257,260p' src/cuga/backend/cuga_graph/policy/tool_guard/api_registry.py
echo "---"
# Also read context around it (250-270)
sed -n '250,270p' src/cuga/backend/cuga_graph/policy/tool_guard/api_registry.py
🏁 Script executed:
#!/bin/bash
# Search for usage of tool arguments to see if non-identifier keys are actually used
rg "user-id|user\.id|user-name" src/ -i --type py -A 2 -B 2 | head -50
🏁 Script executed:
#!/bin/bash
# Check if there are any test cases or examples showing argument names with special characters
fd -e py -x grep -l "guard_toolcall\|ToolGuardArgs" {} \; | head -10
🏁 Script executed:
#!/bin/bash
# Find api_registry.py
fd -name "api_registry.py" src/
🏁 Script executed:
#!/bin/bash
# Search for error handling around guard_toolcall to find the wrapper mentioned
rg "guard_toolcall" src/ -B 3 -A 3 --type py | head -80
🏁 Script executed:
#!/bin/bash
# Verify what happens with type() when dict keys are not valid identifiers
python3 << 'EOF'
# Test 1: Non-identifier keys
test_args = {"user-id": "123", "valid_key": "value"}
try:
args_obj = type("ToolGuardArgs", (), test_args)()
print(f"Test 1 passed: {args_obj.__dict__}")
# Try to access via attribute
try:
print(f"Access user-id: {args_obj.__dict__}")
except:
print("Cannot access user-id as attribute")
except TypeError as e:
print(f"Test 1 FAILED with TypeError: {e}")
print("---")
# Test 2: "args" collision
test_args2 = {"a": 1, "b": 2, "args": "user_provided"}
args_obj2 = type("ToolGuardArgs", (), test_args2)()
merged = test_args2 | {"args": args_obj2}
print(f"Test 2 - merged dict: {merged}")
print(f"Test 2 - 'args' value type: {type(merged['args'])}")
print(f"Original args value overwritten: {merged['args'] != 'user_provided'}")
EOF
🏁 Script executed:
#!/bin/bash
# Find api_registry references or wrapper around guard_toolcall
rg "api_registry|257|260" src/ --type py -B 2 -A 2 | head -100
🏁 Script executed:
#!/bin/bash
# Search for files that import or use guard_toolcall
rg "from.*tool_guard\|import.*guard" src/ --type py | grep -v ".pyc"
🏁 Script executed:
#!/bin/bash
# Look for exception handling around guard_toolcall that is NOT PolicyViolationException
rg "guard_toolcall" src/ --type py -B 10 | grep -E "except|TypeError|Exception" -A 2 -B 2
🏁 Script executed:
#!/bin/bash
# Find the correct api_registry.py file
fd api_registry.py src/
🏁 Script executed:
#!/bin/bash
# Read lines 257-260 from the correct api_registry.py
sed -n '257,260p' src/cuga/backend/tools_env/registry/registry/api_registry.py
echo "---"
# Context around it
sed -n '250,270p' src/cuga/backend/tools_env/registry/registry/api_registry.py
🏁 Script executed:
#!/bin/bash
# Get more context around the guard execution to understand the flow
sed -n '240,290p' src/cuga/backend/tools_env/registry/registry/api_registry.py
🏁 Script executed:
#!/bin/bash
# Verify that guard_toolcall is called before this exception handler
rg "guard_toolcall" src/cuga/backend/tools_env/registry/registry/api_registry.py -B 5 -A 1
🏁 Script executed:
#!/bin/bash
# Search for the guard validation method call in api_registry.py
rg "validate_tool_call|guard" src/cuga/backend/tools_env/registry/registry/api_registry.py -B 2 -A 2
🏁 Script executed:
#!/bin/bash
# Check the complete guard validation flow in tool_guard_runtime.py
wc -l src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py
🏁 Script executed:
#!/bin/bash
# Look at the validate_tool_call method to understand the full picture
sed -n '550,600p' src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py
Dynamic class creation silently drops non-identifier argument keys and collides with "args" parameter.
At line 571, type("ToolGuardArgs", (), arguments)() uses arguments as the class namespace. While type() does not raise an error on non-identifier keys (e.g., "user-id", "user.name"), these keys are silently dropped from the resulting class's __dict__, making them inaccessible as attributes in guard code. Additionally, line 574's arguments | {"args": args_obj} will overwrite any user parameter literally named "args".
Two issues:
- Silent data loss for argument keys that aren't valid Python identifiers: they vanish from `args_obj` attributes.
- Parameter collision for any tool with a parameter named `args`, which gets replaced by the injected args object.
Error handling in api_registry.py lines 257–260 catches all exceptions during guard validation and allows the tool call to proceed without guards, so guard errors fail open rather than closed.
🛡️ Proposed fix — validate keys and avoid collision
- args_obj = type("ToolGuardArgs", (), arguments)()
- await self._runtime.guard_toolcall(
- tool_name=function_name,
- args=arguments | {"args": args_obj},
- delegate=self.invoker,
- )
+ # Build a namespace using only valid Python identifier keys so type()
+ # doesn't create a class with unusable attributes.
+ ns = {k: v for k, v in arguments.items() if isinstance(k, str) and k.isidentifier()}
+ args_obj = type("ToolGuardArgs", (), ns)()
+ merged = {**arguments, "args": args_obj}
+ if "args" in arguments:
+ logger.warning(
+ f"Tool '{function_name}' has a parameter named 'args' which "
+ "collides with the ToolGuard injected args object."
+ )
+ await self._runtime.guard_toolcall(
+ tool_name=function_name,
+ args=merged,
+ delegate=self.invoker,
+ )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py` around
lines 570 - 576, The current dynamic class creation type("ToolGuardArgs", (),
arguments)() loses any argument keys that are not valid Python identifiers and
also causes collisions when a tool parameter is literally named "args"; to fix:
replace the type()-based approach in tool_guard_runtime.py (where args_obj is
created and passed into self._runtime.guard_toolcall) with a robust wrapper that
(1) preserves the original arguments dict unmodified (e.g., keep a .raw or
.all_args dict on the wrapper) and exposes only safe identifier attributes for
keys that pass str.isidentifier(), and (2) stop injecting into the call under
the literal "args" key—use a non-colliding reserved key such as "__guard_args__"
or "_guard_args" when merging with arguments; also update the guard invocation
site (self._runtime.guard_toolcall and any code consuming args_obj) to read from
the wrapper’s .raw for arbitrary keys and from attributes for identifier-safe
keys, and adjust api_registry.py error handling (the try/except around guard
validation) to not silently swallow guard exceptions—either re-raise or treat
guard failures as closed (deny) so guards don’t fail open.
🧩 Analysis chain
@sami-marreed, here's the plan to address all three issues flagged in the review:
Task: Fix arg key handling, "args" collision, and fail-open guard behavior
- Fix
guard_tool_callintool_guard_runtime.pyto filter non-identifier keys and avoid "args" collision - Fix fail-open error handling in
api_registry.pyso guard exceptions block the call
🏁 Script executed:
#!/bin/bash
# Read the full guard_tool_call method to get current context
sed -n '530,620p' src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py
🏁 Script executed:
#!/bin/bash
# Read the guard validation block with context
sed -n '240,290p' src/cuga/backend/tools_env/registry/registry/api_registry.py
@sami-marreed, here are the two concrete fixes:
Fix 1 — src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py
Filter non-identifier keys before building the attribute object, and use "__guard_args__" to avoid colliding with any tool parameter literally named "args":
try:
    # Build attribute object using only valid Python identifier keys so that
    # non-identifier argument names (e.g. "user-id") are not silently dropped.
    ns = {
        k: v
        for k, v in arguments.items()
        if isinstance(k, str) and k.isidentifier()
    }
    if len(ns) < len(arguments):
        dropped = [k for k in arguments if not (isinstance(k, str) and k.isidentifier())]
        logger.warning(
            f"Tool '{function_name}': {len(dropped)} argument key(s) are not valid "
            f"Python identifiers and will not be accessible as attributes: {dropped}"
        )
    args_obj = type("ToolGuardArgs", (), ns)()
    # Use "__guard_args__" instead of "args" to avoid overwriting a legitimate
    # tool parameter that happens to be named "args".
    if "args" in arguments:
        logger.warning(
            f"Tool '{function_name}' has a parameter named 'args'; it is preserved "
            "in the arguments dict and the guard args object is injected as '__guard_args__'."
        )
    merged = {**arguments, "__guard_args__": args_obj}
    await self._runtime.guard_toolcall(
        tool_name=function_name,
        args=merged,
        delegate=self.invoker,
    )

Fix 2 — src/cuga/backend/tools_env/registry/registry/api_registry.py
Change the fail-open except block to fail-closed by returning a 403, consistent with how a real violation is handled:
except Exception as e:
    # Fail closed: an unexpected guard error must not silently allow
    # the tool call to proceed, as that would let a broken/malformed
    # guard bypass policy enforcement entirely.
    logger.error(
        f"Unexpected error executing tool guard for '{function_name}': {e}",
        exc_info=True,
    )
    return {
        "status": "exception",
        "status_code": 403,
        "message": (
            f"Tool guard encountered an internal error for '{function_name}' "
            "and the call has been blocked as a safety precaution."
        ),
        "error_type": "ToolGuardInternalError",
        "function_name": function_name,
    }
Note on `tool_guard_runtime.py`'s own `except Exception` block: the runtime's internal handler (lines ~582-590) already returns a blocking error string for unexpected exceptions, so it already fails closed; no change needed there.
🧠 Learnings used
Learnt from: haroldship
Repo: cuga-project/cuga-agent PR: 67
File: src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py:377-397
Timestamp: 2026-03-23T08:37:20.455Z
Learning: In `src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`, when `PromptUtils.find_tools` raises an `OutputParserException` or other exception (i.e., the shortlister LLM returns malformed/empty JSON), the correct behaviour is to return a short user-facing error string to the agent — NOT a fallback tool list. This was an explicit design decision: returning all tools on LLM failure bloats the agent context and masks bugs; the error string lets the agent LLM decide whether to retry or proceed. This is distinct from the soft-filtering fallback (which applies when `filter_criteria` is provided but no tools match the criteria).
Learnt from: offerakrabi
Repo: cuga-project/cuga-agent PR: 61
File: src/cuga/backend/cuga_graph/nodes/cuga_lite/combined_tool_provider.py:232-232
Timestamp: 2026-03-16T09:35:34.380Z
Learning: In `src/cuga/backend/cuga_graph/nodes/cuga_lite/combined_tool_provider.py`, `settings.advanced_features.registry` access in `CombinedToolProvider.initialize()` can fail in CI test environments if Dynaconf validators haven't been applied yet. The validator with `default=True` is declared in `config.py` (line 117) and `validate_all()` is called at module level (line 264). The preferred defensive fix is `getattr(settings.advanced_features, 'registry', True)` consistent with patterns used elsewhere in the codebase (e.g., `helpers.py`).
Learnt from: sami-marreed
Repo: cuga-project/cuga-agent PR: 121
File: src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py:229-247
Timestamp: 2026-04-05T20:51:54.199Z
Learning: In `src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`, `format_task_todos_system_block` intentionally injects todo items verbatim into the system prompt so the agent treats them as authoritative, high-priority task directives it must follow. This is a deliberate design choice — do NOT flag raw todo injection as a prompt-injection risk in future reviews. Todos are generated by the agent's own `create_update_todos` tool, not from raw user input, so the injection risk is considered acceptable by the team.
Actionable comments posted: 7
♻️ Duplicate comments (2)
src/cuga/sdk.py (1)
1358-1364: ⚠️ Potential issue | 🟡 Minor

Fix the `update_tool_guard` docstring example.

The example still uses the old `target_tool`/`violating_examples`/`compliance_examples` signature, but `update_tool_guard()` now accepts a single `tool_guards` mapping. Copy-pasting this snippet will raise `TypeError`.

📝 Proposed docstring fix

 # Update policy with examples
 await agent.policies.update_tool_guard(
     policy_id=policy_id,
-    target_tool="delete_file",
-    violating_examples=violating,
-    compliance_examples=compliance
+    tool_guards={
+        "delete_file": {
+            "description": "Guard rules for file deletion",
+            "violating_examples": violating,
+            "compliance_examples": compliance,
+            "policy_code": "",
+        }
+    }
 )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/sdk.py` around lines 1358 - 1364, The docstring example for agent.policies.update_tool_guard is using the old keyword args (target_tool, violating_examples, compliance_examples) and will raise a TypeError; update the example to call update_tool_guard with a single tool_guards mapping (e.g. call agent.policies.update_tool_guard(policy_id=policy_id, tool_guards={"delete_file": {"violating_examples": violating, "compliance_examples": compliance}}) so the example matches the new signature exposed by update_tool_guard.

src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py (1)
570-575: ⚠️ Potential issue | 🟠 Major

Don't clobber a real `args` parameter.

`arguments | {"args": args_obj}` overwrites any user-supplied `args` field before validation, so those tools are guarded against synthetic data instead of the real payload. This is the same collision that was flagged earlier and it still exists here.

Verification: this should print `overwritten: True`.

#!/bin/bash
python - <<'PY'
from types import SimpleNamespace
arguments = {"args": "user-supplied", "value": 1}
args_obj = SimpleNamespace(**arguments)
merged = arguments | {"args": args_obj}
print("original args:", arguments["args"])
print("merged args type:", type(merged["args"]).__name__)
print("overwritten:", merged["args"] != arguments["args"])
PY

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py` around lines 570 - 575, The call to self._runtime.guard_toolcall currently forces arguments | {"args": args_obj}, which clobbers any user-supplied "args" key; change the merge so we do not overwrite an existing "args" field—either pass the SimpleNamespace under a distinct key (e.g. "_args" or "args_obj") or only inject "args" when "args" is not already present (check arguments for "args" before merging); update the call site in the guard_toolcall invocation (the arguments passed to self._runtime.guard_toolcall) to use the non-colliding key or conditional merge while keeping function_name, args_obj/SimpleNamespace creation, and delegate=self.invoker intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py`:
- Around line 497-526: The current _normalize_name is lossy and can map
different tool names to the same identifier; change the normalization strategy
used by _normalize_name (and therefore used by _guard_module_name_for_tool and
_guard_function_name_for_tool) to append a short deterministic disambiguator
derived from the original name (e.g., a truncated hash like sha1/sha256 of the
original name, 6–8 hex chars) after the safe normalized token, ensuring you keep
the readable normalized portion plus the suffix (and still default to "tool" if
empty); update any code that expects the previous plain names (notably
_guard_module_name_for_tool, _guard_function_name_for_tool, and places in
_build_runtime that generate/write module files) so they use the new stable
unique names and avoid overwriting/loading the wrong guard.
In `@src/cuga/backend/tools_env/registry/registry/api_registry_server.py`:
- Around line 218-219: When shutting down or reloading in YAML mode you must
also tear down the old registry's policy storage: in the same places where you
check database_mode and call await mcp_manager.shutdown() (the shutdown block
and the reload block that overwrites the global registry), detect if "registry"
is present in globals() and if registry.policy_storage exists and await its
teardown (preferably calling await registry.policy_storage.shutdown() and
falling back to await registry.policy_storage.close() if shutdown() is not
implemented), then remove/replace the global registry reference; this ensures
the YAML policy backend connection is closed on shutdown and reload.
- Around line 160-161: The ApiRegistry startup opens policy_storage then calls
await reg.start_servers(), but if start_servers (or subsequent tool
loading/ToolGuard init) raises the opened PolicyStorage is leaked; wrap the
ApiRegistry construction and startup in a try/except/finally: after creating
policy_storage and reg, attempt await reg.start_servers() inside try, on
exception ensure policy_storage.disconnect() (or close) is awaited before
re-raising the exception; apply the same pattern to the YAML-mode startup and
the reload paths so any early exceptions always close the newly opened
PolicyStorage.
In `@src/cuga/backend/tools_env/registry/registry/api_registry.py`:
- Around line 231-233: The call_function() logic is incorrectly unwrapping any
dict containing a 'params' key, which changes the public payload seen by guards
and tools (see ApiRegistry.call_function and the local variable unwrapped_args);
revert this by removing the unconditional unwrapping from
ApiRegistry.call_function and instead perform transport-wrapper normalization at
the request boundary/handler where the wrapper contract is explicit (e.g., the
HTTP/transport deserializer that receives the API payload), or add a strict
discriminator (a specific wrapper key or type tag) before unwrapping so only
true transport wrappers are normalized and real tool payloads with a params
field are left intact.
- Around line 87-110: The current except block swallows ToolGuardRuntime
initialization errors by setting self.tool_guard_runtime = None which lets
call_function() skip all policy validation; instead, make initialization failure
fatal: in the except for the ToolGuardRuntime
import/initialize/get_guarded_tools block, either re-raise the caught exception
or raise a new RuntimeError with context (e.g. "ToolGuardRuntime failed to
initialize") so service startup fails, and remove the fallback to None; ensure
callers (call_function()) rely on the presence of self.tool_guard_runtime rather
than treating None as "no guards" so policy validation cannot be silently
bypassed.
In `@src/cuga/sdk.py`:
- Around line 1230-1235: The exception branch now returns a dict with keys
"loaded", "removed", "errors", and "files", but the two early return paths in
the same function (the branches that short-circuit for disabled policy or
uninitialized filesystem sync) still omit "files"; update those early returns to
return the same stable payload shape — include "files": [] along with "loaded":
0, "removed": 0 and "errors": [] (or existing error value) so every return from
the function uses the same keys and types.
- Around line 592-598: The loop that builds tool_guards_obj accepts any keys
from tool_guards and can register invalid guard names; validate that each
tool_name exists in the target_tools set/list before creating a ToolGuard (e.g.,
compute invalid = set(tool_guards.keys()) - set(target_tools)), and if any
invalid names exist either raise a ValueError listing them or log and skip them;
update the population logic around tool_guards_obj and references to
tool_guards/target_tools/ToolGuard so typos are caught early and only valid tool
names are registered.
---
Duplicate comments:
In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py`:
- Around line 570-575: The call to self._runtime.guard_toolcall currently forces
arguments | {"args": args_obj}, which clobbers any user-supplied "args" key;
change the merge so we do not overwrite an existing "args" field—either pass the
SimpleNamespace under a distinct key (e.g. "_args" or "args_obj") or only inject
"args" when "args" is not already present (check arguments for "args" before
merging); update the call site in the guard_toolcall invocation (the arguments
passed to self._runtime.guard_toolcall) to use the non-colliding key or
conditional merge while keeping function_name, args_obj/SimpleNamespace
creation, and delegate=self.invoker intact.
In `@src/cuga/sdk.py`:
- Around line 1358-1364: The docstring example for
agent.policies.update_tool_guard is using the old keyword args (target_tool,
violating_examples, compliance_examples) and will raise a TypeError; update the
example to call update_tool_guard with a single tool_guards mapping (e.g. call
agent.policies.update_tool_guard(policy_id=policy_id,
tool_guards={"delete_file": {"violating_examples": violating,
"compliance_examples": compliance}}) so the example matches the new signature
exposed by update_tool_guard.
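A sketch of what the corrected docstring example would look like as a call, wrapped in a hypothetical helper so the shape of the new single `tool_guards` mapping is explicit:

```python
def update_tool_guard_example(agent, policy_id, violating, compliance):
    """Call update_tool_guard with the new tool_guards mapping signature."""
    return agent.policies.update_tool_guard(
        policy_id=policy_id,
        tool_guards={
            "delete_file": {
                "violating_examples": violating,
                "compliance_examples": compliance,
            }
        },
    )
```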
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 41305a49-4d15-4e5a-a5bc-293593eae091
📒 Files selected for processing (4)
- src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py
- src/cuga/backend/tools_env/registry/registry/api_registry.py
- src/cuga/backend/tools_env/registry/registry/api_registry_server.py
- src/cuga/sdk.py

Actionable comments posted: 4
♻️ Duplicate comments (1)
src/cuga/sdk.py (1)
591-599: ⚠️ Potential issue | 🟠 Major
Honor wildcard ToolGuide targets here.
ToolGuide.target_tools explicitly allows ["*"], but Lines 591-599 treat "*" as a literal tool name. That makes update_tool_guard() unusable for wildcard policies, because every real tool key is rejected as invalid.
🛠️ Proposed fix
  # Validate that tool_guards keys are in target_tools
  target_tools_set = set(existing_policy.target_tools or [])
- invalid_tools = set(tool_guards.keys()) - target_tools_set
+ wildcard = "*" in target_tools_set
+ invalid_tools = set() if wildcard else set(tool_guards.keys()) - target_tools_set
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/sdk.py` around lines 591 - 599, The validation incorrectly treats a "*" wildcard as a literal tool name; in the block that computes target_tools_set from existing_policy.target_tools and then computes invalid_tools = set(tool_guards.keys()) - target_tools_set, change the logic to honor the wildcard: if existing_policy.target_tools contains "*" (e.g., target_tools_set == {"*"} or "*" in existing_policy.target_tools) skip the subtraction/validation or treat all tool_guards keys as allowed so invalid_tools is empty; adjust the branch that raises ValueError accordingly so update_tool_guard() accepts tool_guards when the policy target is a wildcard.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py`:
- Around line 81-83: The runtime currently builds a single global domain/runtime
using _load_runtime_domain and _build_runtime and uses it in guard_tool_call,
which ignores app_name and causes cross-app collisions; update the
implementation to maintain a mapping keyed by (app_name, tool_name) (or at
minimum per-app runtimes) instead of single self._runtime and
self._runtime_domain: change tool_to_guards lookups and guard_tool_call to
accept and use app_name when selecting the runtime/guards, add a cache like
self._runtimes_by_app (or self._runtimes_by_app_tool) that calls
_load_runtime_domain/_build_runtime per app (or per (app,tool)) when missing,
and update all places noted (including the blocks around lines 207-228 and
538-585) to read from this keyed mapping so each app uses its own
domain/runtime.
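The per-app cache the comment describes can be sketched like this (class and parameter names are illustrative; the real implementation would call `_load_runtime_domain`/`_build_runtime`):

```python
class PerAppRuntimeCache:
    """Caches one runtime per app so guards never cross app boundaries,
    building each app's runtime lazily on first use."""

    def __init__(self, build_runtime):
        self._build = build_runtime
        self._runtimes_by_app = {}

    def get(self, app_name: str):
        if app_name not in self._runtimes_by_app:
            self._runtimes_by_app[app_name] = self._build(app_name)
        return self._runtimes_by_app[app_name]
```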
- Around line 123-137: The code currently registers a policy as soon as
policy.policy_code is non-empty which can leave a tool marked "guarded" even if
no async guard function exists; change the registration loop in
tool_guard_runtime (the block iterating policy.tool_guards and appending to
self.tool_to_guards) to validate the policy code by invoking the existing
_process_policy_guard() (or the same validation logic used by
_generate_module_content()) and ensure at least one async def guard_* callable
is present before appending policy to self.tool_to_guards; if validation fails,
do not register the policy for that tool and either log an error and skip or
raise an initialization error to fail-closed (choose consistent behavior used
elsewhere), and mirror this validation in the other similar registration sites
noted (around the other occurrences of the same registration logic).
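One way to implement the requested check, assuming policy code is plain Python source; `has_async_guard` is a hypothetical helper, and the bare `exec` here is purely illustrative (the real runtime presumably loads policy code through its own module loader):

```python
import inspect

def has_async_guard(policy_code: str) -> bool:
    """Exec the policy code in an isolated namespace and verify at least
    one async `guard_*` coroutine function is defined."""
    namespace: dict = {}
    try:
        exec(policy_code, namespace)
    except Exception:
        return False
    return any(
        name.startswith("guard_") and inspect.iscoroutinefunction(obj)
        for name, obj in namespace.items()
    )
```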
In `@src/cuga/backend/tools_env/registry/registry/api_registry_server.py`:
- Around line 160-170: When startup aborts after ApiRegistry(client=manager) and
await reg.start_servers() fails, you currently only disconnect policy_storage;
also ensure the MCP manager(s) are shut down: after catching the exception,
await shutdown of the manager and any nested MCP managers (e.g., call await
manager.shutdown() if available, otherwise if manager has attributes mcp_manager
or new_manager call await manager.mcp_manager.shutdown()/await
manager.new_manager.shutdown() or their .disconnect()/.stop() equivalents),
wrapping each call in its own try/except and logging warnings similar to the
policy_storage cleanup; apply the same pattern for the other two failure
handlers referenced (the blocks around lines 213-222 and 611-621).
In `@src/cuga/backend/tools_env/registry/registry/api_registry.py`:
- Around line 239-246: Tool guards are being applied globally because
ToolGuardRuntime only indexes by function_name; update the guard lookup to also
respect the app scope by filtering guards against ToolGuide.target_apps (or an
empty/None meaning all apps) before enforcing them. Specifically, modify
ToolGuardRuntime.guard_tool_call (and any internal guard index/lookup) to
consider the incoming app_name parameter and only evaluate guards whose
ToolGuide.target_apps contains that app_name (or is unset), so a guard for
"delete" on one app does not block "delete" on other apps; ensure existing call
sites (e.g., api_registry.py calling tool_guard_runtime.guard_tool_call) remain
unchanged except for the corrected behavior.
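The app-scoped lookup can be sketched with plain dicts standing in for guard records (field names mirror the comment, not the actual `ToolGuide` model):

```python
def guards_for_call(guards: list, app_name: str, function_name: str) -> list:
    """Return only the guards matching this function whose policy targets
    this app; an unset or empty target_apps means the guard applies to all apps."""
    return [
        g for g in guards
        if g["function_name"] == function_name
        and (not g.get("target_apps") or app_name in g["target_apps"])
    ]
```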
---
Duplicate comments:
In `@src/cuga/sdk.py`:
- Around line 591-599: The validation incorrectly treats a "*" wildcard as a
literal tool name; in the block that computes target_tools_set from
existing_policy.target_tools and then computes invalid_tools =
set(tool_guards.keys()) - target_tools_set, change the logic to honor the
wildcard: if existing_policy.target_tools contains "*" (e.g., target_tools_set
== {"*"} or "*" in existing_policy.target_tools) skip the subtraction/validation
or treat all tool_guards keys as allowed so invalid_tools is empty; adjust the
branch that raises ValueError accordingly so update_tool_guard() accepts
tool_guards when the policy target is a wildcard.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 82ee4859-4bfb-47d0-ba9d-a1e9ca4a8838
📒 Files selected for processing (4)
- src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py
- src/cuga/backend/tools_env/registry/registry/api_registry.py
- src/cuga/backend/tools_env/registry/registry/api_registry_server.py
- src/cuga/sdk.py
Actionable comments posted: 4
♻️ Duplicate comments (1)
src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py (1)
698-703: ⚠️ Potential issue | 🟠 Major
Preserve a real tool argument named args.
arguments | {"args": args_obj} still overwrites any user-supplied arguments["args"], so guards can validate a different payload than the tool will actually receive.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py` around lines 698 - 703, The guard currently overwrites any user-supplied arguments["args"] by passing arguments | {"args": args_obj} to runtime.guard_toolcall; change this to preserve a real "args" when present (e.g., if "args" in arguments then pass arguments unchanged, otherwise pass arguments | {"args": args_obj}) so the guard validates the exact payload the tool will receive; update the call to runtime.guard_toolcall (using function_name, arguments, args_obj, self.invoker) accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py`:
- Around line 632-639: The code currently caches None on runtime initialization
failure in _runtimes_by_app (tool_guard_runtime.py), which causes
guard_tool_call() to treat that as "skip validation" and disables enforcement;
instead, change the failure path (both the except block around app runtime
bootstrap and the similar block at 684-691) to not store None but store a clear
failure sentinel (e.g., a unique RuntimeInitFailed object or an exception
instance) or re-raise the error so callers can distinguish failure from "not
initialized", and update guard_tool_call() to check _runtimes_by_app for that
sentinel/exception and treat it as a hard failure (deny or block the tool call)
rather than bypassing enforcement.
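A sketch of the fail-closed sentinel approach; `RuntimeInitFailed` and `get_runtime_or_raise` are illustrative names, not symbols from the repository:

```python
class RuntimeInitFailed:
    """Sentinel cached on init failure so later calls can fail closed
    instead of silently skipping validation."""

    def __init__(self, error: Exception):
        self.error = error

def get_runtime_or_raise(runtimes_by_app: dict, app_name: str):
    """Distinguish 'not yet initialized' (None) from 'init failed' (sentinel),
    and turn the latter into a hard error that blocks the tool call."""
    entry = runtimes_by_app.get(app_name)
    if isinstance(entry, RuntimeInitFailed):
        raise RuntimeError(
            f"Guard runtime for app '{app_name}' failed to initialize"
        ) from entry.error
    return entry
```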
- Around line 616-628: Concurrent cold-starts can race because the snippet
checks self._runtimes_by_app, then builds and __enter__()s runtimes without
synchronization, causing duplicate expensive builds and leaked entered runtimes;
serialize initialization per app by introducing a per-app lock map (e.g.,
self._runtime_init_locks_by_app) and acquire the lock before re-checking the
cache and calling self._load_runtime_domain and self._build_runtime (and
__enter__()); ensure you create the lock lazily for the app, use a context
manager to release reliably, and only assign into self._runtimes_by_app and
self._runtime_domains_by_app after the build succeeds so only one built/entered
runtime is kept.
In `@src/cuga/backend/tools_env/registry/registry/api_registry_server.py`:
- Around line 239-256: The current cleanup sequence awaits mgr.shutdown() before
attempting reg.policy_storage.disconnect(), which aborts disconnects on shutdown
exceptions; create a shared async helper (e.g. async def
cleanup_registry_resources(agent_id, mgr, reg)) that performs both actions
independently and catches/logs exceptions for each (try/except around await
mgr.shutdown() and separately around await reg.policy_storage.disconnect(),
ensuring both are attempted regardless of the other failing), then replace the
inline shutdown+disconnect sequences in api_registry_server (the loop using
agent_registries and the other occurrences at the lifespan/reload/cache-clear
paths referenced in the comment) to call this helper for each registry and for
the global registry/mcp_manager cleanup so all resources are best-effort cleaned
up.
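The shared helper could look roughly like this (the name follows the comment's suggestion; the attribute layout of `mgr`/`reg` is assumed):

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def cleanup_registry_resources(mgr, reg) -> None:
    """Attempt both shutdown steps independently so a failing manager
    shutdown never aborts the policy-storage disconnect, or vice versa."""
    try:
        await mgr.shutdown()
    except Exception as exc:
        logger.warning(f"Manager shutdown failed: {exc}")
    try:
        if reg.policy_storage is not None:
            await reg.policy_storage.disconnect()
    except Exception as exc:
        logger.warning(f"Policy storage disconnect failed: {exc}")
```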
- Around line 623-649: After new_registry.start_servers() succeeds, the
subsequent old-state teardown can fail and currently does not roll back the
newly started resources; modify the post-start cleanup so any exception while
tearing down the old registry (e.g. during registry.policy_storage.disconnect()
or mcp_manager.shutdown()) triggers a rollback that cleanly shuts down the newly
created resources: call new_registry.shutdown() (or a stop/teardown method on
ApiRegistry if named differently), await new_manager.shutdown(), and if
policy_storage is not None await policy_storage.disconnect() inside an exception
handler that re-raises the original error; surround the old cleanup block
(registry.policy_storage.disconnect() and mcp_manager.shutdown()) with
try/except and perform this rollback in the except path to ensure
new_manager/new_registry/policy_storage are always torn down on failure.
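A minimal sketch of the rollback-on-failure pattern, with the teardown and rollback steps passed in as coroutines so the shape stays independent of the real `ApiRegistry`/manager APIs:

```python
import asyncio

async def teardown_old_with_rollback(teardown_old, rollback_new) -> None:
    """Tear down the old registry; if that fails, best-effort roll back the
    freshly started resources and re-raise the original error."""
    try:
        await teardown_old()
    except Exception:
        try:
            await rollback_new()
        except Exception:
            pass  # rollback is best-effort; surface the original failure
        raise
```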
---
Duplicate comments:
In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py`:
- Around line 698-703: The guard currently overwrites any user-supplied
arguments["args"] by passing arguments | {"args": args_obj} to
runtime.guard_toolcall; change this to preserve a real "args" when present
(e.g., if "args" in arguments then pass arguments unchanged, otherwise pass
arguments | {"args": args_obj}) so the guard validates the exact payload the
tool will receive; update the call to runtime.guard_toolcall (using
function_name, arguments, args_obj, self.invoker) accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 013486f0-7d00-4b57-9b37-728215fa6162
📒 Files selected for processing (2)
src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.pysrc/cuga/backend/tools_env/registry/registry/api_registry_server.py
# Return cached runtime if available
if app_name in self._runtimes_by_app:
    return self._runtimes_by_app[app_name]

# Try to load and build runtime for this app
try:
    logger.info(f"Loading runtime domain for app '{app_name}'...")
    runtime_domain = self._load_runtime_domain(app_name)
    self._runtime_domains_by_app[app_name] = runtime_domain

    logger.info(f"Building runtime for app '{app_name}'...")
    runtime = self._build_runtime(app_name)
    self._runtimes_by_app[app_name] = runtime
🛠️ Refactor suggestion | 🟠 Major
Serialize per-app runtime initialization.
Two concurrent requests for a cold app_name can both miss the cache, both build a runtime, and both call __enter__(). The later assignment wins, so one entered runtime is leaked and the expensive initialization work is duplicated.
Possible fix
+import asyncio
...
class ToolGuardRuntime:
def __init__(self, tool_provider, policy_storage: PolicyStorage) -> None:
...
self._runtimes_by_app: Dict[str, Any] = {}
self._runtime_domains_by_app: Dict[str, RuntimeDomain] = {}
+ self._runtime_init_locks: Dict[str, asyncio.Lock] = {}
self._initialized = False
...
async def _get_or_create_runtime_for_app(self, app_name: str):
- if app_name in self._runtimes_by_app:
- return self._runtimes_by_app[app_name]
+ if app_name in self._runtimes_by_app:
+ return self._runtimes_by_app[app_name]
+
+ lock = self._runtime_init_locks.setdefault(app_name, asyncio.Lock())
+ async with lock:
+ if app_name in self._runtimes_by_app:
+ return self._runtimes_by_app[app_name]
- # Try to load and build runtime for this app
- try:
- logger.info(f"Loading runtime domain for app '{app_name}'...")
- runtime_domain = self._load_runtime_domain(app_name)
- self._runtime_domains_by_app[app_name] = runtime_domain
+ # Try to load and build runtime for this app
+ try:
+ logger.info(f"Loading runtime domain for app '{app_name}'...")
+ runtime_domain = self._load_runtime_domain(app_name)
+ self._runtime_domains_by_app[app_name] = runtime_domain
- logger.info(f"Building runtime for app '{app_name}'...")
- runtime = self._build_runtime(app_name)
- self._runtimes_by_app[app_name] = runtime
+ logger.info(f"Building runtime for app '{app_name}'...")
+ runtime = self._build_runtime(app_name)
+ self._runtimes_by_app[app_name] = runtime
- logger.info(f"✅ Runtime initialized for app '{app_name}'")
- return runtime
- except Exception as e:
- ...
+ logger.info(f"✅ Runtime initialized for app '{app_name}'")
+ return runtime
+ except Exception as e:
+ ...🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_guard_runtime.py` around
lines 616 - 628, Concurrent cold-starts can race because the snippet checks
self._runtimes_by_app, then builds and __enter__()s runtimes without
synchronization, causing duplicate expensive builds and leaked entered runtimes;
serialize initialization per app by introducing a per-app lock map (e.g.,
self._runtime_init_locks_by_app) and acquire the lock before re-checking the
cache and calling self._load_runtime_domain and self._build_runtime (and
__enter__()); ensure you create the lock lazily for the app, use a context
manager to release reliably, and only assign into self._runtimes_by_app and
self._runtime_domains_by_app after the build succeeds so only one built/entered
runtime is kept.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cuga/backend/cuga_graph/policy/tool_guard/manager.py`:
- Around line 224-228: The current except block in the async example-generation
code (the block that logs "❌ Failed to generate examples for tool
'{target_tool}'" and raises RuntimeError) incorrectly catches
asyncio.CancelledError; change the handler in the async generation methods (the
functions containing that logger/error raise — e.g., the tool example generation
functions around the shown block and the similar block at lines 343-347) so that
CancelledError is allowed to propagate: import asyncio and either add an
explicit except asyncio.CancelledError: raise or check isinstance(e,
asyncio.CancelledError) and re-raise before logging/wrapping, then log and wrap
only other exceptions.
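The fix can be sketched as a small wrapper (hypothetical name `run_generation`; the real code inlines this in the generation methods):

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def run_generation(coro_fn, target_tool: str):
    """Re-raise cancellation untouched; log and wrap only real failures."""
    try:
        return await coro_fn()
    except asyncio.CancelledError:
        raise  # never swallow task cancellation
    except Exception as e:
        logger.error(f"❌ Failed to generate examples for tool '{target_tool}': {e}")
        raise RuntimeError(f"Example generation failed for '{target_tool}'") from e
```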
- Around line 234-235: The parameter app_name (declared as app_name: str =
"cuga_app") must be constrained before any filesystem-backed generation: add a
small validator helper (e.g., validate_app_name or sanitize_app_name) and call
it at the start of the methods that accept app_name (the function with signature
ending in "-> str" shown and the other method around lines 297-303) to either
reject or normalize unsafe values; specifically disallow path separators ('/',
'\\'), traversal segments ('..'), and any characters outside a safe whitelist
such as /^[A-Za-z0-9_-]+$/, and raise a ValueError (or return the sanitized
identifier) if validation fails so no unsafe path or traversal can be introduced
into persisted artifacts.
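A sketch of the whitelist validator using the character class the comment proposes (`validate_app_name` is an assumed helper name):

```python
import re

_APP_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")

def validate_app_name(app_name: str) -> str:
    """Whitelist validation: the pattern excludes '/', '\\', and '.' so
    path separators and traversal segments like '..' can never reach the
    filesystem via persisted artifact paths."""
    if not _APP_NAME_RE.fullmatch(app_name):
        raise ValueError(f"Unsafe app_name: {app_name!r}")
    return app_name
```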
- Around line 42-56: Add upfront validation for agent.tool_provider and
agent.cuga_folder in the ToolGuardManager initialization: check that
agent.tool_provider is not None and agent.cuga_folder is not None/empty and
raise a clear ValueError if missing, before wrapping agent._model; then use the
existing Path logic (work_dir_path = Path(agent.cuga_folder) / "toolguard") to
create the directory (mkdir(parents=True, exist_ok=True)) and assign
self.work_dir, ensuring the error messages reference tool_provider and
cuga_folder so failures surface early and clearly.
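The upfront checks could be factored into a small helper like this (hypothetical name; the real code does this inline in `ToolGuardManager.__init__`):

```python
from pathlib import Path

def resolve_work_dir(agent) -> str:
    """Fail early with clear messages when required agent attributes are
    missing, then create the toolguard working directory."""
    if getattr(agent, "tool_provider", None) is None:
        raise ValueError("ToolGuardManager requires agent.tool_provider to be set")
    if not getattr(agent, "cuga_folder", None):
        raise ValueError("ToolGuardManager requires a non-empty agent.cuga_folder")
    work_dir_path = Path(agent.cuga_folder) / "toolguard"
    work_dir_path.mkdir(parents=True, exist_ok=True)
    return str(work_dir_path)
```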
In `@src/cuga/backend/cuga_graph/policy/tool_guard/tool_invoker.py`:
- Around line 108-112: The except block in tool_invoker.py that logs and wraps
all Exceptions (catching Exception as e) unintentionally swallows
asyncio.CancelledError; update the handler in the invoke path (the block
referencing logger and toolname) to detect and re-raise asyncio.CancelledError
immediately (e.g., if isinstance(e, asyncio.CancelledError): raise) before
logging/wrapping, and ensure asyncio is imported so cancellation is preserved
while other exceptions are still logged and wrapped into the RuntimeError.
- Around line 49-50: When populating self._tools_cache from tools_list returned
by self.tool_provider.get_all_tools(), the dict comprehension {tool.name: tool
for tool in tools_list} silently overwrites duplicate tool names; change this to
fail fast by scanning tools_list for duplicate tool.name values before assigning
self._tools_cache (e.g., build a map while checking if a name already exists and
raise a clear exception or ValueError mentioning the duplicate), so that any
collisions across providers are detected and prevented from routing guards to
the wrong tool.
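The fail-fast cache build can be sketched as follows (hypothetical helper name; the real code populates `self._tools_cache` from `self.tool_provider.get_all_tools()`):

```python
def build_tools_cache(tools_list) -> dict:
    """Detect tool-name collisions instead of letting a dict comprehension
    silently keep whichever tool was listed last."""
    cache = {}
    for tool in tools_list:
        if tool.name in cache:
            raise ValueError(f"Duplicate tool name across providers: {tool.name!r}")
        cache[tool.name] = tool
    return cache
```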
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 43b2aec2-6053-45ae-b837-32394f152cad
📒 Files selected for processing (3)
- pyproject.toml
- src/cuga/backend/cuga_graph/policy/tool_guard/manager.py
- src/cuga/backend/cuga_graph/policy/tool_guard/tool_invoker.py
✅ Files skipped from review due to trivial changes (1)
- pyproject.toml
self.work_dir = str(work_dir_path)
logger.debug(f"ToolGuard working directory: {self.work_dir}")

async def initialize(
[MED-2] Potential race condition in ToolGuardManager initialization
src/cuga/backend/cuga_graph/policy/tool_guard/manager.py:85
Explanation
The initialize method doesn't check self._initialized before initializing, and has no synchronization mechanism:
async def initialize(self) -> None:
    logger.info("Initializing ToolGuardManager...")
    # Get all LangChain tools from the tool provider
    self.langchain_tools = await self.tool_provider.get_all_tools()
    # Convert LangChain tools to OpenAPI dict
    self.tools_dict = langchain_tools_to_openapi(self.langchain_tools)
    self._initialized = True
If initialize() is called multiple times concurrently, all calls will execute the full initialization logic, potentially causing:
- Duplicate/wasted work fetching tools multiple times
- Race conditions where self.langchain_tools and self.tools_dict are overwritten mid-initialization
- Inconsistent state if one initialization fails while another succeeds
Suggested Fix
Use an async lock to ensure thread-safe initialization:
import asyncio
class ToolGuardManager:
    def __init__(self, agent):
        self._agent = agent
        self._initialized = False
        self._init_lock = asyncio.Lock()

    async def initialize(self):
        async with self._init_lock:
            if self._initialized:
                return
            # ... initialization logic
            self._initialized = True
Agent Prompt
In src/cuga/backend/cuga_graph/policy/tool_guard/manager.py around line 85, the initialize method has a potential race condition where multiple concurrent calls could bypass the _initialized check. Review the implementation and add an asyncio.Lock to ensure thread-safe initialization.
except Exception as e:
    logger.error(f"Failed to connect policy storage: {e}")
    # Fail closed: if policy enforcement is enabled but storage fails,
    # don't boot the registry unguarded
    raise RuntimeError(
        f"Policy system is enabled but PolicyStorage.connect() failed: {e}"
[MED-5] Inconsistent error handling for policy storage connection failures
src/cuga/backend/tools_env/registry/registry/api_registry_server.py:155
Explanation
The code has inconsistent error handling for policy storage connection failures. In _get_or_create_registry, it raises a RuntimeError (fail-closed):
except Exception as e:
    logger.error(f"Failed to connect policy storage: {e}")
    raise RuntimeError(
        f"Policy system is enabled but PolicyStorage.connect() failed: {e}"
    ) from e
But in the lifespan function, it only logs a warning and continues (fail-open):
except Exception as e:
    logger.warning(f"Failed to connect policy storage: {e}")
This inconsistency means that in some code paths, policy enforcement can be silently disabled, which is a security risk.
Suggested Fix
Make error handling consistent - fail closed when policy system is enabled:
# In lifespan function
if settings.policy.enabled:
    try:
        from cuga.backend.cuga_graph.policy.storage import PolicyStorage
        policy_storage = PolicyStorage()
        await policy_storage.connect()
        logger.info("✅ Connected policy storage for ToolGuardRuntime")
    except Exception as e:
        logger.error(f"Failed to connect policy storage: {e}")
        # Fail closed: if policy enforcement is enabled but storage fails,
        # don't boot the registry unguarded
        raise RuntimeError(
            f"Policy system is enabled but PolicyStorage.connect() failed: {e}"
        ) from e
Agent Prompt
In src/cuga/backend/tools_env/registry/registry/api_registry_server.py around line 155, the error handling for policy storage connection failures is inconsistent between _get_or_create_registry (fail-closed) and lifespan (fail-open). Review both code paths and ensure consistent fail-closed behavior when the policy system is enabled.
| """ | ||
|
|
||
| def __init__(self, client: MCPManager): | ||
| def __init__(self, client: MCPManager, policy_storage=None): |
[MED-6] Policy storage lifecycle management should be in ToolGuardRuntime, not registry
src/cuga/backend/tools_env/registry/registry/api_registry_server.py:140
Explanation
The PR adds ~74 lines of policy storage initialization and cleanup code to api_registry_server.py, scattered across multiple functions (_get_or_create_registry, lifespan, reload_config, clear_agent_cache). This makes the registry harder to read and violates the single responsibility principle.
The registry's job is to manage and execute tools. Policy enforcement is a separate concern that should be encapsulated in the ToolGuardRuntime class, which already exists and is the component that actually uses the policy storage.
Current approach:
# In api_registry_server.py - registry manages policy storage lifecycle
policy_storage = PolicyStorage()
await policy_storage.connect()
registry = ApiRegistry(client=manager, policy_storage=policy_storage)

# Later, in cleanup
if reg.policy_storage is not None:
    await reg.policy_storage.disconnect()
This pattern is repeated in 4+ different places in the file.
Suggested Fix
Move policy storage lifecycle into ToolGuardRuntime:
# In tool_guard_runtime.py
class ToolGuardRuntime:
    def __init__(self, tool_provider, enable_policies: bool = False):
        self.tool_provider = tool_provider
        self.policy_storage = None
        self.enable_policies = enable_policies

    async def initialize(self):
        if self.enable_policies:
            from cuga.backend.cuga_graph.policy.storage import PolicyStorage
            self.policy_storage = PolicyStorage()
            await self.policy_storage.connect()
            # ... load policies from storage

    async def cleanup(self):
        if self.policy_storage:
            await self.policy_storage.disconnect()
            self.policy_storage = None
Then in registry:
# In api_registry.py - much simpler
class ApiRegistry:
    def __init__(self, client: MCPManager, enable_policies: bool = False):
        self.mcp_client = client
        self.guard_runtime = ToolGuardRuntime(self.tool_provider, enable_policies)

    async def start_servers(self):
        await self.mcp_client.start_servers()
        await self.guard_runtime.initialize()

    async def cleanup(self):
        await self.guard_runtime.cleanup()
This removes ~50+ lines from api_registry_server.py and keeps all policy-related code in the policy module.
Agent Prompt
In src/cuga/backend/tools_env/registry/registry/api_registry_server.py around line 140, policy storage initialization and cleanup is scattered across multiple functions. Review the architecture and refactor to move policy storage lifecycle management into ToolGuardRuntime class, which is the component that actually uses it. This will simplify the registry code and improve separation of concerns.
Summary by CodeRabbit
New Features
Tests
Chores