Skip to content

feat: sandbox-aware execution and Carbon side panel refresh#225

Open
sami-marreed wants to merge 6 commits into
mainfrom
feat/carbonize-chat-side-panel
Open

feat: sandbox-aware execution and Carbon side panel refresh#225
sami-marreed wants to merge 6 commits into
mainfrom
feat/carbonize-chat-side-panel

Conversation

@sami-marreed
Copy link
Copy Markdown
Contributor

@sami-marreed sami-marreed commented May 14, 2026

Feature

Related issues

Summary

Two coherent themes land together in this PR:

1. Sandbox-aware execution

The executor now matches its working directory and prompt guidance to the active sandbox_workspace, eliminating the recurring "is the CWD . or /workspace?" confusion.

  • code_executor chdirs Python execution into the per-thread workspace for local/native sandbox modes so generated code shares a CWD with run_command.
  • code_wrapper threads workspace_root through wrap_code so the prepended chdir lives inside the wrapped async main.
  • cuga_lite_graph passes sandbox_workspace into the reflection prompt context so strategic guidance matches the executor's actual CWD.
  • reflection_system, reflection_user, and mcp_prompt Jinja templates replace "explain every case" guidance with a single branch that renders concrete path rules for the actual sandbox_workspace value (., /workspace, or other). One workspace → one set of rules.

2. Carbon side panel refresh

ChatLanding's right-side agent panel now uses idiomatic Carbon primitives:

  • Accordion + AccordionItem for the app configuration section, with ContainedList for the per-app tool lists (replaces a custom expand/collapse implementation).
  • Layer wrapping the scrollable content area for proper Carbon elevation.
  • Non-clickable Tile for skill cards with truncated descriptions and a Show more / Show less expander (descriptions over 100 chars).
  • Inline-styled underline indicator on the section tabs that reliably tracks the active section without depending on border-bottom or pseudo-elements.

Plus an incidental fix in customSendMessage.ts: the thinkingInterval variable was scoped inside try, but cleared inside finally — now lifted to function scope. The half-finished thinking-indicator feature has been removed and will return in a follow-up PR with the streaming-display work.

Build / infra: postinstall patch for express's path-to-regexp v8 import shape, pinned webpack versions, refreshed pnpm-lock.yaml, regenerated dist/ bundle, and .claude/settings.local.json added to .gitignore.

Testing

  • Tested locally; UI changes verified in pnpm dev:real
  • Type checks pass on the modified frontend files
  • Pre-commit hooks (ruff, ruff format, conventional commits, detect-secrets) all pass

Summary by CodeRabbit

  • New Features

    • Per-thread workspace isolation for secure multi-user agent execution
    • New Skills panel in the agent UI with expandable skill details
  • UI

    • Agent panel navigation restyled to a 4-column, glassmorphism look
    • Tools list reworked into an accordion layout
  • Bug Fixes

    • Fixed frontend Express compatibility during install/build
  • Infrastructure

    • Standardized workspace handling (/workspace → cuga_workspace) and clearer sandbox workspace guidance in prompts and tooling

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds per-thread workspace scoping for filesystem MCP tools when skills are enabled, moves local/native executors to a cwd-rooted cuga_workspace with optional per-thread subdirs, wraps tool callables to rewrite relative paths, adds POSIX end-to-end tests for isolation/wrapping, and clamps policy confidence plus a small test tweak.

Changes

Per-Thread Filesystem Workspace Isolation

Layer / File(s) Summary
MCP filesystem tool wrapping and path rewriting
src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py
Adds helpers to sanitize thread IDs, detect absolute paths, rewrite path args (path, source, destination, paths), provide function-level wrap_filesystem_tool_func, and produce StructuredTool wrappers that rewrite args on invoke when settings.skills.enabled is true.
Local executor workspace root and helpers
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/local/local_sandbox_executor.py
Replaces tempfile-based roots with deterministic <cwd>/cuga_workspace base (CUGA_WORKSPACE_DIRNAME) and returns either the shared base or per-thread <safe_thread_id> subdirectory depending on settings.skills.enabled.
Native executor policy & venv hardening
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/native/native_sandbox_executor.py
Resolves user-facing workspaces under <cwd>/cuga_workspace (optional per-thread), embeds that path into seatbelt policy allowlists, and tightens /tmp/.venv readiness checks with recreate-on-stale and explicit logging.
Local sandbox skills e2e tests
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_local_sandbox_skills.py
Adds POSIX-only tests that verify relative writes land in per-thread workspaces, threads are isolated, path traversal is rejected, filesystem tools round-trip, and both func-level and BaseTool wrappers rewrite or pass through paths appropriately when skills are on/off.

Policy clamp and execution-context tweaks

Layer / File(s) Summary
Policy confidence clamping & test adjustment
src/cuga/backend/cuga_graph/policy/agent.py, src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_e2b_lite.py
Clamps returned PolicyMatch.confidence into [0.0,1.0] to avoid Pydantic validation failures; adjusts an e2b test to disable skills so injected _internal_re does not alter expected context locals.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • cuga-project/cuga-agent#121: Backend skills implementation with thread/workspace-scoped execution that directly enables the filesystem isolation in this PR.

Suggested reviewers

  • gjt-prog
  • nir-mashkif

🐰 I wrapped paths in nests of care,
Threads get their gardens, tidy and fair.
Tests hop along, policies clamped tight,
Workspaces bloom under the CWD light.
🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: sandbox-aware execution and Carbon side panel refresh' directly and clearly describes the two main feature sets delivered in this PR: sandbox-aware execution changes and frontend Carbon UI refresh.
Linked Issues check ✅ Passed The PR successfully implements the main coding requirement from #227: surfacing skills in the chat frontend via a new SkillInfo display in ChatLanding with skill fetching, expandable descriptions, and integration into the right-panel section switcher.
Out of Scope Changes check ✅ Passed All changes align with stated objectives: sandbox-aware execution plumbing (workspace routing, path rewriting, executor updates), frontend Carbon panel refresh (ChatLanding skill display, Accordion layout), and supporting infrastructure (postinstall patch, MCP config updates, policy clamping).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/carbonize-chat-side-panel

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

Backend
- code_executor: chdir Python execution into the per-thread workspace
  for local/native sandbox modes so generated code shares a CWD with
  run_command (no more "/workspace/" hardcoded vs "." mismatch).
- code_wrapper: thread workspace_root through wrap_code so the
  prepended chdir lives inside the wrapped async main.
- cuga_lite_graph: pass sandbox_workspace into the reflection prompt
  context so strategic guidance matches the executor's actual CWD.

Prompts
- reflection_system, reflection_user, mcp_prompt: replace
  "explain every case" guidance with a Jinja branch that renders
  concrete path rules for the actual sandbox_workspace value
  (".", "/workspace", or other). One workspace -> one set of rules.

Frontend
- ChatLanding side panel: refactor to Carbon primitives -- Accordion
  + AccordionItem for app config, ContainedList for tools, Layer for
  elevation, non-clickable Tile for skill cards with expandable
  descriptions, and an inline-styled underline indicator that
  reliably tracks the active section tab.
- customSendMessage: fix thinkingInterval scope bug; remove the
  half-finished thinking-indicator feature (to reland in a follow-up
  with the streaming display work).
- api: register getSkills endpoint helper.

Build / infra
- frontend: add postinstall patch for express path-to-regexp v8
  import shape; pin webpack versions and refresh pnpm-lock.yaml;
  no-op setupMiddlewares hook in webpack.config.js.
- dist: regenerate frontend bundle.
- gitignore: ignore .claude/settings.local.json.
@sami-marreed sami-marreed force-pushed the feat/carbonize-chat-side-panel branch from 296d0ea to 6f4fbe2 Compare May 14, 2026 09:57
@sami-marreed sami-marreed requested a review from offerakrabi May 14, 2026 09:58
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (2)
src/frontend_workspaces/frontend/scripts/postinstall.cjs (1)

6-7: ⚡ Quick win

Use require.resolve() to dynamically resolve the express module path instead of version-coupled hardcoded paths.

The current implementation has two brittle dependencies:

  • Line 6 assumes a standard ../node_modules/ layout
  • Lines 23-24 hardcode express@4.22.2, requiring manual updates whenever express is bumped

Replace both with require.resolve('express/lib/router/layer.js'), which will resolve to the correct path regardless of express version or installation layout:

Proposed change
-const layerFile = path.join(__dirname, '../node_modules/express/lib/router/layer.js');
+const layerFile = require.resolve('express/lib/router/layer.js');
 console.log('Looking for:', layerFile);
 if (fs.existsSync(layerFile)) {
   // ... patching logic ...
 } else {
   console.log('File not found at:', layerFile);
-  const altLayerFile = path.join(__dirname, '../../node_modules/.pnpm/express@4.22.2/node_modules/express/lib/router/layer.js');
-  console.log('Trying alternate path:', altLayerFile);
-  if (fs.existsSync(altLayerFile)) {
-    // ... duplicate patching logic ...
-  }
+  // No fallback needed; require.resolve handles all layouts
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/frontend_workspaces/frontend/scripts/postinstall.cjs` around lines 6 - 7,
Replace the brittle hardcoded path construction for the express layer with
Node's resolver: change the code that assigns layerFile (currently using
path.join(__dirname, '../node_modules/express/lib/router/layer.js')) to use
require.resolve('express/lib/router/layer.js') and likewise replace any other
hardcoded versioned express paths (the occurrences around the later block that
reference express@4.22.2) to use require.resolve with the same module path;
update the code that logs or uses layerFile to use the resolved path returned by
require.resolve.
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/code_wrapper.py (1)

99-108: 💤 Low value

Redundant runtime check in generated code.

The if {repr(workspace_root)} else None condition in the generated code is unnecessary since this code path is only taken when workspace_root is truthy. The generated code could be simplified.

♻️ Suggested simplification
         if workspace_root:
-            chdir_line = (
-                f"    _internal_os.chdir({repr(workspace_root)}) if {repr(workspace_root)} else None\n"
-            )
+            chdir_line = f"    _internal_os.chdir({workspace_root!r})\n"

This also addresses the Ruff RUF010 hint by using the !r conversion flag instead of repr().

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/code_wrapper.py`
around lines 99 - 108, The generated chdir_line contains a redundant runtime
conditional and uses repr(), so update the code that builds chdir_line (in the
block that checks workspace_root) to emit a simple unconditional chdir call
since workspace_root is already truthy; replace the repr() usage with the
f-string !r conversion to satisfy the RUF010 hint (i.e., produce a string like
"_internal_os.chdir({workspace_root!r})" followed by newline) so wrapped_code
contains a direct chdir call when workspace_root is present.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py`:
- Around line 163-164: Replace the bare "except Exception: pass" in
code_executor.py (the try that attempts to determine and chdir to the workspace
root) with a logged debug-level catch so errors aren't silently swallowed;
capture and log the exception (including traceback or exception message) via the
module logger (e.g., logger.debug or processLogger.debug) and include context
like "unable to determine workspace root, continuing without chdir" so failures
in determining the workspace or import/config issues are visible during
debugging while preserving the continue behavior.

In `@src/frontend_workspaces/frontend/package.json`:
- Line 1: The scripts in package.json set DANGEROUSLY_DISABLE_HOST_CHECK but
webpack.config.js ignores it because allowedHosts is hardcoded to "all"; either
remove the env var from the scripts (e.g., the start/dev/dev:watch variants) or
update webpack.config.js to read process.env.DANGEROUSLY_DISABLE_HOST_CHECK and
conditionally set allowedHosts (e.g., if truthy set allowedHosts: "all", else
leave default/validate hosts). Locate the scripts section in package.json
(script names: start, start:win, dev, dev:real, dev:win, dev:watch,
dev:watch:win) and the allowedHosts setting in webpack.config.js (symbol:
allowedHosts) and implement one of the two fixes so the env var's presence
actually controls host validation.

In `@src/frontend_workspaces/frontend/scripts/postinstall.cjs`:
- Around line 18-21: The postinstall script currently only logs when a patch
target is missing; change the branches that log "Pattern not found in file" and
"File not found at:" (and the similar messages around lines 33-35) to terminate
with a non-zero exit (e.g., call process.exit(1) or throw an Error) so the
install fails fast when layerFile or the expected pattern isn't found; ensure
you update both occurrences that check for the file and for the pattern so
missing-patch situations do not silently exit 0.
- Around line 10-14: The current rewrite only renames the import to pathRegexp
but doesn't preserve the old API contract used by Express 4's layer.js: change
the patch so any call sites that do "this.regexp = pathRegexp(path, keys, opts)"
are rewritten to call the new shape and extract the RegExp and mutate keys
accordingly (assign this.regexp = result.regexp and push/merge result.keys into
the existing keys array) so match() continues to call this.regexp.exec() and
keys are populated; update the postinstall replacement logic that touches
pathRegexp, layer.js, and any occurrence of pathRegexp(path, keys, opts) to
perform this extraction and key-array mutation.

In `@src/frontend_workspaces/frontend/src/carbon-chat/customSendMessage.ts`:
- Line 229: The handler is emitting a single-space chunk (" ") which causes
empty assistant bubbles on early-return paths; instead either finalize the
stream by calling complete_item() before any early return (e.g., in
SuggestHumanActions, tool-approval flows, and Error paths) or stop emitting
whitespace placeholders altogether for reasoning-only updates—locate the
emit/send call that sets text: " " (also present at the other occurrences) and
change the logic to skip sending a whitespace chunk or call complete_item()
first so no blank assistant message is left behind.
- Around line 155-159: The code currently logs the raw user prompt via
console.log(`User message: ${userMessage}`) which can leak sensitive data;
update customSendMessage.ts to stop emitting the raw userMessage in production
by either removing that console.log or replacing it with safe metadata (e.g.,
log only threadId, useDraft, baseUrl, and a redacted/truncated preview or a
deterministic hash/length of userMessage). Ensure the change touches the logging
lines that reference userMessage (and keep the other logs for
threadId/useDraft/baseUrl) and gate any full-prompt output behind an explicit
dev-only check (NODE_ENV === "development") if you still need it while
debugging.
- Around line 198-201: The code reads the raw backend body into errorText and
embeds it into the thrown Error (response, errorText, throw new Error(...)),
which later gets surfaced to users; instead, log the full errorText to
console/debug for diagnostics but throw a sanitized, non-sensitive message
(e.g., `HTTP error: status ${response.status}` or `Server returned an error`)
without including errorText or raw HTML; keep the status code in the thrown
message for context and ensure any user-facing display uses that sanitized Error
instead of the raw backend body.

In `@src/frontend_workspaces/frontend/src/ChatLanding.tsx`:
- Around line 690-706: The skills fetch treats any non-OK/network error the same
as an empty list; add an explicit error state (e.g., skillsError via useState)
and set it to an Error or message inside the non-OK branch and catch block of
the existing useEffect that calls api.getSkills, while still clearing
setSkillsLoading in finally; update the component render logic that currently
shows the empty-state ("No skills installed" at the block around lines
1342-1354) to check skillsError first and render a failure message or toast when
skillsError is set (otherwise render the empty list message when skills is an
empty array), and keep existing setSkills calls unchanged for successful
responses.
- Around line 1357-1368: The skill cards use skill.name as the React key and as
the identifier for expansion state, causing collisions when names repeat; change
the identity to a stable unique composite string (for example
`${skill.source}:${skill.name}`) wherever skill.name is used as the key in the
skills.map rendering (the Tile key) and wherever expandedSkills is
consulted/updated (e.g., expandedSkills.has, toggle logic and any set
operations) so each skill is uniquely identified by source+name instead of name
alone.

---

Nitpick comments:
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/code_wrapper.py`:
- Around line 99-108: The generated chdir_line contains a redundant runtime
conditional and uses repr(), so update the code that builds chdir_line (in the
block that checks workspace_root) to emit a simple unconditional chdir call
since workspace_root is already truthy; replace the repr() usage with the
f-string !r conversion to satisfy the RUF010 hint (i.e., produce a string like
"_internal_os.chdir({workspace_root!r})" followed by newline) so wrapped_code
contains a direct chdir call when workspace_root is present.

In `@src/frontend_workspaces/frontend/scripts/postinstall.cjs`:
- Around line 6-7: Replace the brittle hardcoded path construction for the
express layer with Node's resolver: change the code that assigns layerFile
(currently using path.join(__dirname,
'../node_modules/express/lib/router/layer.js')) to use
require.resolve('express/lib/router/layer.js') and likewise replace any other
hardcoded versioned express paths (the occurrences around the later block that
reference express@4.22.2) to use require.resolve with the same module path;
update the code that logs or uses layerFile to use the resolved path returned by
require.resolve.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c600dd06-6f47-45ae-ae35-463fed5aa106

📥 Commits

Reviewing files that changed from the base of the PR and between 81350e1 and 296d0ea.

⛔ Files ignored due to path filters (6)
  • src/cuga/frontend/dist/index.html is excluded by !**/dist/**
  • src/cuga/frontend/dist/main.76f3bae552155a4125f1.js is excluded by !**/dist/**
  • src/cuga/frontend/dist/main.f58b9c6b13e6284dbd97.js is excluded by !**/dist/**
  • src/cuga/frontend/dist/vendors.7268c94d86f8c5d3e2ed.js is excluded by !**/dist/**
  • src/cuga/frontend/dist/vendors.b6b2c62d8ecc895d3c3e.js is excluded by !**/dist/**
  • src/frontend_workspaces/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (14)
  • .gitignore
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/code_wrapper.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/prompts/mcp_prompt.jinja2
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/reflection/prompts/reflection_system.jinja2
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/reflection/prompts/reflection_user.jinja2
  • src/frontend_workspaces/frontend/package.json
  • src/frontend_workspaces/frontend/scripts/postinstall.cjs
  • src/frontend_workspaces/frontend/src/ChatLanding.css
  • src/frontend_workspaces/frontend/src/ChatLanding.tsx
  • src/frontend_workspaces/frontend/src/api.ts
  • src/frontend_workspaces/frontend/src/carbon-chat/customSendMessage.ts
  • src/frontend_workspaces/frontend/webpack.config.js

Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py Outdated
@@ -1 +1 @@
{"name":"@carbon/ai-chat-examples-web-components-basic","version":"0.0.0","main":"main.js","private":true,"type":"module","proxy":"http://localhost:7860","description":"A basic example using @carbon/ai-chat as a web component.","scripts":{"start":"DANGEROUSLY_DISABLE_HOST_CHECK=true webpack serve --config ./webpack.config.js","start:win":"set DANGEROUSLY_DISABLE_HOST_CHECK=true&& webpack serve --config ./webpack.config.js","build":"webpack --config ./webpack.config.js","dev":"DANGEROUSLY_DISABLE_HOST_CHECK=true FAKE_STREAM=true webpack serve --config ./webpack.config.js --mode development","dev:real":"DANGEROUSLY_DISABLE_HOST_CHECK=true webpack serve --config ./webpack.config.js --mode development","dev:win":"set DANGEROUSLY_DISABLE_HOST_CHECK=true&& set FAKE_STREAM=true&& webpack serve --config ./webpack.config.js --mode development","dev:watch":"DANGEROUSLY_DISABLE_HOST_CHECK=true webpack serve --config ./webpack.config.js --mode development --watch","dev:watch:win":"set DANGEROUSLY_DISABLE_HOST_CHECK=true&& webpack serve --config ./webpack.config.js --mode development --watch","electron:build":"npm run build && electron-builder","electron":"electron .","desktop":"electron ./electron_loader/desktop_main.cjs"},"author":"IBM Corp","license":"Apache-2.0","browserslist":"> 0.5%, last 2 versions, not dead","dependencies":{"@carbon/ai-chat":"1.6.0","@carbon/ai-chat-components":"^0.8.0","@carbon/icons-react":"^11.45.0","@carbon/react":"^1.62.0","@ibm/plex":"^6.4.0","agentic_chat":"workspace:*","copy-webpack-plugin":"^13.0.0","lucide-react":"^0.525.0","react":"^18.2.0","react-dom":"^18.2.0","react-markdown":"^10.1.0","react-router-dom":"^7.13.0"},"devDependencies":{"@babel/core":"^7.26.0","@babel/plugin-proposal-class-properties":"^7.18.6","@babel/plugin-proposal-decorators":"^7.25.9","@babel/plugin-transform-private-methods":"^7.28.6","@babel/plugin-transform-runtime":"^7.25.9","@babel/preset-env":"^7.26.0","@babel/preset-react":"^7.25.9","@babel/preset-typescript":"^7.26.0","@types/react":"^18.2.0","@types/react-dom":"^18.2.0","babel-loader":"^9.2.1","css-loader":"^7.1.2","electron":"^41.2.0","electron-builder":"^26.0.12","html-webpack-plugin":"^5.6.3","sass":"^1.77.0","sass-loader":"^16.0.0","style-loader":"^4.0.0","terser-webpack-plugin":"^5.3.14","tinyglobby":"^0.2.14","ts-lit-plugin":"^2.0.2","ts-loader":"^9.5.1","typescript":"^5.6.3","webpack":"^5.104.1","webpack-bundle-analyzer":"^5.0.1","webpack-cli":"^5.1.4","webpack-dev-server":"^5.1.0"}} No newline at end of file
{"name":"@carbon/ai-chat-examples-web-components-basic","version":"0.0.0","main":"main.js","private":true,"type":"module","proxy":"http://localhost:7860","description":"A basic example using @carbon/ai-chat as a web component.","scripts":{"postinstall":"node scripts/postinstall.cjs","start":"DANGEROUSLY_DISABLE_HOST_CHECK=true webpack serve --config ./webpack.config.js","start:win":"set DANGEROUSLY_DISABLE_HOST_CHECK=true&& webpack serve --config ./webpack.config.js","build":"webpack --config ./webpack.config.js","dev":"DANGEROUSLY_DISABLE_HOST_CHECK=true FAKE_STREAM=true webpack serve --config ./webpack.config.js --mode development","dev:real":"DANGEROUSLY_DISABLE_HOST_CHECK=true webpack serve --config ./webpack.config.js --mode development","dev:win":"set DANGEROUSLY_DISABLE_HOST_CHECK=true&& set FAKE_STREAM=true&& webpack serve --config ./webpack.config.js --mode development","dev:watch":"DANGEROUSLY_DISABLE_HOST_CHECK=true webpack serve --config ./webpack.config.js --mode development --watch","dev:watch:win":"set DANGEROUSLY_DISABLE_HOST_CHECK=true&& webpack serve --config ./webpack.config.js --mode development --watch","electron:build":"npm run build && electron-builder","electron":"electron .","desktop":"electron ./electron_loader/desktop_main.cjs"},"author":"IBM Corp","license":"Apache-2.0","browserslist":"> 0.5%, last 2 versions, not dead","dependencies":{"@carbon/ai-chat":"1.6.0","@carbon/ai-chat-components":"^0.8.0","@carbon/icons-react":"^11.45.0","@carbon/react":"^1.62.0","@ibm/plex":"^6.4.0","agentic_chat":"workspace:*","copy-webpack-plugin":"^13.0.0","lucide-react":"^0.525.0","react":"^18.2.0","react-dom":"^18.2.0","react-markdown":"^10.1.0","react-router-dom":"^7.13.0"},"devDependencies":{"@babel/core":"^7.26.0","@babel/plugin-proposal-class-properties":"^7.18.6","@babel/plugin-proposal-decorators":"^7.25.9","@babel/plugin-transform-private-methods":"^7.28.6","@babel/plugin-transform-runtime":"^7.25.9","@babel/preset-env":"^7.26.0","@babel/preset-react":"^7.25.9","@babel/preset-typescript":"^7.26.0","@types/react":"^18.2.0","@types/react-dom":"^18.2.0","babel-loader":"^9.2.1","css-loader":"^7.1.2","electron":"^41.2.0","electron-builder":"^26.0.12","html-webpack-plugin":"^5.6.3","sass":"^1.77.0","sass-loader":"^16.0.0","style-loader":"^4.0.0","terser-webpack-plugin":"^5.3.14","tinyglobby":"^0.2.14","ts-lit-plugin":"^2.0.2","ts-loader":"^9.5.1","typescript":"^5.6.3","webpack":"5.104.1","webpack-bundle-analyzer":"^5.0.1","webpack-cli":"5.1.4","webpack-dev-server":"5.1.0"}} No newline at end of file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether DANGEROUSLY_DISABLE_HOST_CHECK is actually consumed by project config/code.
rg -n "DANGEROUSLY_DISABLE_HOST_CHECK|allowedHosts|disableHostCheck" src/frontend_workspaces/frontend/package.json src/frontend_workspaces/frontend/webpack.config.js

Repository: cuga-project/cuga-agent

Length of output: 2725


DANGEROUSLY_DISABLE_HOST_CHECK environment variable is set but not consumed by webpack config.

Multiple npm scripts set this env var, but the webpack config hardcodes allowedHosts: "all" at line 147 without any logic to check the variable. This creates misleading intent—developers may assume the env var controls host validation, when in fact it has no effect. Either remove the env var from scripts or conditionally wire allowedHosts to respect it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/frontend_workspaces/frontend/package.json` at line 1, The scripts in
package.json set DANGEROUSLY_DISABLE_HOST_CHECK but webpack.config.js ignores it
because allowedHosts is hardcoded to "all"; either remove the env var from the
scripts (e.g., the start/dev/dev:watch variants) or update webpack.config.js to
read process.env.DANGEROUSLY_DISABLE_HOST_CHECK and conditionally set
allowedHosts (e.g., if truthy set allowedHosts: "all", else leave
default/validate hosts). Locate the scripts section in package.json (script
names: start, start:win, dev, dev:real, dev:win, dev:watch, dev:watch:win) and
the allowedHosts setting in webpack.config.js (symbol: allowedHosts) and
implement one of the two fixes so the env var's presence actually controls host
validation.

Comment on lines +10 to +14
if (code.includes("var pathRegexp = require('path-to-regexp');")) {
code = code.replace(
"var pathRegexp = require('path-to-regexp');",
"var {pathToRegexp: pathRegexp} = require('path-to-regexp');"
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🌐 Web query:

For Express 4.x lib/router/layer.js, what call signature and behavior does pathRegexprely on, and is it compatible withpath-to-regexp@8.x pathToRegexp?

💡 Result:

In Express 4.x lib/router/layer.js, pathRegexp (imported from 'path-to-regexp', specifically ~0.1.x versions like 0.1.7-0.1.12) is called with the signature pathRegexp(path, this.keys = [], opts) where path is the route path string, this.keys is an array that gets populated with parameter keys, and opts is an options object (e.g., {sensitive: false, strict: false, end: true}). It returns a RegExp object directly, which is assigned to this.regexp. The Layer constructor then sets fast path flags on this RegExp like this.regexp.fast_star = path === '*' and this.regexp.fast_slash = path === '/' && opts.end === false. The match() method uses this.regexp.exec(path) to test matches and populate params from this.keys and the exec result.[1][2][3][4][5] This is not compatible with path-to-regexp@8.x pathToRegexp, which has a breaking change in return value: it returns an object { regexp, keys } instead of a bare RegExp.[6][7][8] Assigning the result to this.regexp would make this.regexp an object (not RegExp), causing this.regexp.exec() to fail in match(), and this.keys would not be populated as expected since the keys array is handled differently (not mutated in place the same way, and syntax changes like requiring named wildcards). Express 4.x explicitly depends on path-to-regexp ~0.1.x for compatibility.[9][10][3]

Citations:


Import-shape rewrite alone is insufficient for Express 4 compatibility with path-to-regexp v8.

The patch only changes the import binding, but Express 4's layer.js calls pathRegexp(path, keys, opts) expecting the old API contract: a bare RegExp return value with in-place array mutation. path-to-regexp@8 returns {regexp, keys} instead, causing this.regexp to become an object rather than a RegExp, which breaks this.regexp.exec() in the match() method and prevents proper key population.

Applies to lines 10–14 and 27–31.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/frontend_workspaces/frontend/scripts/postinstall.cjs` around lines 10 -
14, The current rewrite only renames the import to pathRegexp but doesn't
preserve the old API contract used by Express 4's layer.js: change the patch so
any call sites that do "this.regexp = pathRegexp(path, keys, opts)" are
rewritten to call the new shape and extract the RegExp and mutate keys
accordingly (assign this.regexp = result.regexp and push/merge result.keys into
the existing keys array) so match() continues to call this.regexp.exec() and
keys are populated; update the postinstall replacement logic that touches
pathRegexp, layer.js, and any occurrence of pathRegexp(path, keys, opts) to
perform this extraction and key-array mutation.

Comment on lines +18 to +21
console.log('Pattern not found in file');
}
} else {
console.log('File not found at:', layerFile);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Script should fail fast when no patch is applied.

When the file/pattern is missing, this script only logs and exits 0. That hides a broken compatibility fix until runtime. Set a non-zero exit code when patching was expected but did not occur.

Minimal fail-fast pattern
+let patched = false;
...
     fs.writeFileSync(layerFile, code);
+    patched = true;
...
-    console.log('Pattern not found in file');
+    console.error('Pattern not found in file');
...
-  if (fs.existsSync(altLayerFile)) {
+  if (fs.existsSync(altLayerFile)) {
...
       fs.writeFileSync(altLayerFile, code);
+      patched = true;
...
   }
 }
+
+if (!patched) {
+  process.exitCode = 1;
+}

Also applies to: 33-35

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/frontend_workspaces/frontend/scripts/postinstall.cjs` around lines 18 -
21, The postinstall script currently only logs when a patch target is missing;
change the branches that log "Pattern not found in file" and "File not found
at:" (and the similar messages around lines 33-35) to terminate with a non-zero
exit (e.g., call process.exit(1) or throw an Error) so the install fails fast
when layerFile or the expected pattern isn't found; ensure you update both
occurrences that check for the file and for the pattern so missing-patch
situations do not silently exit 0.

Comment on lines 155 to +159
console.log(`Connecting to CUGA backend at: ${baseUrl}/stream`);
console.log(`Thread ID: ${threadId}`);
console.log(`User message: ${userMessage}`);
console.log(`Use Draft: ${useDraft}`);


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid logging raw user prompts in production paths.

At Line 157, the full userMessage is logged. This can expose sensitive prompt content in browser logs/telemetry; prefer logging metadata only.

Proposed fix
-    console.log(`User message: ${userMessage}`);
+    console.log(`User message length: ${userMessage.length}`);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
console.log(`Connecting to CUGA backend at: ${baseUrl}/stream`);
console.log(`Thread ID: ${threadId}`);
console.log(`User message: ${userMessage}`);
console.log(`Use Draft: ${useDraft}`);
console.log(`Connecting to CUGA backend at: ${baseUrl}/stream`);
console.log(`Thread ID: ${threadId}`);
console.log(`User message length: ${userMessage.length}`);
console.log(`Use Draft: ${useDraft}`);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/frontend_workspaces/frontend/src/carbon-chat/customSendMessage.ts` around
lines 155 - 159, The code currently logs the raw user prompt via
console.log(`User message: ${userMessage}`) which can leak sensitive data;
update customSendMessage.ts to stop emitting the raw userMessage in production
by either removing that console.log or replacing it with safe metadata (e.g.,
log only threadId, useDraft, baseUrl, and a redacted/truncated preview or a
deterministic hash/length of userMessage). Ensure the change touches the logging
lines that reference userMessage (and keep the other logs for
threadId/useDraft/baseUrl) and gate any full-prompt output behind an explicit
dev-only check (NODE_ENV === "development") if you still need it while
debugging.

partial_item: {
response_type: MessageResponseTypes.TEXT,
text: "",
text: " ",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Whitespace stream chunks can leave empty assistant bubbles on early-return flows.

Using " " as partial text can accumulate visible blank content when the function returns before complete_item (e.g., SuggestHumanActions, tool-approval path, Error). Consider finalizing the stream before those returns, or avoid whitespace placeholders on reasoning-only updates.

Also applies to: 261-261, 282-282, 543-543

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/frontend_workspaces/frontend/src/carbon-chat/customSendMessage.ts` at
line 229, The handler is emitting a single-space chunk (" ") which causes empty
assistant bubbles on early-return paths; instead either finalize the stream by
calling complete_item() before any early return (e.g., in SuggestHumanActions,
tool-approval flows, and Error paths) or stop emitting whitespace placeholders
altogether for reasoning-only updates—locate the emit/send call that sets text:
" " (also present at the other occurrences) and change the logic to skip sending
a whitespace chunk or call complete_item() first so no blank assistant message
is left behind.

Comment on lines +690 to +706
useEffect(() => {
(async () => {
try {
const res = await api.getSkills();
if (!res.ok) {
console.error(`Failed to load skills (${res.status} ${res.statusText})`);
} else {
const data = await res.json();
setSkills(data.skills || []);
}
} catch (err) {
console.error("Failed to load skills:", err);
} finally {
setSkillsLoading(false);
}
})();
}, []);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Differentiate skills load failures from a true empty list.

At Line 694, a non-OK response is only logged, then Line 703 clears loading and Lines 1342-1354 render “No skills installed,” which is misleading on backend/network failure. Add an explicit error state (or toast) and render a failure message instead of the empty-state copy.

Suggested fix
+ const [skillsLoadError, setSkillsLoadError] = useState<string | null>(null);

  useEffect(() => {
    (async () => {
      try {
        const res = await api.getSkills();
        if (!res.ok) {
-         console.error(`Failed to load skills (${res.status} ${res.statusText})`);
+         setSkillsLoadError(`Failed to load skills (${res.status} ${res.statusText})`);
        } else {
          const data = await res.json();
          setSkills(data.skills || []);
+         setSkillsLoadError(null);
        }
      } catch (err) {
-       console.error("Failed to load skills:", err);
+       setSkillsLoadError("Failed to load skills");
      } finally {
        setSkillsLoading(false);
      }
    })();
  }, []);

- ) : skills.length === 0 ? (
+ ) : skillsLoadError ? (
+   <div style={{ padding: "1rem", color: "var(--cds-text-error)" }}>{skillsLoadError}</div>
+ ) : skills.length === 0 ? (

Also applies to: 1342-1354

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/frontend_workspaces/frontend/src/ChatLanding.tsx` around lines 690 - 706,
The skills fetch treats any non-OK/network error the same as an empty list; add
an explicit error state (e.g., skillsError via useState) and set it to an Error
or message inside the non-OK branch and catch block of the existing useEffect
that calls api.getSkills, while still clearing setSkillsLoading in finally;
update the component render logic that currently shows the empty-state ("No
skills installed" at the block around lines 1342-1354) to check skillsError
first and render a failure message or toast when skillsError is set (otherwise
render the empty list message when skills is an empty array), and keep existing
setSkills calls unchanged for successful responses.

Comment on lines +1357 to +1368
{skills.map((skill) => {
const isExpanded = expandedSkills.has(skill.name);
const descLength = skill.description?.length || 0;
const shouldTruncate = descLength > 100;
const displayDesc = isExpanded || !shouldTruncate
? skill.description
: truncateText(skill.description || "", 100);

return (
<Tile
key={skill.name}
style={{ padding: 0, overflow: "hidden", borderRadius: "4px" }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

wc -l src/frontend_workspaces/frontend/src/ChatLanding.tsx

Repository: cuga-project/cuga-agent

Length of output: 125


🏁 Script executed:

sed -n '1357,1380p' src/frontend_workspaces/frontend/src/ChatLanding.tsx

Repository: cuga-project/cuga-agent

Length of output: 1216


🏁 Script executed:

sed -n '1430,1450p' src/frontend_workspaces/frontend/src/ChatLanding.tsx

Repository: cuga-project/cuga-agent

Length of output: 1066


🏁 Script executed:

# Search for skill type definition or interface
rg -n "interface.*[Ss]kill|type.*[Ss]kill" src/frontend_workspaces/frontend/src/ChatLanding.tsx -A 5

Repository: cuga-project/cuga-agent

Length of output: 288


🏁 Script executed:

sed -n '1357,1450p' src/frontend_workspaces/frontend/src/ChatLanding.tsx | grep -n "skill\\.name" -C 2

Repository: cuga-project/cuga-agent

Length of output: 1174


🏁 Script executed:

# Look for all occurrences of skill.name within the skills.map block
sed -n '1357,1450p' src/frontend_workspaces/frontend/src/ChatLanding.tsx

Repository: cuga-project/cuga-agent

Length of output: 4674


Use a stable unique key for skills cards and expansion state.

At Line 1367 and Lines 1438-1440, skill.name is used as identity. If two skills share the same name, React key collisions and coupled expand/collapse behavior occur. Use a unique key (e.g., ${skill.source}:${skill.name}).

Suggested fix
- {skills.map((skill) => {
-   const isExpanded = expandedSkills.has(skill.name);
+ {skills.map((skill) => {
+   const skillKey = `${skill.source}:${skill.name}`;
+   const isExpanded = expandedSkills.has(skillKey);

    return (
      <Tile
-       key={skill.name}
+       key={skillKey}
        style={{ padding: 0, overflow: "hidden", borderRadius: "4px" }}
      >
...
-       next.delete(skill.name);
+       next.delete(skillKey);
...
-       next.add(skill.name);
+       next.add(skillKey);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
{skills.map((skill) => {
const isExpanded = expandedSkills.has(skill.name);
const descLength = skill.description?.length || 0;
const shouldTruncate = descLength > 100;
const displayDesc = isExpanded || !shouldTruncate
? skill.description
: truncateText(skill.description || "", 100);
return (
<Tile
key={skill.name}
style={{ padding: 0, overflow: "hidden", borderRadius: "4px" }}
{skills.map((skill) => {
const skillKey = `${skill.source}:${skill.name}`;
const isExpanded = expandedSkills.has(skillKey);
const descLength = skill.description?.length || 0;
const shouldTruncate = descLength > 100;
const displayDesc = isExpanded || !shouldTruncate
? skill.description
: truncateText(skill.description || "", 100);
return (
<Tile
key={skillKey}
style={{ padding: 0, overflow: "hidden", borderRadius: "4px" }}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/frontend_workspaces/frontend/src/ChatLanding.tsx` around lines 1357 -
1368, The skill cards use skill.name as the React key and as the identifier for
expansion state, causing collisions when names repeat; change the identity to a
stable unique composite string (for example `${skill.source}:${skill.name}`)
wherever skill.name is used as the key in the skills.map rendering (the Tile
key) and wherever expandedSkills is consulted/updated (e.g., expandedSkills.has,
toggle logic and any set operations) so each skill is uniquely identified by
source+name instead of name alone.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/code_wrapper.py (1)

99-103: 💤 Low value

Redundant runtime conditional in generated code.

The if {repr(workspace_root)} else None guard is always true at runtime because the chdir_line string is only set when workspace_root is truthy (line 100/161). The conditional evaluates a non-empty string path which is always truthy.

♻️ Simplify the generated chdir line
         chdir_line = ""
         if workspace_root:
-            chdir_line = (
-                f"    _internal_os.chdir({repr(workspace_root)}) if {repr(workspace_root)} else None\n"
-            )
+            chdir_line = f"    _internal_os.chdir({workspace_root!r})\n"

Apply to both occurrences (lines 101-103 and 162-164). This also addresses the RUF010 linter hint by using !r conversion flag instead of repr().

Also applies to: 160-164

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/code_wrapper.py`
around lines 99 - 103, The generated chdir_line currently embeds a redundant
runtime conditional and uses repr(); change the generation so when
workspace_root is truthy you emit a simple unconditional chdir call like
"_internal_os.chdir({workspace_root!r})\n" (no "if ... else None"), and update
both occurrences that set chdir_line (the two places assigning chdir_line in
code_wrapper.py) to use the !r conversion flag instead of repr() so the
generated code is simplified and the RUF010 lint warning is resolved.
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py (1)

176-179: 💤 Low value

Document the purpose of unconditional _internal_re injection.

The _internal_re module is injected unconditionally into the execution context for user code, whereas _internal_os is conditional. The comment states it's "useful for tools and not dangerous," but doesn't specify which tools or use cases require it. Clarify in comments or documentation whether this is necessary for specific user-facing functionality or if it represents a general best practice for dynamic code execution.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py`
around lines 176 - 179, Add a clear explanatory comment above the unconditional
injection of _internal_re in code_executor.py: state which user-facing tools or
execution paths rely on this (e.g., regex-based parsing, pattern matching,
validation, tokenization utilities), why it's considered safe to always expose
(pure library, no I/O), and note the contrast with conditional _internal_os
injection so reviewers understand the security rationale; reference the exact
symbol _internal_re and the related _internal_os in the comment so future
maintainers know why one is unconditional and the other is gated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py`:
- Around line 176-179: Add a clear explanatory comment above the unconditional
injection of _internal_re in code_executor.py: state which user-facing tools or
execution paths rely on this (e.g., regex-based parsing, pattern matching,
validation, tokenization utilities), why it's considered safe to always expose
(pure library, no I/O), and note the contrast with conditional _internal_os
injection so reviewers understand the security rationale; reference the exact
symbol _internal_re and the related _internal_os in the comment so future
maintainers know why one is unconditional and the other is gated.

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/code_wrapper.py`:
- Around line 99-103: The generated chdir_line currently embeds a redundant
runtime conditional and uses repr(); change the generation so when
workspace_root is truthy you emit a simple unconditional chdir call like
"_internal_os.chdir({workspace_root!r})\n" (no "if ... else None"), and update
both occurrences that set chdir_line (the two places assigning chdir_line in
code_wrapper.py) to use the !r conversion flag instead of repr() so the
generated code is simplified and the RUF010 lint warning is resolved.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e40648e7-fcd8-49cb-ba0a-17c1dc157f21

📥 Commits

Reviewing files that changed from the base of the PR and between 296d0ea and 6f4fbe2.

⛔ Files ignored due to path filters (6)
  • src/cuga/frontend/dist/index.html is excluded by !**/dist/**
  • src/cuga/frontend/dist/main.76f3bae552155a4125f1.js is excluded by !**/dist/**
  • src/cuga/frontend/dist/main.f58b9c6b13e6284dbd97.js is excluded by !**/dist/**
  • src/cuga/frontend/dist/vendors.7268c94d86f8c5d3e2ed.js is excluded by !**/dist/**
  • src/cuga/frontend/dist/vendors.b6b2c62d8ecc895d3c3e.js is excluded by !**/dist/**
  • src/frontend_workspaces/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (14)
  • .gitignore
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/code_wrapper.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/prompts/mcp_prompt.jinja2
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/reflection/prompts/reflection_system.jinja2
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/reflection/prompts/reflection_user.jinja2
  • src/frontend_workspaces/frontend/package.json
  • src/frontend_workspaces/frontend/scripts/postinstall.cjs
  • src/frontend_workspaces/frontend/src/ChatLanding.css
  • src/frontend_workspaces/frontend/src/ChatLanding.tsx
  • src/frontend_workspaces/frontend/src/api.ts
  • src/frontend_workspaces/frontend/src/carbon-chat/customSendMessage.ts
  • src/frontend_workspaces/frontend/webpack.config.js
✅ Files skipped from review due to trivial changes (2)
  • .gitignore
  • src/frontend_workspaces/frontend/src/api.ts
🚧 Files skipped from review as they are similar to previous changes (9)
  • src/frontend_workspaces/frontend/webpack.config.js
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/reflection/prompts/reflection_user.jinja2
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/reflection/prompts/reflection_system.jinja2
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/prompts/mcp_prompt.jinja2
  • src/frontend_workspaces/frontend/scripts/postinstall.cjs
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
  • src/frontend_workspaces/frontend/src/ChatLanding.css
  • src/frontend_workspaces/frontend/src/carbon-chat/customSendMessage.ts
  • src/frontend_workspaces/frontend/src/ChatLanding.tsx

The sandbox-aware chdir introduced in the prior commit assumed the
per-thread workspace directory already existed on disk. LocalExecutor
(unlike LocalSandboxExecutor, which mkdirs before shell flows) never
creates it, and unit tests typically pass thread_id=None ->
/tmp/cuga/_default/workspace, which does not exist -> os.chdir raised
FileNotFoundError before any user code ran, leaving _locals empty and
cascading into dozens of "new_vars == {}" assertion failures.

Mirror the LocalSandboxExecutor pattern: mkdir(parents=True,
exist_ok=True) the resolved workspace path before passing it into
CodeWrapper. The mkdir lives inside the existing try/except so any
permission or I/O error still falls back to "no chdir" cleanly
(workspace_root_for_wrap = None) and execution proceeds with the
process CWD as it did before this branch.

Smoke-verified locally with thread_id=None: workspace path is created
on first call and chdir succeeds.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py`:
- Around line 177-183: The code injects os into the user execution locals via
_internal_os (when workspace_root_for_wrap is truthy), exposing dangerous APIs
to user code; change this so os is only available to the wrapper plumbing and
never present in the user-visible execution namespace: move the chdir/workspace
logic to run inside the wrapper scope (use a separate dict for wrapper-only exec
or call wrapper functions with a local-only namespace), or ensure you remove
_internal_os from context_locals/_locals before executing user code (and
likewise remove any similar temporary symbols created around the new_vars.pop
usage); update the same pattern referenced around the lines that set
_internal_os (and the similar 222-223 spots) so validate_imports() still governs
all imports accessible to user code.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 856366fb-13c6-4d53-b0b0-2466c222185e

📥 Commits

Reviewing files that changed from the base of the PR and between 6f4fbe2 and a2d1444.

📒 Files selected for processing (1)
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py

Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py Outdated
)
reflection_agent = reflection_task(llm=active_model)
# Calculate sandbox_workspace using same logic as prepare_tools_and_apps
_sandbox_mode = getattr(settings.advanced_features, "sandbox_mode", "opensandbox")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MED-1] Sandbox workspace logic duplicated in three places — divergence risk
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py:2222

Explanation

The expression "." if _sandbox_mode in ("native", "local") else "/workspace" now appears in at least three places:

  1. prepare_tools_and_apps (line ~1695) — for the MCP prompt
  2. sandbox() (line ~2222) — for the reflection prompt (this PR)
  3. code_executor.py (line ~151) — for the chdir logic (this PR)

Each site independently reads settings.advanced_features.sandbox_mode with getattr(..., "opensandbox"). If a new sandbox mode is added (e.g., "docker"), all three sites must be updated in sync. The comment in cuga_lite_graph.py even says "using same logic as prepare_tools_and_apps", which is a code smell.

Suggested Fix

Extract a shared helper, e.g. in a utility module:

# e.g. in executors/common/sandbox_utils.py
def get_sandbox_workspace(sandbox_mode: str) -> str:
    return "." if sandbox_mode in ("native", "local") else "/workspace"

def current_sandbox_workspace() -> str:
    mode = getattr(settings.advanced_features, "sandbox_mode", "opensandbox")
    return get_sandbox_workspace(mode)

Then all three call sites use current_sandbox_workspace().

Agent Prompt

In src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py around line 2222, and also in code_executor.py around line 151, the expression "." if _sandbox_mode in ("native", "local") else "/workspace" is duplicated. A third copy exists in prepare_tools_and_apps around line 1695. Extract this into a shared helper function to avoid divergence when new sandbox modes are added.

@@ -0,0 +1,36 @@
#!/usr/bin/env node
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MED-2] postinstall.cjs patches a vendored file in node_modules — fragile and non-idiomatic
src/frontend_workspaces/frontend/scripts/postinstall.cjs:1

Explanation

The postinstall script directly mutates express/lib/router/layer.js inside node_modules to fix a path-to-regexp v8 import shape incompatibility. This approach is fragile:

  1. The patch is silently skipped if the file path doesn't match (the script just logs "Pattern not found" or "File not found" and exits 0).
  2. pnpm may re-hoist or re-link the file on any pnpm install, silently reverting the patch.
  3. The alternate path (../../node_modules/.pnpm/express@4.22.2/...) is hardcoded to a specific express version — it will silently fail if express is upgraded.
  4. There is no verification that the patch was actually applied.

The standard approach for this kind of fix is to use patch-package (or pnpm's built-in pnpm patch) which integrates with the lockfile and is re-applied reliably.

Suggested Fix

Use pnpm patch express to generate a proper patch file that pnpm applies automatically and tracks in the lockfile. This is the idiomatic pnpm approach and doesn't require a custom postinstall script.

Alternatively, pin path-to-regexp to a v7.x version that still exports the old API shape, avoiding the need for the patch entirely.

Agent Prompt

In src/frontend_workspaces/frontend/scripts/postinstall.cjs, the script patches express/lib/router/layer.js directly in node_modules. Review whether this is the right approach: the patch silently no-ops if the file path doesn't match, and pnpm may revert it on reinstall. Consider replacing this with pnpm patch (which generates a .patches/ file tracked in the lockfile) or pinning path-to-regexp to a compatible version.

Not sure about this one, it just proposes a different solution

# The directory is created here (mirroring LocalSandboxExecutor's behavior for shell flows)
# so LocalExecutor paths and unit tests with thread_id=None don't crash on os.chdir.
workspace_root_for_wrap = None
if mode == 'local':
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MED-3] _sandbox_mode == 'native' branch in code_executor.py is unreachable when mode == 'local'
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py:153

Explanation

The outer condition is if mode == 'local', but inside it checks if _sandbox_mode == 'native' and uses native_thread_workspace_root. The mode parameter is set to 'local' when settings.advanced_features.e2b_sandbox is False (line ~128), regardless of sandbox_mode. When sandbox_mode == 'native', the NativeSandboxExecutor is used (not LocalExecutor), so mode would not be 'local' in that path. This means the native branch inside if mode == 'local' may never actually be reached in practice, and the local_thread_workspace_root fallback is always used. If the intent is to support native mode here, the outer condition needs to be broadened.

Suggested Fix

Clarify the intent. If native sandbox mode should also get the chdir treatment, the condition should be:

if mode in ('local',) or _sandbox_mode in ('native', 'local'):
    ...

Or document explicitly why native mode is handled inside the mode == 'local' branch (e.g., because native mode forces mode = 'local' upstream).

Agent Prompt

In src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py around line 149, the workspace chdir logic is gated on mode == 'local', but inside it branches on _sandbox_mode == 'native' to use native_thread_workspace_root. Verify whether mode can actually be 'local' when sandbox_mode is 'native' — trace how mode is set upstream. If the native branch is unreachable, either remove it or fix the outer condition.

Not sure i understand the difference between mode and sandbox_node

…-paths-only at LLM

Unifies three previously-divergent ideas about where the agent's workspace
lives: the demo CLI/MCP server's `./cuga_workspace`, the skills feature's
`/tmp/cuga/<thread>/workspace`, and the host-process `chdir` added in the
previous PR commit (which broke CI by moving Python's CWD outside the
filesystem MCP server's allowed root).

Conceptual model
- The agent runs in a sandbox whose CWD IS the workspace; the LLM is told
  to use relative paths only. The runtime owns the host-side mapping per
  mode and per thread.
- `settings.skills.enabled` gates layout:
    skills off -> shared  <cwd>/cuga_workspace/         (demo CLI parity)
    skills on  -> per-thread <cwd>/cuga_workspace/<safe_thread_id>/
- The filesystem MCP server is launched ONCE with allowed-dir AND CWD set
  to `<cwd>/cuga_workspace/`, so it serves every thread without restart.

Backend
- local_/native_thread_workspace_root: skills-conditional, rooted at
  <cwd>/cuga_workspace; drop the tempfile.gettempdir() / /private/tmp
  workspace path. Native seatbelt policy now allows writes under
  <cwd>/cuga_workspace as well as the existing /private/tmp (venv/caches).
- seed_thread_workspace_if_needed: when a per-thread workspace is freshly
  created, copy every top-level file from the shared parent into it so
  CI/demo fixtures (e.g. contacts.txt) are reachable without workflow
  changes. Idempotent via a `.cuga_seeded` sentinel.
- code_executor: delete the workspace-resolve + mkdir + host-process
  `os.chdir` block (and the `_internal_os` injection it required).
  Workspace CWD alignment is now handled by each entry point at launch
  time, not by mutating Python's CWD.
- code_wrapper: drop `workspace_root` param + `chdir_line` plumbing; goes
  back to pre-PR shape. `os` is intentionally NOT injected into the user
  namespace, so SecurityValidator.validate_imports() remains the sole gate.

MCP filesystem
- managed_mcp.py BOOTSTRAP_YAML now spawns the filesystem MCP subprocess
  with `cwd: ./cuga_workspace`, and changes its allowed-dir arg from
  `./cuga_workspace` to `.` (resolves against the new CWD). Persisted YAMLs
  from older installs are migrated lazily on startup.
- mcp_manager: ServiceConfig gains an optional `cwd:` field; the STDIO
  transport now forwards it to `StdioTransport(cwd=...)` and mkdirs the
  target so the subprocess doesn't fail to start.
- NEW mcp_filesystem_wrapper.py: when skills are enabled, wraps every
  `filesystem_*` MCP tool so relative path args get prefixed with
  `<safe_thread_id>/` before forwarding (covers both StructuredTool path
  via chat_agent._build_runtime_context and the raw-function path via
  cuga_lite_graph's tools_context_dict). No-op when skills are off.

Prompts / CLI
- mcp_prompt.jinja2 + reflection_system.jinja2: collapse the per-mode
  `{% if sandbox_workspace == ... %}` branches into a single rule -
  "use relative paths only; your CWD is the workspace". reflection_user
  no longer surfaces a workspace path at all.
- demo CLI (cli.py): CUGA_POLICIES_CONTENT rewritten from
  "write only from <absolute path>" to "use relative paths only". The
  demo filesystem MCP subprocess is now spawned with `cwd=workspace`.

Hardening
- NativeSandboxExecutor._ensure_venv: previously trusted `Path('/tmp/.venv')
  .exists()` and unconditionally flipped `_venv_ready=True` after creation
  attempts - a stale empty dir or a silently-failed venv left every later
  `run_command` reporting `/tmp/.venv/bin/activate: No such file or
  directory`. Now validates `bin/activate.is_file()`, wipes stale dirs
  before recreating, and only marks ready after verifying the file the
  sandbox prelude actually sources.

Tests
- NEW src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/
  test_local_sandbox_skills.py - 13 tests covering: per-thread workspace
  placement, two-thread isolation, seed-on-first-use (top-level files
  only, skips dirs/dotfiles), shared workspace when skills off, sandbox
  write/read/list_files round-trip, path-escape rejection, and the MCP
  wrapper layer (relative prefix, absolute pass-through, skills-off
  no-op, only filesystem_* tools wrapped, BaseTool wrapper preserves
  name/description).
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/local/local_sandbox_executor.py`:
- Around line 94-99: The shared-workspace detection currently compares
thread_root to its parent (thread_root.resolve() == parent.resolve()), which is
always false and causes seeding to run incorrectly; update the condition to
detect when the thread directory is actually the top-level cuga workspace by
comparing the parent to the workspace root (e.g., replace the current check with
parent.resolve() == self.cuga_workspace.resolve() or equivalent workspace-root
comparison) so the function (in local_sandbox_executor.py around the thread_root
handling code) returns early when the workspace is shared; apply the same fix to
the other occurrences noted (the blocks at the other two similar locations).

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/native/native_sandbox_executor.py`:
- Around line 86-91: The guard incorrectly compares thread_root to its parent
and therefore misses shared-workspace mode; update the conditional in
native_sandbox_executor (the block using thread_root and parent) to also detect
and skip shared-workspace seeding — e.g., return early if thread_root.resolve()
== parent.resolve() OR if parent.resolve() == workspace_root.resolve() (or if a
boolean flag like self.shared_workspace_mode / config.shared_workspace is set)
so shared-workspace mode is skipped and the unintended copy behavior is avoided.

In `@src/cuga/backend/tools_env/registry/config/config_loader.py`:
- Line 55: The new cwd field on the config dataclass is never populated when
building ServiceConfig, causing subprocesses to run with None CWD; update the
_create_service_config(...) routine to read the cwd value from the parsed config
(where cwd: Optional[str] is declared) and assign it to the ServiceConfig
instance being returned (ensure ServiceConfig.cwd is set), so that consumers
that spawn STDIO transport subprocesses use the configured working directory.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 48266c10-2e4d-4c58-a99a-4b447b14c637

📥 Commits

Reviewing files that changed from the base of the PR and between a2d1444 and 52b0c19.

📒 Files selected for processing (14)
  • docs/examples/demo_apps/setup/cli.py
  • src/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.py
  • src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/code_wrapper.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/local/local_sandbox_executor.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/native/native_sandbox_executor.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_local_sandbox_skills.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/prompts/mcp_prompt.jinja2
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/reflection/prompts/reflection_system.jinja2
  • src/cuga/backend/server/managed_mcp.py
  • src/cuga/backend/tools_env/registry/config/config_loader.py
  • src/cuga/backend/tools_env/registry/mcp_manager/mcp_manager.py
✅ Files skipped from review due to trivial changes (1)
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/code_wrapper.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/reflection/prompts/reflection_system.jinja2

command: Optional[str] = None
args: Optional[List[str]] = None
env: Optional[Dict[str, str]] = None # Environment variables for STDIO transport
cwd: Optional[str] = None # Working directory for STDIO transport subprocess
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

cwd is declared but never loaded into ServiceConfig (feature break + scope risk).

Line 55 adds cwd, but _create_service_config(...) never sets it when building ServiceConfig (Lines 243-259). That means config.cwd stays None, so stdio MCP processes won’t honor workspace CWD.

Suggested fix
     service_config = ServiceConfig(
         name=service_name,
         description=config.get('description'),
         url=config.get('url'),
         command=config.get('command'),
         args=config.get('args'),
         env=env_config,
+        cwd=config.get('cwd'),
         transport=config.get('transport'),
         auth=auth,
         include=config.get('include'),
         type=service_type,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/tools_env/registry/config/config_loader.py` at line 55, The
new cwd field on the config dataclass is never populated when building
ServiceConfig, causing subprocesses to run with None CWD; update the
_create_service_config(...) routine to read the cwd value from the parsed config
(where cwd: Optional[str] is declared) and assign it to the ServiceConfig
instance being returned (ensure ServiceConfig.cwd is set), so that consumers
that spawn STDIO transport subprocesses use the configured working directory.

…clamp policy confidence

Three follow-up cleanups to the workspace-migration commit (52b0c19), all
discovered while running run_tests.sh.

Drop seed-on-first-use
- Stability tests run with `settings.skills.enabled = false` (default), in
  which case `local_thread_workspace_root(...)` returns the shared parent
  `./cuga_workspace/` and the seed helper was already a no-op.
- The only path where the seed did anything is skills-on at runtime, and
  there it propagated every parent-level file into every new thread on
  first shell/MCP call — fixtures, scratch notes, anything sitting at
  the workspace root. That was magic and surprising, not desirable.
- Removed the `seed_thread_workspace_if_needed` helper from both
  local_sandbox_executor.py and native_sandbox_executor.py, removed all
  four call sites (in `_run_command` / `_run_sandboxed` / two
  `create_list_files_tool` paths), and trimmed the chat-agent wrapper's
  `_ensure_thread_workspace_seeded` to `_ensure_thread_workspace_exists`
  (mkdir only — MCP filesystem writes still need a real directory to
  land in). Mkdir at each call site is retained; only the copy step is gone.
- Updated test_local_sandbox_skills.py: dropped seed-fixture assertions
  from the two integration tests that asserted them, replaced them with
  "parent files MUST NOT bleed into per-thread dir" assertions (the
  inverse), and removed test_local_sandbox_seed_skips_other_thread_dirs
  entirely (it was the seed-specific test).

Gate `_internal_re` injection behind skills
- code_executor.py always injected `re` as `_internal_re` into the user
  exec namespace. That pushed `context_locals` past the
  `if not context_locals: return` short-circuit in
  `SecurityValidator.validate_context_usage`, which then rejected every
  unit test that runs bare user code like `y = 20` with the error
  "Code must use at least one variable or tool from context".
- 12 tests in `test_code_executor.py`, `test_e2b_lite.py`,
  `test_variable_creation_order.py`, and `test_tool_call_timeout.py`
  were failing as a result. The injection itself is only useful for skill
  flows that lean on regex helpers; scope it to `_skills_enabled()` so
  plain user-code paths keep `context_locals` empty and the short-circuit
  fires.
- Patched test_e2b_lite.py::test_eval_with_e2b_disabled to set
  `mock_settings.skills.enabled = False` explicitly. Bare MagicMock()
  returns a truthy proxy for any attribute, so the test had been
  unintentionally exercising the skills-on path.

Clamp policy confidence to [0, 1]
- When both a keyword trigger AND a natural-language trigger fire on the
  same IntentGuard, `match_policy` combined the two confidence scores
  and occasionally produced `1.0000000000000002` (float rounding).
  `PolicyMatch.confidence` has `Field(le=1.0)`, so Pydantic raised a
  validation error, the surrounding `except` swallowed it, and the
  policy match silently disappeared (matched=False, empty metadata).
- test_e2e_intent_guard_blocks_in_cuga_lite was the visible casualty
  (`assert metadata.get("policy_blocked")` against `{}`). Defensive
  clamp at the PolicyMatch construction site in policy/agent.py keeps
  the fix one line and doesn't change semantics for any other call site.

Verification
- `uv run pytest src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests src/cuga/backend/cuga_graph/policy/tests` -> 166 passed, 0 failed, 2m04s.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/native/native_sandbox_executor.py (1)

123-139: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Scope the Seatbelt allowlist to the resolved thread workspace, not the shared parent.

In skills-enabled mode, _run_sandboxed() cds into <cwd>/cuga_workspace/<thread>, but _build_policy() grants file-write* to the whole <cwd>/cuga_workspace tree and _ensure_policy() caches that policy once. A command like echo x > ../thread-B/pwned.txt can therefore mutate a sibling workspace, and any later host os.chdir() makes the cached whitelist diverge from native_thread_workspace_root(). Build/cache policies by resolved workspace_root instead.

Also applies to: 160-163

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/native/native_sandbox_executor.py`
around lines 123 - 139, The Seatbelt policy currently grants file-write* to the
whole CUGA_WORKSPACE_DIRNAME parent via _build_policy(), which allows
cross-thread writes when _run_sandboxed() cd's into a per-thread workspace;
change _build_policy() to accept or compute the resolved per-thread
workspace_root (use native_thread_workspace_root() or the same resolved Path
used by _run_sandboxed()) and emit the policy with (subpath "{workspace_root}")
instead of the shared parent, and update _ensure_policy() to cache policies
keyed by that resolved workspace_root so each thread gets its own scoped
allowlist.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py`:
- Around line 98-104: The helper _skills_enabled currently catches all
exceptions and silently returns False; change it to only swallow expected
lookup/import failures (catch ImportError and AttributeError) and return False
for those, while logging and re-raising any other unexpected exceptions so
failures don't silently disable isolation. Update the function to import or use
a module logger (e.g., logging.getLogger(__name__) or the existing logger if
present), log the unexpected exception with context ("_skills_enabled failed")
and then re-raise it instead of returning False; keep the existing
getattr(settings, "skills", None).enabled lookup and the boolean return behavior
for the normal path.
- Around line 51-58: The _rewrite_path function currently allows paths like
"../thread-B/notes.txt" which become "thread-A/../thread-B/notes.txt" and escape
the intended thread folder; fix _rewrite_path (used with _is_absolute) to reject
any path that contains parent-directory segments before prefixing: after
stripping a leading "./" and handling the empty/"." case, split the remaining
path by "/" and if any segment equals ".." raise an error (e.g., ValueError) or
otherwise refuse the input, so the prefixing logic never produces paths that can
traverse up into sibling thread directories.
- Around line 88-94: The wrapped StructuredTool created in
StructuredTool.from_function loses BaseTool metadata (return_direct, tags,
callbacks, metadata) from the original MCP tool, causing behavioral changes;
update the wrapping in mcp_filesystem_wrapper by preserving those fields—pass
return_direct=getattr(tool, "return_direct", False) (and tags/callbacks/metadata
as appropriate) into StructuredTool.from_function or instead use the original
tool's copy/model_copy (e.g., BaseTool.copy() / model_copy()) and replace only
the invoke/ainvoke implementations (_invoke/_ainvoke) so
name/description/args_schema and all metadata stay intact.

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_local_sandbox_skills.py`:
- Around line 91-94: The flaky assertion that requires /tmp/cuga to be empty
should be replaced with a snapshot comparison: before running the code under
test (in test_local_sandbox_skills), record the set of paths under
Path("/tmp/cuga") (empty set if the directory doesn't exist), run the operation,
then capture the set of paths again and assert that the after-set is exactly the
before-set (or that no new paths exist by asserting
after_set.issubset(before_set) is False), i.e. verify the test did not add any
new files; update the assertion that currently references Path("/tmp") / "cuga"
to compare these two sets instead.

---

Outside diff comments:
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/native/native_sandbox_executor.py`:
- Around line 123-139: The Seatbelt policy currently grants file-write* to the
whole CUGA_WORKSPACE_DIRNAME parent via _build_policy(), which allows
cross-thread writes when _run_sandboxed() cd's into a per-thread workspace;
change _build_policy() to accept or compute the resolved per-thread
workspace_root (use native_thread_workspace_root() or the same resolved Path
used by _run_sandboxed()) and emit the policy with (subpath "{workspace_root}")
instead of the shared parent, and update _ensure_policy() to cache policies
keyed by that resolved workspace_root so each thread gets its own scoped
allowlist.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6e167e18-7328-40e0-8315-6965ba7c00d4

📥 Commits

Reviewing files that changed from the base of the PR and between 52b0c19 and a95b173.

📒 Files selected for processing (7)
  • src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/local/local_sandbox_executor.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/native/native_sandbox_executor.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_e2b_lite.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_local_sandbox_skills.py
  • src/cuga/backend/cuga_graph/policy/agent.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py

Comment thread src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py Outdated
Comment on lines +88 to +94
wrapped = StructuredTool.from_function(
func=_invoke,
coroutine=_ainvoke,
name=tool.name,
description=tool.description or "",
args_schema=getattr(tool, "args_schema", None),
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the file in question
cat -n src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py | head -120

Repository: cuga-project/cuga-agent

Length of output: 5269


🏁 Script executed:

# Check for requirements/dependencies to determine langchain-core version
fd -type f "requirements*" -o "setup.py" -o "pyproject.toml" | head -20

Repository: cuga-project/cuga-agent

Length of output: 238


🏁 Script executed:

# Also search for any langchain version specifications
rg "langchain" --type py | grep -i "import\|from" | head -20

Repository: cuga-project/cuga-agent

Length of output: 1930


🏁 Script executed:

# Look for dependency files
fd -name "pyproject.toml" -o -name "requirements.txt" -o -name "setup.py" -o -name "setup.cfg"

Repository: cuga-project/cuga-agent

Length of output: 298


🏁 Script executed:

# Check the pyproject.toml or requirements files for langchain version
cat pyproject.toml 2>/dev/null | grep -A 50 "dependencies\|langchain" || cat requirements.txt 2>/dev/null || true

Repository: cuga-project/cuga-agent

Length of output: 7464


🏁 Script executed:

# Let's check the langchain-core documentation or source to understand StructuredTool.from_function()
# First, let me search for how tools are created in the codebase
rg "from_function" --type py -B 3 -A 3

Repository: cuga-project/cuga-agent

Length of output: 35568


🏁 Script executed:

# Let's also check if there are other uses of BaseTool in the codebase to see what metadata matters
rg "return_direct\|tags\|callbacks" --type py | head -30

Repository: cuga-project/cuga-agent

Length of output: 49


🏁 Script executed:

# Let's see if there's any documentation or tests about tool wrapping
fd -name "*test*mcp*" -o -name "*mcp*test*" 2>/dev/null

Repository: cuga-project/cuga-agent

Length of output: 49


🏁 Script executed:

# Check for any existing tool wrapping patterns in the codebase
rg "BaseTool\|StructuredTool" --type py -B 2 -A 5 | head -80

Repository: cuga-project/cuga-agent

Length of output: 49


🏁 Script executed:

# Let's search for the source code or usage patterns to understand what metadata matters
rg "return_direct\|tags\|callbacks\|metadata" --type py -B 2 -A 2 | head -80

Repository: cuga-project/cuga-agent

Length of output: 49


🏁 Script executed:

# Let's check the test file to see if return_direct is actually being used
cat -n src/cuga/backend/cuga_graph/policy/tests/test_e2e_healthcare_family_claims.py | grep -A 10 -B 10 "return_direct"

Repository: cuga-project/cuga-agent

Length of output: 1739


🏁 Script executed:

# Let's look at how MCP tools are obtained and what type they are
rg "BaseTool\|MCP" src/cuga/backend/cuga_graph/nodes/chat/chat_agent/ --type py -B 2 -A 2

Repository: cuga-project/cuga-agent

Length of output: 49


🏁 Script executed:

# Check if langchain-core 1.2.22+ has return_direct parameter in from_function
# Let's look for test patterns with return_direct in from_function calls
rg "from_function.*return_direct" --type py

Repository: cuga-project/cuga-agent

Length of output: 49


🏁 Script executed:

# Let's check if there's a BaseTool.copy method or similar that preserves metadata
rg "\.copy\(\)\|copy_update" src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py --type py

Repository: cuga-project/cuga-agent

Length of output: 49


🏁 Script executed:

# Let's see how the tool parameter is used - what type is it expected to be
cat -n src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py | sed -n '77,130p'

Repository: cuga-project/cuga-agent

Length of output: 2331


🏁 Script executed:

# Let's check if BaseTool has any relevant attributes for callbacks, return_direct, tags
rg "tool\." src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py --type py -B 1 -A 1

Repository: cuga-project/cuga-agent

Length of output: 550


🏁 Script executed:

# Let's check the broader context - how is _wrap_filesystem_tool called and what tools are passed to it
rg "_wrap_filesystem_tool\|wrap_filesystem_tool" src/cuga/backend/cuga_graph/nodes/chat/chat_agent/ --type py -B 3 -A 3

Repository: cuga-project/cuga-agent

Length of output: 49


🏁 Script executed:

# Search for calls to _wrap_filesystem_tool to see what tools are passed
rg "_wrap_filesystem_tool" --type py -B 5 -A 5

Repository: cuga-project/cuga-agent

Length of output: 2343


🏁 Script executed:

# Let's check the entire mcp_filesystem_wrapper file to understand the full context
wc -l src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py

Repository: cuga-project/cuga-agent

Length of output: 147


🏁 Script executed:

# Print the full file
cat -n src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py

Repository: cuga-project/cuga-agent

Length of output: 8777


🏁 Script executed:

# Let's check if there are any MCP tools being used and if they have return_direct, tags, callbacks set
rg "langchain.mcp\|langchain-mcp" --type py | head -20

Repository: cuga-project/cuga-agent

Length of output: 49


🏁 Script executed:

# Let's search for imports or usage of MCP adapters
rg "langchain_mcp_adapters" --type py -B 2 -A 2

Repository: cuga-project/cuga-agent

Length of output: 584


🏁 Script executed:

# Check BaseTool attributes to see what fields might be important
rg "return_direct\|\.tags\|\.callbacks\|\.metadata" --type py -B 2 -A 2 | head -100

Repository: cuga-project/cuga-agent

Length of output: 49


🏁 Script executed:

# Search for what StructuredTool.from_function actually accepts as parameters
rg "from_function" src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py -B 10 -A 5

Repository: cuga-project/cuga-agent

Length of output: 690


🏁 Script executed:

# Let's check if BaseTool has a copy method that might be better for preserving metadata
python3 << 'EOF'
import inspect
try:
    from langchain_core.tools import BaseTool, StructuredTool
    # Check what parameters from_function accepts
    sig = inspect.signature(StructuredTool.from_function)
    print("StructuredTool.from_function signature:")
    print(sig)
    print("\nParameters:")
    for param_name, param in sig.parameters.items():
        print(f"  {param_name}: {param.annotation if param.annotation != inspect.Parameter.empty else 'no annotation'}")
    
    # Check if BaseTool has a copy method
    if hasattr(BaseTool, 'copy'):
        print("\nBaseTool has a 'copy' method")
        copy_sig = inspect.signature(BaseTool.copy)
        print(f"copy signature: {copy_sig}")
    
    # Check BaseTool attributes
    print("\nBaseTool common attributes:")
    for attr in ['return_direct', 'tags', 'callbacks', 'metadata']:
        if hasattr(BaseTool, attr):
            print(f"  {attr}: exists")
except Exception as e:
    print(f"Error: {e}")
    import traceback
    traceback.print_exc()
EOF

Repository: cuga-project/cuga-agent

Length of output: 252


🌐 Web query:

langchain-core 1.2.22 StructuredTool.from_function parameters documentation

💡 Result:

In langchain-core, StructuredTool.from_function accepts these parameters (per the official LangChain reference docs): - func: Callable | None (default: None) The function from which to create a tool. - coroutine: Callable[..., Awaitable[Any]] | None (default: None) The async function from which to create a tool. - name: str | None (default: None) The name of the tool (defaults to the function name). - description: str | None (default: None) The description of the tool (defaults to the function docstring). - return_direct: bool (default: False) Whether to return the result directly or as a callback. - args_schema: ArgsSchema | None (default: None) The schema of the tool’s input arguments. - infer_schema: bool (default: True) Whether to infer the schema from the function’s signature. - response_format: Literal['content', 'content_and_artifact'] (default: 'content') Controls how the tool output is interpreted: - 'content': output is interpreted as the contents of a ToolMessage - 'content_and_artifact': output is expected to be a two-tuple (content, artifact) for a ToolMessage. - parse_docstring: bool (default: False) If infer_schema and parse_docstring are set, attempt to parse parameter descriptions from Google Style function docstrings. - error_on_invalid_docstring: bool (default: False) If parse_docstring is provided, configure whether to raise ValueError on invalid Google Style docstrings. - kwargs: Any (default: {}) Additional arguments to pass through. These are documented in the StructuredTool.from_function reference page for langchain_core. [1] Top parameters you’ll most often care about: - func: provide your callable [1] - name / description: optional overrides [1] - args_schema / infer_schema: input schema behavior [1] - return_direct: output handling [1] - response_format: whether you return only content or (content, artifact) [1]

Citations:


🌐 Web query:

langchain BaseTool return_direct tags callbacks metadata fields

💡 Result:

In LangChain (langchain_core.tools.BaseTool), the relevant fields are: 1) return_direct Whether the tool’s output should be returned directly. When set to True, the AgentExecutor stops looping after the tool is called. [1] 2) tags BaseTool has an optional tags attribute (list[str] | None). These tags are associated with each tool call and are passed as arguments to the callback handlers configured for the run. [2] 3) callbacks BaseTool exposes a callbacks property (and also allows passing callbacks into run/arun/invoke). This is where you provide callback handlers that get called during tool execution. [2] 4) metadata BaseTool has an optional metadata attribute (dict[str, Any] | None). This metadata is associated with each tool call and passed as arguments to the handlers defined in callbacks. It can be used to identify a specific instance of a tool / use case. [3] Notes / caveats - Tool call identifiers: there is an open/archived issue noting that tool_call_id may not be forwarded to callbacks in BaseTool.run for on_tool_start/on_tool_end (depending on LangChain version), because tool_call_id is extracted but not passed in the kwargs to callback_manager.on_tool_start in BaseTool.run. [4] If you want, tell me your LangChain version (and whether you’re using invoke/run/arun or AgentExecutor). I can map exactly how tags/metadata/callbacks appear in emitted callback events for that version.

Citations:


🏁 Script executed:

# Let's check if the MCP tools from langchain_mcp_adapters are likely to have these fields set
rg "load_mcp_tools" src/cuga/backend/cuga_graph/nodes/chat/chat_agent/ -B 5 -A 10

Repository: cuga-project/cuga-agent

Length of output: 3709


🏁 Script executed:

# Check if there are any tests that verify MCP tool wrapping behavior
fd -name "*mcp*test*" -o -name "*test*mcp*" 2>/dev/null | xargs ls -la 2>/dev/null

Repository: cuga-project/cuga-agent

Length of output: 2440


🏁 Script executed:

# Look for any evidence of return_direct being used in the MCP context
rg "return_direct" src/cuga/backend/cuga_graph/nodes/chat/chat_agent/ --type py -B 3 -A 3

Repository: cuga-project/cuga-agent

Length of output: 49


🏁 Script executed:

# Check if there's a copy_update method or similar on BaseTool
rg "copy\|model_copy" src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py --type py

Repository: cuga-project/cuga-agent

Length of output: 49


🏁 Script executed:

# Search for how tool copying is done in the codebase
rg "tool\.copy\|model_copy\|\.model_copy" src/cuga/backend/cuga_graph/ --type py -B 2 -A 2

Repository: cuga-project/cuga-agent

Length of output: 49


🏁 Script executed:

# Check if there are other examples of tool wrapping in the codebase to see best practices
rg "BaseTool\|StructuredTool" src/cuga/backend/cuga_graph/nodes/chat/chat_agent/ --type py -B 5 -A 5 | head -100

Repository: cuga-project/cuga-agent

Length of output: 49


Preserve return_direct and other BaseTool metadata when wrapping filesystem tools.

StructuredTool.from_function(...) reconstructs a fresh tool object, losing return_direct, tags, callbacks, and metadata from the original MCP tool. This causes wrapped tools to behave differently from the unwrapped ones—particularly if return_direct=True is set. At minimum, copy return_direct from the original tool:

Example fix
wrapped = StructuredTool.from_function(
    func=_invoke,
    coroutine=_ainvoke,
    name=tool.name,
    description=tool.description or "",
    args_schema=getattr(tool, "args_schema", None),
    return_direct=getattr(tool, "return_direct", False),
)

Alternatively, check BaseTool.copy() or model_copy() in langchain-core 1.2.22 to wrap invoke/ainvoke without reconstructing the tool.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py`
around lines 88 - 94, The wrapped StructuredTool created in
StructuredTool.from_function loses BaseTool metadata (return_direct, tags,
callbacks, metadata) from the original MCP tool, causing behavioral changes;
update the wrapping in mcp_filesystem_wrapper by preserving those fields—pass
return_direct=getattr(tool, "return_direct", False) (and tags/callbacks/metadata
as appropriate) into StructuredTool.from_function or instead use the original
tool's copy/model_copy (e.g., BaseTool.copy() / model_copy()) and replace only
the invoke/ainvoke implementations (_invoke/_ainvoke) so
name/description/args_schema and all metadata stay intact.

Comment thread src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py Outdated
Comment on lines +91 to +94
# Nothing leaked into /tmp/cuga (the legacy location)
assert not (Path("/tmp") / "cuga").exists() or not any(Path("/tmp/cuga").iterdir()), (
"no files should be created under the legacy /tmp/cuga path"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

This legacy-path assertion is flaky against a dirty /tmp.

It fails whenever another test or a developer machine already has files under /tmp/cuga, even if this test never touched that path. Snapshot the directory before the command and assert it did not gain this test's output, rather than requiring the whole tree to be absent or empty.

🧰 Tools
🪛 Ruff (0.15.12)

[error] 92-92: Probable insecure usage of temporary file or directory: "/tmp"

(S108)


[error] 92-92: Probable insecure usage of temporary file or directory: "/tmp/cuga"

(S108)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_local_sandbox_skills.py`
around lines 91 - 94, The flaky assertion that requires /tmp/cuga to be empty
should be replaced with a snapshot comparison: before running the code under
test (in test_local_sandbox_skills), record the set of paths under
Path("/tmp/cuga") (empty set if the directory doesn't exist), run the operation,
then capture the set of paths again and assert that the after-set is exactly the
before-set (or that no new paths exist by asserting
after_set.issubset(before_set) is False), i.e. verify the test did not add any
new files; update the assertion that currently references Path("/tmp") / "cuga"
to compare these two sets instead.

…rfacing

Replace the ad-hoc MCP filesystem demo and chat wrapper with a shared
workspace filesystem layer under cuga_lite (workspace_fs, paths, backends)
and route local, native, and opensandbox executors through it.

Demo and manager flows no longer rely on the removed file_system MCP
package; managed MCP and CLI presets are updated accordingly. Saving
enable_filesystem_tools only under advanced_features now correctly
hydrates feature_flags when loading manage config, so the UI matches
demo defaults. Manager integration tests point the MCP filesystem
server at ./cuga_workspace to match the seeded workspace layout.

Includes frontend Manage/ToolsConfig updates and a rebuilt demo bundle.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Surface skills in the chat frontend

2 participants