feat: sandbox-aware execution and Carbon side panel refresh#225
feat: sandbox-aware execution and Carbon side panel refresh#225sami-marreed wants to merge 6 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds per-thread workspace scoping for filesystem MCP tools when skills are enabled, moves local/native executors to a cwd-rooted ChangesPer-Thread Filesystem Workspace Isolation
Policy clamp and execution-context tweaks
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
Backend
- code_executor: chdir Python execution into the per-thread workspace
for local/native sandbox modes so generated code shares a CWD with
run_command (no more "/workspace/" hardcoded vs "." mismatch).
- code_wrapper: thread workspace_root through wrap_code so the
prepended chdir lives inside the wrapped async main.
- cuga_lite_graph: pass sandbox_workspace into the reflection prompt
context so strategic guidance matches the executor's actual CWD.
Prompts
- reflection_system, reflection_user, mcp_prompt: replace
"explain every case" guidance with a Jinja branch that renders
concrete path rules for the actual sandbox_workspace value
(".", "/workspace", or other). One workspace -> one set of rules.
Frontend
- ChatLanding side panel: refactor to Carbon primitives -- Accordion
+ AccordionItem for app config, ContainedList for tools, Layer for
elevation, non-clickable Tile for skill cards with expandable
descriptions, and an inline-styled underline indicator that
reliably tracks the active section tab.
- customSendMessage: fix thinkingInterval scope bug; remove the
half-finished thinking-indicator feature (to reland in a follow-up
with the streaming display work).
- api: register getSkills endpoint helper.
Build / infra
- frontend: add postinstall patch for express path-to-regexp v8
import shape; pin webpack versions and refresh pnpm-lock.yaml;
no-op setupMiddlewares hook in webpack.config.js.
- dist: regenerate frontend bundle.
- gitignore: ignore .claude/settings.local.json.
296d0ea to
6f4fbe2
Compare
There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (2)
src/frontend_workspaces/frontend/scripts/postinstall.cjs (1)
6-7: ⚡ Quick winUse
require.resolve()to dynamically resolve the express module path instead of version-coupled hardcoded paths.The current implementation has two brittle dependencies:
- Line 6 assumes a standard
../node_modules/layout- Lines 23-24 hardcode
express@4.22.2, requiring manual updates whenever express is bumpedReplace both with
require.resolve('express/lib/router/layer.js'), which will resolve to the correct path regardless of express version or installation layout:Proposed change
-const layerFile = path.join(__dirname, '../node_modules/express/lib/router/layer.js'); +const layerFile = require.resolve('express/lib/router/layer.js'); console.log('Looking for:', layerFile); if (fs.existsSync(layerFile)) { // ... patching logic ... } else { console.log('File not found at:', layerFile); - const altLayerFile = path.join(__dirname, '../../node_modules/.pnpm/express@4.22.2/node_modules/express/lib/router/layer.js'); - console.log('Trying alternate path:', altLayerFile); - if (fs.existsSync(altLayerFile)) { - // ... duplicate patching logic ... - } + // No fallback needed; require.resolve handles all layouts }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/frontend_workspaces/frontend/scripts/postinstall.cjs` around lines 6 - 7, Replace the brittle hardcoded path construction for the express layer with Node's resolver: change the code that assigns layerFile (currently using path.join(__dirname, '../node_modules/express/lib/router/layer.js')) to use require.resolve('express/lib/router/layer.js') and likewise replace any other hardcoded versioned express paths (the occurrences around the later block that reference express@4.22.2) to use require.resolve with the same module path; update the code that logs or uses layerFile to use the resolved path returned by require.resolve.src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/code_wrapper.py (1)
99-108: 💤 Low valueRedundant runtime check in generated code.
The
if {repr(workspace_root)} else Nonecondition in the generated code is unnecessary since this code path is only taken whenworkspace_rootis truthy. The generated code could be simplified.♻️ Suggested simplification
if workspace_root: - chdir_line = ( - f" _internal_os.chdir({repr(workspace_root)}) if {repr(workspace_root)} else None\n" - ) + chdir_line = f" _internal_os.chdir({workspace_root!r})\n"This also addresses the Ruff RUF010 hint by using the
!rconversion flag instead ofrepr().🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/code_wrapper.py` around lines 99 - 108, The generated chdir_line contains a redundant runtime conditional and uses repr(), so update the code that builds chdir_line (in the block that checks workspace_root) to emit a simple unconditional chdir call since workspace_root is already truthy; replace the repr() usage with the f-string !r conversion to satisfy the RUF010 hint (i.e., produce a string like "_internal_os.chdir({workspace_root!r})" followed by newline) so wrapped_code contains a direct chdir call when workspace_root is present.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py`:
- Around line 163-164: Replace the bare "except Exception: pass" in
code_executor.py (the try that attempts to determine and chdir to the workspace
root) with a logged debug-level catch so errors aren't silently swallowed;
capture and log the exception (including traceback or exception message) via the
module logger (e.g., logger.debug or processLogger.debug) and include context
like "unable to determine workspace root, continuing without chdir" so failures
in determining the workspace or import/config issues are visible during
debugging while preserving the continue behavior.
In `@src/frontend_workspaces/frontend/package.json`:
- Line 1: The scripts in package.json set DANGEROUSLY_DISABLE_HOST_CHECK but
webpack.config.js ignores it because allowedHosts is hardcoded to "all"; either
remove the env var from the scripts (e.g., the start/dev/dev:watch variants) or
update webpack.config.js to read process.env.DANGEROUSLY_DISABLE_HOST_CHECK and
conditionally set allowedHosts (e.g., if truthy set allowedHosts: "all", else
leave default/validate hosts). Locate the scripts section in package.json
(script names: start, start:win, dev, dev:real, dev:win, dev:watch,
dev:watch:win) and the allowedHosts setting in webpack.config.js (symbol:
allowedHosts) and implement one of the two fixes so the env var's presence
actually controls host validation.
In `@src/frontend_workspaces/frontend/scripts/postinstall.cjs`:
- Around line 18-21: The postinstall script currently only logs when a patch
target is missing; change the branches that log "Pattern not found in file" and
"File not found at:" (and the similar messages around lines 33-35) to terminate
with a non-zero exit (e.g., call process.exit(1) or throw an Error) so the
install fails fast when layerFile or the expected pattern isn't found; ensure
you update both occurrences that check for the file and for the pattern so
missing-patch situations do not silently exit 0.
- Around line 10-14: The current rewrite only renames the import to pathRegexp
but doesn't preserve the old API contract used by Express 4's layer.js: change
the patch so any call sites that do "this.regexp = pathRegexp(path, keys, opts)"
are rewritten to call the new shape and extract the RegExp and mutate keys
accordingly (assign this.regexp = result.regexp and push/merge result.keys into
the existing keys array) so match() continues to call this.regexp.exec() and
keys are populated; update the postinstall replacement logic that touches
pathRegexp, layer.js, and any occurrence of pathRegexp(path, keys, opts) to
perform this extraction and key-array mutation.
In `@src/frontend_workspaces/frontend/src/carbon-chat/customSendMessage.ts`:
- Line 229: The handler is emitting a single-space chunk (" ") which causes
empty assistant bubbles on early-return paths; instead either finalize the
stream by calling complete_item() before any early return (e.g., in
SuggestHumanActions, tool-approval flows, and Error paths) or stop emitting
whitespace placeholders altogether for reasoning-only updates—locate the
emit/send call that sets text: " " (also present at the other occurrences) and
change the logic to skip sending a whitespace chunk or call complete_item()
first so no blank assistant message is left behind.
- Around line 155-159: The code currently logs the raw user prompt via
console.log(`User message: ${userMessage}`) which can leak sensitive data;
update customSendMessage.ts to stop emitting the raw userMessage in production
by either removing that console.log or replacing it with safe metadata (e.g.,
log only threadId, useDraft, baseUrl, and a redacted/truncated preview or a
deterministic hash/length of userMessage). Ensure the change touches the logging
lines that reference userMessage (and keep the other logs for
threadId/useDraft/baseUrl) and gate any full-prompt output behind an explicit
dev-only check (NODE_ENV === "development") if you still need it while
debugging.
- Around line 198-201: The code reads the raw backend body into errorText and
embeds it into the thrown Error (response, errorText, throw new Error(...)),
which later gets surfaced to users; instead, log the full errorText to
console/debug for diagnostics but throw a sanitized, non-sensitive message
(e.g., `HTTP error: status ${response.status}` or `Server returned an error`)
without including errorText or raw HTML; keep the status code in the thrown
message for context and ensure any user-facing display uses that sanitized Error
instead of the raw backend body.
In `@src/frontend_workspaces/frontend/src/ChatLanding.tsx`:
- Around line 690-706: The skills fetch treats any non-OK/network error the same
as an empty list; add an explicit error state (e.g., skillsError via useState)
and set it to an Error or message inside the non-OK branch and catch block of
the existing useEffect that calls api.getSkills, while still clearing
setSkillsLoading in finally; update the component render logic that currently
shows the empty-state ("No skills installed" at the block around lines
1342-1354) to check skillsError first and render a failure message or toast when
skillsError is set (otherwise render the empty list message when skills is an
empty array), and keep existing setSkills calls unchanged for successful
responses.
- Around line 1357-1368: The skill cards use skill.name as the React key and as
the identifier for expansion state, causing collisions when names repeat; change
the identity to a stable unique composite string (for example
`${skill.source}:${skill.name}`) wherever skill.name is used as the key in the
skills.map rendering (the Tile key) and wherever expandedSkills is
consulted/updated (e.g., expandedSkills.has, toggle logic and any set
operations) so each skill is uniquely identified by source+name instead of name
alone.
---
Nitpick comments:
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/code_wrapper.py`:
- Around line 99-108: The generated chdir_line contains a redundant runtime
conditional and uses repr(), so update the code that builds chdir_line (in the
block that checks workspace_root) to emit a simple unconditional chdir call
since workspace_root is already truthy; replace the repr() usage with the
f-string !r conversion to satisfy the RUF010 hint (i.e., produce a string like
"_internal_os.chdir({workspace_root!r})" followed by newline) so wrapped_code
contains a direct chdir call when workspace_root is present.
In `@src/frontend_workspaces/frontend/scripts/postinstall.cjs`:
- Around line 6-7: Replace the brittle hardcoded path construction for the
express layer with Node's resolver: change the code that assigns layerFile
(currently using path.join(__dirname,
'../node_modules/express/lib/router/layer.js')) to use
require.resolve('express/lib/router/layer.js') and likewise replace any other
hardcoded versioned express paths (the occurrences around the later block that
reference express@4.22.2) to use require.resolve with the same module path;
update the code that logs or uses layerFile to use the resolved path returned by
require.resolve.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c600dd06-6f47-45ae-ae35-463fed5aa106
⛔ Files ignored due to path filters (6)
src/cuga/frontend/dist/index.htmlis excluded by!**/dist/**src/cuga/frontend/dist/main.76f3bae552155a4125f1.jsis excluded by!**/dist/**src/cuga/frontend/dist/main.f58b9c6b13e6284dbd97.jsis excluded by!**/dist/**src/cuga/frontend/dist/vendors.7268c94d86f8c5d3e2ed.jsis excluded by!**/dist/**src/cuga/frontend/dist/vendors.b6b2c62d8ecc895d3c3e.jsis excluded by!**/dist/**src/frontend_workspaces/pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (14)
.gitignoresrc/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/code_wrapper.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/prompts/mcp_prompt.jinja2src/cuga/backend/cuga_graph/nodes/cuga_lite/reflection/prompts/reflection_system.jinja2src/cuga/backend/cuga_graph/nodes/cuga_lite/reflection/prompts/reflection_user.jinja2src/frontend_workspaces/frontend/package.jsonsrc/frontend_workspaces/frontend/scripts/postinstall.cjssrc/frontend_workspaces/frontend/src/ChatLanding.csssrc/frontend_workspaces/frontend/src/ChatLanding.tsxsrc/frontend_workspaces/frontend/src/api.tssrc/frontend_workspaces/frontend/src/carbon-chat/customSendMessage.tssrc/frontend_workspaces/frontend/webpack.config.js
| @@ -1 +1 @@ | |||
| {"name":"@carbon/ai-chat-examples-web-components-basic","version":"0.0.0","main":"main.js","private":true,"type":"module","proxy":"http://localhost:7860","description":"A basic example using @carbon/ai-chat as a web component.","scripts":{"start":"DANGEROUSLY_DISABLE_HOST_CHECK=true webpack serve --config ./webpack.config.js","start:win":"set DANGEROUSLY_DISABLE_HOST_CHECK=true&& webpack serve --config ./webpack.config.js","build":"webpack --config ./webpack.config.js","dev":"DANGEROUSLY_DISABLE_HOST_CHECK=true FAKE_STREAM=true webpack serve --config ./webpack.config.js --mode development","dev:real":"DANGEROUSLY_DISABLE_HOST_CHECK=true webpack serve --config ./webpack.config.js --mode development","dev:win":"set DANGEROUSLY_DISABLE_HOST_CHECK=true&& set FAKE_STREAM=true&& webpack serve --config ./webpack.config.js --mode development","dev:watch":"DANGEROUSLY_DISABLE_HOST_CHECK=true webpack serve --config ./webpack.config.js --mode development --watch","dev:watch:win":"set DANGEROUSLY_DISABLE_HOST_CHECK=true&& webpack serve --config ./webpack.config.js --mode development --watch","electron:build":"npm run build && electron-builder","electron":"electron .","desktop":"electron ./electron_loader/desktop_main.cjs"},"author":"IBM Corp","license":"Apache-2.0","browserslist":"> 0.5%, last 2 versions, not dead","dependencies":{"@carbon/ai-chat":"1.6.0","@carbon/ai-chat-components":"^0.8.0","@carbon/icons-react":"^11.45.0","@carbon/react":"^1.62.0","@ibm/plex":"^6.4.0","agentic_chat":"workspace:*","copy-webpack-plugin":"^13.0.0","lucide-react":"^0.525.0","react":"^18.2.0","react-dom":"^18.2.0","react-markdown":"^10.1.0","react-router-dom":"^7.13.0"},"devDependencies":{"@babel/core":"^7.26.0","@babel/plugin-proposal-class-properties":"^7.18.6","@babel/plugin-proposal-decorators":"^7.25.9","@babel/plugin-transform-private-methods":"^7.28.6","@babel/plugin-transform-runtime":"^7.25.9","@babel/preset-env":"^7.26.0","@babel/preset-react":"^7.25.9","@babel/preset-typescript":"^7.26.0","@types/react":"^18.2.0","@types/react-dom":"^18.2.0","babel-loader":"^9.2.1","css-loader":"^7.1.2","electron":"^41.2.0","electron-builder":"^26.0.12","html-webpack-plugin":"^5.6.3","sass":"^1.77.0","sass-loader":"^16.0.0","style-loader":"^4.0.0","terser-webpack-plugin":"^5.3.14","tinyglobby":"^0.2.14","ts-lit-plugin":"^2.0.2","ts-loader":"^9.5.1","typescript":"^5.6.3","webpack":"^5.104.1","webpack-bundle-analyzer":"^5.0.1","webpack-cli":"^5.1.4","webpack-dev-server":"^5.1.0"}} No newline at end of file | |||
| {"name":"@carbon/ai-chat-examples-web-components-basic","version":"0.0.0","main":"main.js","private":true,"type":"module","proxy":"http://localhost:7860","description":"A basic example using @carbon/ai-chat as a web component.","scripts":{"postinstall":"node scripts/postinstall.cjs","start":"DANGEROUSLY_DISABLE_HOST_CHECK=true webpack serve --config ./webpack.config.js","start:win":"set DANGEROUSLY_DISABLE_HOST_CHECK=true&& webpack serve --config ./webpack.config.js","build":"webpack --config ./webpack.config.js","dev":"DANGEROUSLY_DISABLE_HOST_CHECK=true FAKE_STREAM=true webpack serve --config ./webpack.config.js --mode development","dev:real":"DANGEROUSLY_DISABLE_HOST_CHECK=true webpack serve --config ./webpack.config.js --mode development","dev:win":"set DANGEROUSLY_DISABLE_HOST_CHECK=true&& set FAKE_STREAM=true&& webpack serve --config ./webpack.config.js --mode development","dev:watch":"DANGEROUSLY_DISABLE_HOST_CHECK=true webpack serve --config ./webpack.config.js --mode development --watch","dev:watch:win":"set DANGEROUSLY_DISABLE_HOST_CHECK=true&& webpack serve --config ./webpack.config.js --mode development --watch","electron:build":"npm run build && electron-builder","electron":"electron .","desktop":"electron ./electron_loader/desktop_main.cjs"},"author":"IBM Corp","license":"Apache-2.0","browserslist":"> 0.5%, last 2 versions, not dead","dependencies":{"@carbon/ai-chat":"1.6.0","@carbon/ai-chat-components":"^0.8.0","@carbon/icons-react":"^11.45.0","@carbon/react":"^1.62.0","@ibm/plex":"^6.4.0","agentic_chat":"workspace:*","copy-webpack-plugin":"^13.0.0","lucide-react":"^0.525.0","react":"^18.2.0","react-dom":"^18.2.0","react-markdown":"^10.1.0","react-router-dom":"^7.13.0"},"devDependencies":{"@babel/core":"^7.26.0","@babel/plugin-proposal-class-properties":"^7.18.6","@babel/plugin-proposal-decorators":"^7.25.9","@babel/plugin-transform-private-methods":"^7.28.6","@babel/plugin-transform-runtime":"^7.25.9","@babel/preset-env":"^7.26.0","@babel/preset-react":"^7.25.9","@babel/preset-typescript":"^7.26.0","@types/react":"^18.2.0","@types/react-dom":"^18.2.0","babel-loader":"^9.2.1","css-loader":"^7.1.2","electron":"^41.2.0","electron-builder":"^26.0.12","html-webpack-plugin":"^5.6.3","sass":"^1.77.0","sass-loader":"^16.0.0","style-loader":"^4.0.0","terser-webpack-plugin":"^5.3.14","tinyglobby":"^0.2.14","ts-lit-plugin":"^2.0.2","ts-loader":"^9.5.1","typescript":"^5.6.3","webpack":"5.104.1","webpack-bundle-analyzer":"^5.0.1","webpack-cli":"5.1.4","webpack-dev-server":"5.1.0"}} No newline at end of file | |||
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether DANGEROUSLY_DISABLE_HOST_CHECK is actually consumed by project config/code.
rg -n "DANGEROUSLY_DISABLE_HOST_CHECK|allowedHosts|disableHostCheck" src/frontend_workspaces/frontend/package.json src/frontend_workspaces/frontend/webpack.config.jsRepository: cuga-project/cuga-agent
Length of output: 2725
DANGEROUSLY_DISABLE_HOST_CHECK environment variable is set but not consumed by webpack config.
Multiple npm scripts set this env var, but the webpack config hardcodes allowedHosts: "all" at line 147 without any logic to check the variable. This creates misleading intent—developers may assume the env var controls host validation, when in fact it has no effect. Either remove the env var from scripts or conditionally wire allowedHosts to respect it.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/frontend_workspaces/frontend/package.json` at line 1, The scripts in
package.json set DANGEROUSLY_DISABLE_HOST_CHECK but webpack.config.js ignores it
because allowedHosts is hardcoded to "all"; either remove the env var from the
scripts (e.g., the start/dev/dev:watch variants) or update webpack.config.js to
read process.env.DANGEROUSLY_DISABLE_HOST_CHECK and conditionally set
allowedHosts (e.g., if truthy set allowedHosts: "all", else leave
default/validate hosts). Locate the scripts section in package.json (script
names: start, start:win, dev, dev:real, dev:win, dev:watch, dev:watch:win) and
the allowedHosts setting in webpack.config.js (symbol: allowedHosts) and
implement one of the two fixes so the env var's presence actually controls host
validation.
| if (code.includes("var pathRegexp = require('path-to-regexp');")) { | ||
| code = code.replace( | ||
| "var pathRegexp = require('path-to-regexp');", | ||
| "var {pathToRegexp: pathRegexp} = require('path-to-regexp');" | ||
| ); |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
For Express 4.x lib/router/layer.js, what call signature and behavior does pathRegexprely on, and is it compatible withpath-to-regexp@8.x pathToRegexp?
💡 Result:
In Express 4.x lib/router/layer.js, pathRegexp (imported from 'path-to-regexp', specifically ~0.1.x versions like 0.1.7-0.1.12) is called with the signature pathRegexp(path, this.keys = [], opts) where path is the route path string, this.keys is an array that gets populated with parameter keys, and opts is an options object (e.g., {sensitive: false, strict: false, end: true}). It returns a RegExp object directly, which is assigned to this.regexp. The Layer constructor then sets fast path flags on this RegExp like this.regexp.fast_star = path === '*' and this.regexp.fast_slash = path === '/' && opts.end === false. The match() method uses this.regexp.exec(path) to test matches and populate params from this.keys and the exec result.[1][2][3][4][5] This is not compatible with path-to-regexp@8.x pathToRegexp, which has a breaking change in return value: it returns an object { regexp, keys } instead of a bare RegExp.[6][7][8] Assigning the result to this.regexp would make this.regexp an object (not RegExp), causing this.regexp.exec() to fail in match(), and this.keys would not be populated as expected since the keys array is handled differently (not mutated in place the same way, and syntax changes like requiring named wildcards). Express 4.x explicitly depends on path-to-regexp ~0.1.x for compatibility.[9][10][3]
Citations:
- 1: https://github.com/authhwang/express-sourceCode/blob/master/express/router/layer.md
- 2: https://github.com/expressjs/express/tree/4.16.3/lib/router
- 3: deps: bump path-to-regexp@0.1.12 expressjs/express#6209
- 4: https://www.npmjs.com/package/path-to-regexp/v/0.1.10
- 5: https://npmdoc.github.io/node-npmdoc-path-to-regexp/build/apidoc.html
- 6: https://www.npmjs.com/package/path-to-regexp
- 7: https://www.npmjs.com/package/path-to-regexp/v/8.0.0
- 8: https://github.com/pillarjs/path-to-regexp/blob/master/src/index.ts
- 9: Crash after starting using version 4.21.0 expressjs/express#5955
- 10: Dependency: Bump path-to-regexp in v4 to fix security vulnerability. expressjs/express#5968
Import-shape rewrite alone is insufficient for Express 4 compatibility with path-to-regexp v8.
The patch only changes the import binding, but Express 4's layer.js calls pathRegexp(path, keys, opts) expecting the old API contract: a bare RegExp return value with in-place array mutation. path-to-regexp@8 returns {regexp, keys} instead, causing this.regexp to become an object rather than a RegExp, which breaks this.regexp.exec() in the match() method and prevents proper key population.
Applies to lines 10–14 and 27–31.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/frontend_workspaces/frontend/scripts/postinstall.cjs` around lines 10 -
14, The current rewrite only renames the import to pathRegexp but doesn't
preserve the old API contract used by Express 4's layer.js: change the patch so
any call sites that do "this.regexp = pathRegexp(path, keys, opts)" are
rewritten to call the new shape and extract the RegExp and mutate keys
accordingly (assign this.regexp = result.regexp and push/merge result.keys into
the existing keys array) so match() continues to call this.regexp.exec() and
keys are populated; update the postinstall replacement logic that touches
pathRegexp, layer.js, and any occurrence of pathRegexp(path, keys, opts) to
perform this extraction and key-array mutation.
| console.log('Pattern not found in file'); | ||
| } | ||
| } else { | ||
| console.log('File not found at:', layerFile); |
There was a problem hiding this comment.
Script should fail fast when no patch is applied.
When the file/pattern is missing, this script only logs and exits 0. That hides a broken compatibility fix until runtime. Set a non-zero exit code when patching was expected but did not occur.
Minimal fail-fast pattern
+let patched = false;
...
fs.writeFileSync(layerFile, code);
+ patched = true;
...
- console.log('Pattern not found in file');
+ console.error('Pattern not found in file');
...
- if (fs.existsSync(altLayerFile)) {
+ if (fs.existsSync(altLayerFile)) {
...
fs.writeFileSync(altLayerFile, code);
+ patched = true;
...
}
}
+
+if (!patched) {
+ process.exitCode = 1;
+}Also applies to: 33-35
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/frontend_workspaces/frontend/scripts/postinstall.cjs` around lines 18 -
21, The postinstall script currently only logs when a patch target is missing;
change the branches that log "Pattern not found in file" and "File not found
at:" (and the similar messages around lines 33-35) to terminate with a non-zero
exit (e.g., call process.exit(1) or throw an Error) so the install fails fast
when layerFile or the expected pattern isn't found; ensure you update both
occurrences that check for the file and for the pattern so missing-patch
situations do not silently exit 0.
| console.log(`Connecting to CUGA backend at: ${baseUrl}/stream`); | ||
| console.log(`Thread ID: ${threadId}`); | ||
| console.log(`User message: ${userMessage}`); | ||
| console.log(`Use Draft: ${useDraft}`); | ||
|
|
||
|
|
There was a problem hiding this comment.
Avoid logging raw user prompts in production paths.
At Line 157, the full userMessage is logged. This can expose sensitive prompt content in browser logs/telemetry; prefer logging metadata only.
Proposed fix
- console.log(`User message: ${userMessage}`);
+ console.log(`User message length: ${userMessage.length}`);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| console.log(`Connecting to CUGA backend at: ${baseUrl}/stream`); | |
| console.log(`Thread ID: ${threadId}`); | |
| console.log(`User message: ${userMessage}`); | |
| console.log(`Use Draft: ${useDraft}`); | |
| console.log(`Connecting to CUGA backend at: ${baseUrl}/stream`); | |
| console.log(`Thread ID: ${threadId}`); | |
| console.log(`User message length: ${userMessage.length}`); | |
| console.log(`Use Draft: ${useDraft}`); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/frontend_workspaces/frontend/src/carbon-chat/customSendMessage.ts` around
lines 155 - 159, The code currently logs the raw user prompt via
console.log(`User message: ${userMessage}`) which can leak sensitive data;
update customSendMessage.ts to stop emitting the raw userMessage in production
by either removing that console.log or replacing it with safe metadata (e.g.,
log only threadId, useDraft, baseUrl, and a redacted/truncated preview or a
deterministic hash/length of userMessage). Ensure the change touches the logging
lines that reference userMessage (and keep the other logs for
threadId/useDraft/baseUrl) and gate any full-prompt output behind an explicit
dev-only check (NODE_ENV === "development") if you still need it while
debugging.
| partial_item: { | ||
| response_type: MessageResponseTypes.TEXT, | ||
| text: "", | ||
| text: " ", |
There was a problem hiding this comment.
Whitespace stream chunks can leave empty assistant bubbles on early-return flows.
Using " " as partial text can accumulate visible blank content when the function returns before complete_item (e.g., SuggestHumanActions, tool-approval path, Error). Consider finalizing the stream before those returns, or avoid whitespace placeholders on reasoning-only updates.
Also applies to: 261-261, 282-282, 543-543
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/frontend_workspaces/frontend/src/carbon-chat/customSendMessage.ts` at
line 229, The handler is emitting a single-space chunk (" ") which causes empty
assistant bubbles on early-return paths; instead either finalize the stream by
calling complete_item() before any early return (e.g., in SuggestHumanActions,
tool-approval flows, and Error paths) or stop emitting whitespace placeholders
altogether for reasoning-only updates—locate the emit/send call that sets text:
" " (also present at the other occurrences) and change the logic to skip sending
a whitespace chunk or call complete_item() first so no blank assistant message
is left behind.
| useEffect(() => { | ||
| (async () => { | ||
| try { | ||
| const res = await api.getSkills(); | ||
| if (!res.ok) { | ||
| console.error(`Failed to load skills (${res.status} ${res.statusText})`); | ||
| } else { | ||
| const data = await res.json(); | ||
| setSkills(data.skills || []); | ||
| } | ||
| } catch (err) { | ||
| console.error("Failed to load skills:", err); | ||
| } finally { | ||
| setSkillsLoading(false); | ||
| } | ||
| })(); | ||
| }, []); |
There was a problem hiding this comment.
Differentiate skills load failures from a true empty list.
At Line 694, a non-OK response is only logged, then Line 703 clears loading and Lines 1342-1354 render “No skills installed,” which is misleading on backend/network failure. Add an explicit error state (or toast) and render a failure message instead of the empty-state copy.
Suggested fix
+ const [skillsLoadError, setSkillsLoadError] = useState<string | null>(null);
useEffect(() => {
(async () => {
try {
const res = await api.getSkills();
if (!res.ok) {
- console.error(`Failed to load skills (${res.status} ${res.statusText})`);
+ setSkillsLoadError(`Failed to load skills (${res.status} ${res.statusText})`);
} else {
const data = await res.json();
setSkills(data.skills || []);
+ setSkillsLoadError(null);
}
} catch (err) {
- console.error("Failed to load skills:", err);
+ setSkillsLoadError("Failed to load skills");
} finally {
setSkillsLoading(false);
}
})();
}, []);
- ) : skills.length === 0 ? (
+ ) : skillsLoadError ? (
+ <div style={{ padding: "1rem", color: "var(--cds-text-error)" }}>{skillsLoadError}</div>
+ ) : skills.length === 0 ? (Also applies to: 1342-1354
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/frontend_workspaces/frontend/src/ChatLanding.tsx` around lines 690 - 706,
The skills fetch treats any non-OK/network error the same as an empty list; add
an explicit error state (e.g., skillsError via useState) and set it to an Error
or message inside the non-OK branch and catch block of the existing useEffect
that calls api.getSkills, while still clearing setSkillsLoading in finally;
update the component render logic that currently shows the empty-state ("No
skills installed" at the block around lines 1342-1354) to check skillsError
first and render a failure message or toast when skillsError is set (otherwise
render the empty list message when skills is an empty array), and keep existing
setSkills calls unchanged for successful responses.
| {skills.map((skill) => { | ||
| const isExpanded = expandedSkills.has(skill.name); | ||
| const descLength = skill.description?.length || 0; | ||
| const shouldTruncate = descLength > 100; | ||
| const displayDesc = isExpanded || !shouldTruncate | ||
| ? skill.description | ||
| : truncateText(skill.description || "", 100); | ||
|
|
||
| return ( | ||
| <Tile | ||
| key={skill.name} | ||
| style={{ padding: 0, overflow: "hidden", borderRadius: "4px" }} |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
wc -l src/frontend_workspaces/frontend/src/ChatLanding.tsxRepository: cuga-project/cuga-agent
Length of output: 125
🏁 Script executed:
sed -n '1357,1380p' src/frontend_workspaces/frontend/src/ChatLanding.tsxRepository: cuga-project/cuga-agent
Length of output: 1216
🏁 Script executed:
sed -n '1430,1450p' src/frontend_workspaces/frontend/src/ChatLanding.tsxRepository: cuga-project/cuga-agent
Length of output: 1066
🏁 Script executed:
# Search for skill type definition or interface
rg -n "interface.*[Ss]kill|type.*[Ss]kill" src/frontend_workspaces/frontend/src/ChatLanding.tsx -A 5Repository: cuga-project/cuga-agent
Length of output: 288
🏁 Script executed:
sed -n '1357,1450p' src/frontend_workspaces/frontend/src/ChatLanding.tsx | grep -n "skill\\.name" -C 2Repository: cuga-project/cuga-agent
Length of output: 1174
🏁 Script executed:
# Look for all occurrences of skill.name within the skills.map block
sed -n '1357,1450p' src/frontend_workspaces/frontend/src/ChatLanding.tsxRepository: cuga-project/cuga-agent
Length of output: 4674
Use a stable unique key for skills cards and expansion state.
At Line 1367 and Lines 1438-1440, skill.name is used as identity. If two skills share the same name, React key collisions and coupled expand/collapse behavior occur. Use a unique key (e.g., ${skill.source}:${skill.name}).
Suggested fix
- {skills.map((skill) => {
- const isExpanded = expandedSkills.has(skill.name);
+ {skills.map((skill) => {
+ const skillKey = `${skill.source}:${skill.name}`;
+ const isExpanded = expandedSkills.has(skillKey);
return (
<Tile
- key={skill.name}
+ key={skillKey}
style={{ padding: 0, overflow: "hidden", borderRadius: "4px" }}
>
...
- next.delete(skill.name);
+ next.delete(skillKey);
...
- next.add(skill.name);
+ next.add(skillKey);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| {skills.map((skill) => { | |
| const isExpanded = expandedSkills.has(skill.name); | |
| const descLength = skill.description?.length || 0; | |
| const shouldTruncate = descLength > 100; | |
| const displayDesc = isExpanded || !shouldTruncate | |
| ? skill.description | |
| : truncateText(skill.description || "", 100); | |
| return ( | |
| <Tile | |
| key={skill.name} | |
| style={{ padding: 0, overflow: "hidden", borderRadius: "4px" }} | |
| {skills.map((skill) => { | |
| const skillKey = `${skill.source}:${skill.name}`; | |
| const isExpanded = expandedSkills.has(skillKey); | |
| const descLength = skill.description?.length || 0; | |
| const shouldTruncate = descLength > 100; | |
| const displayDesc = isExpanded || !shouldTruncate | |
| ? skill.description | |
| : truncateText(skill.description || "", 100); | |
| return ( | |
| <Tile | |
| key={skillKey} | |
| style={{ padding: 0, overflow: "hidden", borderRadius: "4px" }} |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/frontend_workspaces/frontend/src/ChatLanding.tsx` around lines 1357 -
1368, The skill cards use skill.name as the React key and as the identifier for
expansion state, causing collisions when names repeat; change the identity to a
stable unique composite string (for example `${skill.source}:${skill.name}`)
wherever skill.name is used as the key in the skills.map rendering (the Tile
key) and wherever expandedSkills is consulted/updated (e.g., expandedSkills.has,
toggle logic and any set operations) so each skill is uniquely identified by
source+name instead of name alone.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/code_wrapper.py (1)
99-103: 💤 Low valueRedundant runtime conditional in generated code.
The
if {repr(workspace_root)} else Noneguard is always true at runtime because thechdir_linestring is only set whenworkspace_rootis truthy (line 100/161). The conditional evaluates a non-empty string path which is always truthy.♻️ Simplify the generated chdir line
chdir_line = "" if workspace_root: - chdir_line = ( - f" _internal_os.chdir({repr(workspace_root)}) if {repr(workspace_root)} else None\n" - ) + chdir_line = f" _internal_os.chdir({workspace_root!r})\n"Apply to both occurrences (lines 101-103 and 162-164). This also addresses the RUF010 linter hint by using
!rconversion flag instead ofrepr().Also applies to: 160-164
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/code_wrapper.py` around lines 99 - 103, The generated chdir_line currently embeds a redundant runtime conditional and uses repr(); change the generation so when workspace_root is truthy you emit a simple unconditional chdir call like "_internal_os.chdir({workspace_root!r})\n" (no "if ... else None"), and update both occurrences that set chdir_line (the two places assigning chdir_line in code_wrapper.py) to use the !r conversion flag instead of repr() so the generated code is simplified and the RUF010 lint warning is resolved.src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py (1)
176-179: 💤 Low valueDocument the purpose of unconditional
_internal_reinjection.The
_internal_remodule is injected unconditionally into the execution context for user code, whereas_internal_osis conditional. The comment states it's "useful for tools and not dangerous," but doesn't specify which tools or use cases require it. Clarify in comments or documentation whether this is necessary for specific user-facing functionality or if it represents a general best practice for dynamic code execution.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py` around lines 176 - 179, Add a clear explanatory comment above the unconditional injection of _internal_re in code_executor.py: state which user-facing tools or execution paths rely on this (e.g., regex-based parsing, pattern matching, validation, tokenization utilities), why it's considered safe to always expose (pure library, no I/O), and note the contrast with conditional _internal_os injection so reviewers understand the security rationale; reference the exact symbol _internal_re and the related _internal_os in the comment so future maintainers know why one is unconditional and the other is gated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py`:
- Around line 176-179: Add a clear explanatory comment above the unconditional
injection of _internal_re in code_executor.py: state which user-facing tools or
execution paths rely on this (e.g., regex-based parsing, pattern matching,
validation, tokenization utilities), why it's considered safe to always expose
(pure library, no I/O), and note the contrast with conditional _internal_os
injection so reviewers understand the security rationale; reference the exact
symbol _internal_re and the related _internal_os in the comment so future
maintainers know why one is unconditional and the other is gated.
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/code_wrapper.py`:
- Around line 99-103: The generated chdir_line currently embeds a redundant
runtime conditional and uses repr(); change the generation so when
workspace_root is truthy you emit a simple unconditional chdir call like
"_internal_os.chdir({workspace_root!r})\n" (no "if ... else None"), and update
both occurrences that set chdir_line (the two places assigning chdir_line in
code_wrapper.py) to use the !r conversion flag instead of repr() so the
generated code is simplified and the RUF010 lint warning is resolved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e40648e7-fcd8-49cb-ba0a-17c1dc157f21
⛔ Files ignored due to path filters (6)
src/cuga/frontend/dist/index.htmlis excluded by!**/dist/**src/cuga/frontend/dist/main.76f3bae552155a4125f1.jsis excluded by!**/dist/**src/cuga/frontend/dist/main.f58b9c6b13e6284dbd97.jsis excluded by!**/dist/**src/cuga/frontend/dist/vendors.7268c94d86f8c5d3e2ed.jsis excluded by!**/dist/**src/cuga/frontend/dist/vendors.b6b2c62d8ecc895d3c3e.jsis excluded by!**/dist/**src/frontend_workspaces/pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (14)
.gitignoresrc/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/code_wrapper.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/prompts/mcp_prompt.jinja2src/cuga/backend/cuga_graph/nodes/cuga_lite/reflection/prompts/reflection_system.jinja2src/cuga/backend/cuga_graph/nodes/cuga_lite/reflection/prompts/reflection_user.jinja2src/frontend_workspaces/frontend/package.jsonsrc/frontend_workspaces/frontend/scripts/postinstall.cjssrc/frontend_workspaces/frontend/src/ChatLanding.csssrc/frontend_workspaces/frontend/src/ChatLanding.tsxsrc/frontend_workspaces/frontend/src/api.tssrc/frontend_workspaces/frontend/src/carbon-chat/customSendMessage.tssrc/frontend_workspaces/frontend/webpack.config.js
✅ Files skipped from review due to trivial changes (2)
- .gitignore
- src/frontend_workspaces/frontend/src/api.ts
🚧 Files skipped from review as they are similar to previous changes (9)
- src/frontend_workspaces/frontend/webpack.config.js
- src/cuga/backend/cuga_graph/nodes/cuga_lite/reflection/prompts/reflection_user.jinja2
- src/cuga/backend/cuga_graph/nodes/cuga_lite/reflection/prompts/reflection_system.jinja2
- src/cuga/backend/cuga_graph/nodes/cuga_lite/prompts/mcp_prompt.jinja2
- src/frontend_workspaces/frontend/scripts/postinstall.cjs
- src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
- src/frontend_workspaces/frontend/src/ChatLanding.css
- src/frontend_workspaces/frontend/src/carbon-chat/customSendMessage.ts
- src/frontend_workspaces/frontend/src/ChatLanding.tsx
The sandbox-aware chdir introduced in the prior commit assumed the
per-thread workspace directory already existed on disk. LocalExecutor
(unlike LocalSandboxExecutor, which mkdirs before shell flows) never
creates it, and unit tests typically pass thread_id=None ->
/tmp/cuga/_default/workspace, which does not exist -> os.chdir raised
FileNotFoundError before any user code ran, leaving _locals empty and
cascading into dozens of "new_vars == {}" assertion failures.
Mirror the LocalSandboxExecutor pattern: mkdir(parents=True,
exist_ok=True) the resolved workspace path before passing it into
CodeWrapper. The mkdir lives inside the existing try/except so any
permission or I/O error still falls back to "no chdir" cleanly
(workspace_root_for_wrap = None) and execution proceeds with the
process CWD as it did before this branch.
Smoke-verified locally with thread_id=None: workspace path is created
on first call and chdir succeeds.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py`:
- Around line 177-183: The code injects os into the user execution locals via
_internal_os (when workspace_root_for_wrap is truthy), exposing dangerous APIs
to user code; change this so os is only available to the wrapper plumbing and
never present in the user-visible execution namespace: move the chdir/workspace
logic to run inside the wrapper scope (use a separate dict for wrapper-only exec
or call wrapper functions with a local-only namespace), or ensure you remove
_internal_os from context_locals/_locals before executing user code (and
likewise remove any similar temporary symbols created around the new_vars.pop
usage); update the same pattern referenced around the lines that set
_internal_os (and the similar 222-223 spots) so validate_imports() still governs
all imports accessible to user code.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 856366fb-13c6-4d53-b0b0-2466c222185e
📒 Files selected for processing (1)
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py
| ) | ||
| reflection_agent = reflection_task(llm=active_model) | ||
| # Calculate sandbox_workspace using same logic as prepare_tools_and_apps | ||
| _sandbox_mode = getattr(settings.advanced_features, "sandbox_mode", "opensandbox") |
There was a problem hiding this comment.
[MED-1] Sandbox workspace logic duplicated in three places — divergence risk
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py:2222
Explanation
The expression "." if _sandbox_mode in ("native", "local") else "/workspace" now appears in at least three places:
prepare_tools_and_apps(line ~1695) — for the MCP promptsandbox()(line ~2222) — for the reflection prompt (this PR)code_executor.py(line ~151) — for the chdir logic (this PR)
Each site independently reads settings.advanced_features.sandbox_mode with getattr(..., "opensandbox"). If a new sandbox mode is added (e.g., "docker"), all three sites must be updated in sync. The comment in cuga_lite_graph.py even says "using same logic as prepare_tools_and_apps", which is a code smell.
Suggested Fix
Extract a shared helper, e.g. in a utility module:
# e.g. in executors/common/sandbox_utils.py
def get_sandbox_workspace(sandbox_mode: str) -> str:
return "." if sandbox_mode in ("native", "local") else "/workspace"
def current_sandbox_workspace() -> str:
mode = getattr(settings.advanced_features, "sandbox_mode", "opensandbox")
return get_sandbox_workspace(mode)Then all three call sites use current_sandbox_workspace().
Agent Prompt
In src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py around line 2222, and also in code_executor.py around line 151, the expression "." if _sandbox_mode in ("native", "local") else "/workspace" is duplicated. A third copy exists in prepare_tools_and_apps around line 1695. Extract this into a shared helper function to avoid divergence when new sandbox modes are added.
| @@ -0,0 +1,36 @@ | |||
| #!/usr/bin/env node | |||
There was a problem hiding this comment.
[MED-2] postinstall.cjs patches a vendored file in node_modules — fragile and non-idiomatic
src/frontend_workspaces/frontend/scripts/postinstall.cjs:1
Explanation
The postinstall script directly mutates express/lib/router/layer.js inside node_modules to fix a path-to-regexp v8 import shape incompatibility. This approach is fragile:
- The patch is silently skipped if the file path doesn't match (the script just logs "Pattern not found" or "File not found" and exits 0).
pnpmmay re-hoist or re-link the file on anypnpm install, silently reverting the patch.- The alternate path (
../../node_modules/.pnpm/express@4.22.2/...) is hardcoded to a specific express version — it will silently fail if express is upgraded. - There is no verification that the patch was actually applied.
The standard approach for this kind of fix is to use patch-package (or pnpm's built-in pnpm patch) which integrates with the lockfile and is re-applied reliably.
Suggested Fix
Use pnpm patch express to generate a proper patch file that pnpm applies automatically and tracks in the lockfile. This is the idiomatic pnpm approach and doesn't require a custom postinstall script.
Alternatively, pin path-to-regexp to a v7.x version that still exports the old API shape, avoiding the need for the patch entirely.
Agent Prompt
In src/frontend_workspaces/frontend/scripts/postinstall.cjs, the script patches express/lib/router/layer.js directly in node_modules. Review whether this is the right approach: the patch silently no-ops if the file path doesn't match, and pnpm may revert it on reinstall. Consider replacing this with pnpm patch (which generates a .patches/ file tracked in the lockfile) or pinning path-to-regexp to a compatible version.
Not sure about this one, it just proposes a different solution
| # The directory is created here (mirroring LocalSandboxExecutor's behavior for shell flows) | ||
| # so LocalExecutor paths and unit tests with thread_id=None don't crash on os.chdir. | ||
| workspace_root_for_wrap = None | ||
| if mode == 'local': |
There was a problem hiding this comment.
[MED-3] _sandbox_mode == 'native' branch in code_executor.py is unreachable when mode == 'local'
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py:153
Explanation
The outer condition is if mode == 'local', but inside it checks if _sandbox_mode == 'native' and uses native_thread_workspace_root. The mode parameter is set to 'local' when settings.advanced_features.e2b_sandbox is False (line ~128), regardless of sandbox_mode. When sandbox_mode == 'native', the NativeSandboxExecutor is used (not LocalExecutor), so mode would not be 'local' in that path. This means the native branch inside if mode == 'local' may never actually be reached in practice, and the local_thread_workspace_root fallback is always used. If the intent is to support native mode here, the outer condition needs to be broadened.
Suggested Fix
Clarify the intent. If native sandbox mode should also get the chdir treatment, the condition should be:
if mode in ('local',) or _sandbox_mode in ('native', 'local'):
...Or document explicitly why native mode is handled inside the mode == 'local' branch (e.g., because native mode forces mode = 'local' upstream).
Agent Prompt
In src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py around line 149, the workspace chdir logic is gated on mode == 'local', but inside it branches on _sandbox_mode == 'native' to use native_thread_workspace_root. Verify whether mode can actually be 'local' when sandbox_mode is 'native' — trace how mode is set upstream. If the native branch is unreachable, either remove it or fix the outer condition.
Not sure i understand the difference between mode and sandbox_node
…-paths-only at LLM
Unifies three previously-divergent ideas about where the agent's workspace
lives: the demo CLI/MCP server's `./cuga_workspace`, the skills feature's
`/tmp/cuga/<thread>/workspace`, and the host-process `chdir` added in the
previous PR commit (which broke CI by moving Python's CWD outside the
filesystem MCP server's allowed root).
Conceptual model
- The agent runs in a sandbox whose CWD IS the workspace; the LLM is told
to use relative paths only. The runtime owns the host-side mapping per
mode and per thread.
- `settings.skills.enabled` gates layout:
skills off -> shared <cwd>/cuga_workspace/ (demo CLI parity)
skills on -> per-thread <cwd>/cuga_workspace/<safe_thread_id>/
- The filesystem MCP server is launched ONCE with allowed-dir AND CWD set
to `<cwd>/cuga_workspace/`, so it serves every thread without restart.
Backend
- local_/native_thread_workspace_root: skills-conditional, rooted at
<cwd>/cuga_workspace; drop the tempfile.gettempdir() / /private/tmp
workspace path. Native seatbelt policy now allows writes under
<cwd>/cuga_workspace as well as the existing /private/tmp (venv/caches).
- seed_thread_workspace_if_needed: when a per-thread workspace is freshly
created, copy every top-level file from the shared parent into it so
CI/demo fixtures (e.g. contacts.txt) are reachable without workflow
changes. Idempotent via a `.cuga_seeded` sentinel.
- code_executor: delete the workspace-resolve + mkdir + host-process
`os.chdir` block (and the `_internal_os` injection it required).
Workspace CWD alignment is now handled by each entry point at launch
time, not by mutating Python's CWD.
- code_wrapper: drop `workspace_root` param + `chdir_line` plumbing; goes
back to pre-PR shape. `os` is intentionally NOT injected into the user
namespace, so SecurityValidator.validate_imports() remains the sole gate.
MCP filesystem
- managed_mcp.py BOOTSTRAP_YAML now spawns the filesystem MCP subprocess
with `cwd: ./cuga_workspace`, and changes its allowed-dir arg from
`./cuga_workspace` to `.` (resolves against the new CWD). Persisted YAMLs
from older installs are migrated lazily on startup.
- mcp_manager: ServiceConfig gains an optional `cwd:` field; the STDIO
transport now forwards it to `StdioTransport(cwd=...)` and mkdirs the
target so the subprocess doesn't fail to start.
- NEW mcp_filesystem_wrapper.py: when skills are enabled, wraps every
`filesystem_*` MCP tool so relative path args get prefixed with
`<safe_thread_id>/` before forwarding (covers both StructuredTool path
via chat_agent._build_runtime_context and the raw-function path via
cuga_lite_graph's tools_context_dict). No-op when skills are off.
Prompts / CLI
- mcp_prompt.jinja2 + reflection_system.jinja2: collapse the per-mode
`{% if sandbox_workspace == ... %}` branches into a single rule -
"use relative paths only; your CWD is the workspace". reflection_user
no longer surfaces a workspace path at all.
- demo CLI (cli.py): CUGA_POLICIES_CONTENT rewritten from
"write only from <absolute path>" to "use relative paths only". The
demo filesystem MCP subprocess is now spawned with `cwd=workspace`.
Hardening
- NativeSandboxExecutor._ensure_venv: previously trusted `Path('/tmp/.venv')
.exists()` and unconditionally flipped `_venv_ready=True` after creation
attempts - a stale empty dir or a silently-failed venv left every later
`run_command` reporting `/tmp/.venv/bin/activate: No such file or
directory`. Now validates `bin/activate.is_file()`, wipes stale dirs
before recreating, and only marks ready after verifying the file the
sandbox prelude actually sources.
Tests
- NEW src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/
test_local_sandbox_skills.py - 13 tests covering: per-thread workspace
placement, two-thread isolation, seed-on-first-use (top-level files
only, skips dirs/dotfiles), shared workspace when skills off, sandbox
write/read/list_files round-trip, path-escape rejection, and the MCP
wrapper layer (relative prefix, absolute pass-through, skills-off
no-op, only filesystem_* tools wrapped, BaseTool wrapper preserves
name/description).
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/local/local_sandbox_executor.py`:
- Around line 94-99: The shared-workspace detection currently compares
thread_root to its parent (thread_root.resolve() == parent.resolve()), which is
always false and causes seeding to run incorrectly; update the condition to
detect when the thread directory is actually the top-level cuga workspace by
comparing the parent to the workspace root (e.g., replace the current check with
parent.resolve() == self.cuga_workspace.resolve() or equivalent workspace-root
comparison) so the function (in local_sandbox_executor.py around the thread_root
handling code) returns early when the workspace is shared; apply the same fix to
the other occurrences noted (the blocks at the other two similar locations).
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/native/native_sandbox_executor.py`:
- Around line 86-91: The guard incorrectly compares thread_root to its parent
and therefore misses shared-workspace mode; update the conditional in
native_sandbox_executor (the block using thread_root and parent) to also detect
and skip shared-workspace seeding — e.g., return early if thread_root.resolve()
== parent.resolve() OR if parent.resolve() == workspace_root.resolve() (or if a
boolean flag like self.shared_workspace_mode / config.shared_workspace is set)
so shared-workspace mode is skipped and the unintended copy behavior is avoided.
In `@src/cuga/backend/tools_env/registry/config/config_loader.py`:
- Line 55: The new cwd field on the config dataclass is never populated when
building ServiceConfig, causing subprocesses to run with None CWD; update the
_create_service_config(...) routine to read the cwd value from the parsed config
(where cwd: Optional[str] is declared) and assign it to the ServiceConfig
instance being returned (ensure ServiceConfig.cwd is set), so that consumers
that spawn STDIO transport subprocesses use the configured working directory.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 48266c10-2e4d-4c58-a99a-4b447b14c637
📒 Files selected for processing (14)
docs/examples/demo_apps/setup/cli.pysrc/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.pysrc/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/code_wrapper.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/executors/local/local_sandbox_executor.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/executors/native/native_sandbox_executor.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_local_sandbox_skills.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/prompts/mcp_prompt.jinja2src/cuga/backend/cuga_graph/nodes/cuga_lite/reflection/prompts/reflection_system.jinja2src/cuga/backend/server/managed_mcp.pysrc/cuga/backend/tools_env/registry/config/config_loader.pysrc/cuga/backend/tools_env/registry/mcp_manager/mcp_manager.py
✅ Files skipped from review due to trivial changes (1)
- src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/code_wrapper.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/cuga/backend/cuga_graph/nodes/cuga_lite/reflection/prompts/reflection_system.jinja2
| command: Optional[str] = None | ||
| args: Optional[List[str]] = None | ||
| env: Optional[Dict[str, str]] = None # Environment variables for STDIO transport | ||
| cwd: Optional[str] = None # Working directory for STDIO transport subprocess |
There was a problem hiding this comment.
cwd is declared but never loaded into ServiceConfig (feature break + scope risk).
Line 55 adds cwd, but _create_service_config(...) never sets it when building ServiceConfig (Lines 243-259). That means config.cwd stays None, so stdio MCP processes won’t honor workspace CWD.
Suggested fix
service_config = ServiceConfig(
name=service_name,
description=config.get('description'),
url=config.get('url'),
command=config.get('command'),
args=config.get('args'),
env=env_config,
+ cwd=config.get('cwd'),
transport=config.get('transport'),
auth=auth,
include=config.get('include'),
type=service_type,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/cuga/backend/tools_env/registry/config/config_loader.py` at line 55, The
new cwd field on the config dataclass is never populated when building
ServiceConfig, causing subprocesses to run with None CWD; update the
_create_service_config(...) routine to read the cwd value from the parsed config
(where cwd: Optional[str] is declared) and assign it to the ServiceConfig
instance being returned (ensure ServiceConfig.cwd is set), so that consumers
that spawn STDIO transport subprocesses use the configured working directory.
…clamp policy confidence Three follow-up cleanups to the workspace-migration commit (52b0c19), all discovered while running run_tests.sh. Drop seed-on-first-use - Stability tests run with `settings.skills.enabled = false` (default), in which case `local_thread_workspace_root(...)` returns the shared parent `./cuga_workspace/` and the seed helper was already a no-op. - The only path where the seed did anything is skills-on at runtime, and there it propagated every parent-level file into every new thread on first shell/MCP call — fixtures, scratch notes, anything sitting at the workspace root. That was magic and surprising, not desirable. - Removed the `seed_thread_workspace_if_needed` helper from both local_sandbox_executor.py and native_sandbox_executor.py, removed all four call sites (in `_run_command` / `_run_sandboxed` / two `create_list_files_tool` paths), and trimmed the chat-agent wrapper's `_ensure_thread_workspace_seeded` to `_ensure_thread_workspace_exists` (mkdir only — MCP filesystem writes still need a real directory to land in). Mkdir at each call site is retained; only the copy step is gone. - Updated test_local_sandbox_skills.py: dropped seed-fixture assertions from the two integration tests that asserted them, replaced them with "parent files MUST NOT bleed into per-thread dir" assertions (the inverse), and removed test_local_sandbox_seed_skips_other_thread_dirs entirely (it was the seed-specific test). Gate `_internal_re` injection behind skills - code_executor.py always injected `re` as `_internal_re` into the user exec namespace. That pushed `context_locals` past the `if not context_locals: return` short-circuit in `SecurityValidator.validate_context_usage`, which then rejected every unit test that runs bare user code like `y = 20` with the error "Code must use at least one variable or tool from context". - 12 tests in `test_code_executor.py`, `test_e2b_lite.py`, `test_variable_creation_order.py`, and `test_tool_call_timeout.py` were failing as a result. The injection itself is only useful for skill flows that lean on regex helpers; scope it to `_skills_enabled()` so plain user-code paths keep `context_locals` empty and the short-circuit fires. - Patched test_e2b_lite.py::test_eval_with_e2b_disabled to set `mock_settings.skills.enabled = False` explicitly. Bare MagicMock() returns a truthy proxy for any attribute, so the test had been unintentionally exercising the skills-on path. Clamp policy confidence to [0, 1] - When both a keyword trigger AND a natural-language trigger fire on the same IntentGuard, `match_policy` combined the two confidence scores and occasionally produced `1.0000000000000002` (float rounding). `PolicyMatch.confidence` has `Field(le=1.0)`, so Pydantic raised a validation error, the surrounding `except` swallowed it, and the policy match silently disappeared (matched=False, empty metadata). - test_e2e_intent_guard_blocks_in_cuga_lite was the visible casualty (`assert metadata.get("policy_blocked")` against `{}`). Defensive clamp at the PolicyMatch construction site in policy/agent.py keeps the fix one line and doesn't change semantics for any other call site. Verification - `uv run pytest src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests src/cuga/backend/cuga_graph/policy/tests` -> 166 passed, 0 failed, 2m04s.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/native/native_sandbox_executor.py (1)
123-139:⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy liftScope the Seatbelt allowlist to the resolved thread workspace, not the shared parent.
In skills-enabled mode,
_run_sandboxed()cds into<cwd>/cuga_workspace/<thread>, but_build_policy()grantsfile-write*to the whole<cwd>/cuga_workspacetree and_ensure_policy()caches that policy once. A command likeecho x > ../thread-B/pwned.txtcan therefore mutate a sibling workspace, and any later hostos.chdir()makes the cached whitelist diverge fromnative_thread_workspace_root(). Build/cache policies by resolvedworkspace_rootinstead.Also applies to: 160-163
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/native/native_sandbox_executor.py` around lines 123 - 139, The Seatbelt policy currently grants file-write* to the whole CUGA_WORKSPACE_DIRNAME parent via _build_policy(), which allows cross-thread writes when _run_sandboxed() cd's into a per-thread workspace; change _build_policy() to accept or compute the resolved per-thread workspace_root (use native_thread_workspace_root() or the same resolved Path used by _run_sandboxed()) and emit the policy with (subpath "{workspace_root}") instead of the shared parent, and update _ensure_policy() to cache policies keyed by that resolved workspace_root so each thread gets its own scoped allowlist.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py`:
- Around line 98-104: The helper _skills_enabled currently catches all
exceptions and silently returns False; change it to only swallow expected
lookup/import failures (catch ImportError and AttributeError) and return False
for those, while logging and re-raising any other unexpected exceptions so
failures don't silently disable isolation. Update the function to import or use
a module logger (e.g., logging.getLogger(__name__) or the existing logger if
present), log the unexpected exception with context ("_skills_enabled failed")
and then re-raise it instead of returning False; keep the existing
getattr(settings, "skills", None).enabled lookup and the boolean return behavior
for the normal path.
- Around line 51-58: The _rewrite_path function currently allows paths like
"../thread-B/notes.txt" which become "thread-A/../thread-B/notes.txt" and escape
the intended thread folder; fix _rewrite_path (used with _is_absolute) to reject
any path that contains parent-directory segments before prefixing: after
stripping a leading "./" and handling the empty/"." case, split the remaining
path by "/" and if any segment equals ".." raise an error (e.g., ValueError) or
otherwise refuse the input, so the prefixing logic never produces paths that can
traverse up into sibling thread directories.
- Around line 88-94: The wrapped StructuredTool created in
StructuredTool.from_function loses BaseTool metadata (return_direct, tags,
callbacks, metadata) from the original MCP tool, causing behavioral changes;
update the wrapping in mcp_filesystem_wrapper by preserving those fields—pass
return_direct=getattr(tool, "return_direct", False) (and tags/callbacks/metadata
as appropriate) into StructuredTool.from_function or instead use the original
tool's copy/model_copy (e.g., BaseTool.copy() / model_copy()) and replace only
the invoke/ainvoke implementations (_invoke/_ainvoke) so
name/description/args_schema and all metadata stay intact.
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_local_sandbox_skills.py`:
- Around line 91-94: The flaky assertion that requires /tmp/cuga to be empty
should be replaced with a snapshot comparison: before running the code under
test (in test_local_sandbox_skills), record the set of paths under
Path("/tmp/cuga") (empty set if the directory doesn't exist), run the operation,
then capture the set of paths again and assert that the after-set is exactly the
before-set (or that no new paths exist by asserting
after_set.issubset(before_set) is False), i.e. verify the test did not add any
new files; update the assertion that currently references Path("/tmp") / "cuga"
to compare these two sets instead.
---
Outside diff comments:
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/native/native_sandbox_executor.py`:
- Around line 123-139: The Seatbelt policy currently grants file-write* to the
whole CUGA_WORKSPACE_DIRNAME parent via _build_policy(), which allows
cross-thread writes when _run_sandboxed() cd's into a per-thread workspace;
change _build_policy() to accept or compute the resolved per-thread
workspace_root (use native_thread_workspace_root() or the same resolved Path
used by _run_sandboxed()) and emit the policy with (subpath "{workspace_root}")
instead of the shared parent, and update _ensure_policy() to cache policies
keyed by that resolved workspace_root so each thread gets its own scoped
allowlist.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6e167e18-7328-40e0-8315-6965ba7c00d4
📒 Files selected for processing (7)
src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/executors/local/local_sandbox_executor.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/executors/native/native_sandbox_executor.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_e2b_lite.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_local_sandbox_skills.pysrc/cuga/backend/cuga_graph/policy/agent.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py
| wrapped = StructuredTool.from_function( | ||
| func=_invoke, | ||
| coroutine=_ainvoke, | ||
| name=tool.name, | ||
| description=tool.description or "", | ||
| args_schema=getattr(tool, "args_schema", None), | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the file in question
cat -n src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py | head -120Repository: cuga-project/cuga-agent
Length of output: 5269
🏁 Script executed:
# Check for requirements/dependencies to determine langchain-core version
fd -type f "requirements*" -o "setup.py" -o "pyproject.toml" | head -20Repository: cuga-project/cuga-agent
Length of output: 238
🏁 Script executed:
# Also search for any langchain version specifications
rg "langchain" --type py | grep -i "import\|from" | head -20Repository: cuga-project/cuga-agent
Length of output: 1930
🏁 Script executed:
# Look for dependency files
fd -name "pyproject.toml" -o -name "requirements.txt" -o -name "setup.py" -o -name "setup.cfg"Repository: cuga-project/cuga-agent
Length of output: 298
🏁 Script executed:
# Check the pyproject.toml or requirements files for langchain version
cat pyproject.toml 2>/dev/null | grep -A 50 "dependencies\|langchain" || cat requirements.txt 2>/dev/null || trueRepository: cuga-project/cuga-agent
Length of output: 7464
🏁 Script executed:
# Let's check the langchain-core documentation or source to understand StructuredTool.from_function()
# First, let me search for how tools are created in the codebase
rg "from_function" --type py -B 3 -A 3Repository: cuga-project/cuga-agent
Length of output: 35568
🏁 Script executed:
# Let's also check if there are other uses of BaseTool in the codebase to see what metadata matters
rg "return_direct\|tags\|callbacks" --type py | head -30Repository: cuga-project/cuga-agent
Length of output: 49
🏁 Script executed:
# Let's see if there's any documentation or tests about tool wrapping
fd -name "*test*mcp*" -o -name "*mcp*test*" 2>/dev/nullRepository: cuga-project/cuga-agent
Length of output: 49
🏁 Script executed:
# Check for any existing tool wrapping patterns in the codebase
rg "BaseTool\|StructuredTool" --type py -B 2 -A 5 | head -80Repository: cuga-project/cuga-agent
Length of output: 49
🏁 Script executed:
# Let's search for the source code or usage patterns to understand what metadata matters
rg "return_direct\|tags\|callbacks\|metadata" --type py -B 2 -A 2 | head -80Repository: cuga-project/cuga-agent
Length of output: 49
🏁 Script executed:
# Let's check the test file to see if return_direct is actually being used
cat -n src/cuga/backend/cuga_graph/policy/tests/test_e2e_healthcare_family_claims.py | grep -A 10 -B 10 "return_direct"Repository: cuga-project/cuga-agent
Length of output: 1739
🏁 Script executed:
# Let's look at how MCP tools are obtained and what type they are
rg "BaseTool\|MCP" src/cuga/backend/cuga_graph/nodes/chat/chat_agent/ --type py -B 2 -A 2Repository: cuga-project/cuga-agent
Length of output: 49
🏁 Script executed:
# Check if langchain-core 1.2.22+ has return_direct parameter in from_function
# Let's look for test patterns with return_direct in from_function calls
rg "from_function.*return_direct" --type pyRepository: cuga-project/cuga-agent
Length of output: 49
🏁 Script executed:
# Let's check if there's a BaseTool.copy method or similar that preserves metadata
rg "\.copy\(\)\|copy_update" src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py --type pyRepository: cuga-project/cuga-agent
Length of output: 49
🏁 Script executed:
# Let's see how the tool parameter is used - what type is it expected to be
cat -n src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py | sed -n '77,130p'Repository: cuga-project/cuga-agent
Length of output: 2331
🏁 Script executed:
# Let's check if BaseTool has any relevant attributes for callbacks, return_direct, tags
rg "tool\." src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py --type py -B 1 -A 1Repository: cuga-project/cuga-agent
Length of output: 550
🏁 Script executed:
# Let's check the broader context - how is _wrap_filesystem_tool called and what tools are passed to it
rg "_wrap_filesystem_tool\|wrap_filesystem_tool" src/cuga/backend/cuga_graph/nodes/chat/chat_agent/ --type py -B 3 -A 3Repository: cuga-project/cuga-agent
Length of output: 49
🏁 Script executed:
# Search for calls to _wrap_filesystem_tool to see what tools are passed
rg "_wrap_filesystem_tool" --type py -B 5 -A 5Repository: cuga-project/cuga-agent
Length of output: 2343
🏁 Script executed:
# Let's check the entire mcp_filesystem_wrapper file to understand the full context
wc -l src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.pyRepository: cuga-project/cuga-agent
Length of output: 147
🏁 Script executed:
# Print the full file
cat -n src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.pyRepository: cuga-project/cuga-agent
Length of output: 8777
🏁 Script executed:
# Let's check if there are any MCP tools being used and if they have return_direct, tags, callbacks set
rg "langchain.mcp\|langchain-mcp" --type py | head -20Repository: cuga-project/cuga-agent
Length of output: 49
🏁 Script executed:
# Let's search for imports or usage of MCP adapters
rg "langchain_mcp_adapters" --type py -B 2 -A 2Repository: cuga-project/cuga-agent
Length of output: 584
🏁 Script executed:
# Check BaseTool attributes to see what fields might be important
rg "return_direct\|\.tags\|\.callbacks\|\.metadata" --type py -B 2 -A 2 | head -100Repository: cuga-project/cuga-agent
Length of output: 49
🏁 Script executed:
# Search for what StructuredTool.from_function actually accepts as parameters
rg "from_function" src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py -B 10 -A 5Repository: cuga-project/cuga-agent
Length of output: 690
🏁 Script executed:
# Let's check if BaseTool has a copy method that might be better for preserving metadata
python3 << 'EOF'
import inspect
try:
from langchain_core.tools import BaseTool, StructuredTool
# Check what parameters from_function accepts
sig = inspect.signature(StructuredTool.from_function)
print("StructuredTool.from_function signature:")
print(sig)
print("\nParameters:")
for param_name, param in sig.parameters.items():
print(f" {param_name}: {param.annotation if param.annotation != inspect.Parameter.empty else 'no annotation'}")
# Check if BaseTool has a copy method
if hasattr(BaseTool, 'copy'):
print("\nBaseTool has a 'copy' method")
copy_sig = inspect.signature(BaseTool.copy)
print(f"copy signature: {copy_sig}")
# Check BaseTool attributes
print("\nBaseTool common attributes:")
for attr in ['return_direct', 'tags', 'callbacks', 'metadata']:
if hasattr(BaseTool, attr):
print(f" {attr}: exists")
except Exception as e:
print(f"Error: {e}")
import traceback
traceback.print_exc()
EOFRepository: cuga-project/cuga-agent
Length of output: 252
🌐 Web query:
langchain-core 1.2.22 StructuredTool.from_function parameters documentation
💡 Result:
In langchain-core, StructuredTool.from_function accepts these parameters (per the official LangChain reference docs): - func: Callable | None (default: None) The function from which to create a tool. - coroutine: Callable[..., Awaitable[Any]] | None (default: None) The async function from which to create a tool. - name: str | None (default: None) The name of the tool (defaults to the function name). - description: str | None (default: None) The description of the tool (defaults to the function docstring). - return_direct: bool (default: False) Whether to return the result directly or as a callback. - args_schema: ArgsSchema | None (default: None) The schema of the tool’s input arguments. - infer_schema: bool (default: True) Whether to infer the schema from the function’s signature. - response_format: Literal['content', 'content_and_artifact'] (default: 'content') Controls how the tool output is interpreted: - 'content': output is interpreted as the contents of a ToolMessage - 'content_and_artifact': output is expected to be a two-tuple (content, artifact) for a ToolMessage. - parse_docstring: bool (default: False) If infer_schema and parse_docstring are set, attempt to parse parameter descriptions from Google Style function docstrings. - error_on_invalid_docstring: bool (default: False) If parse_docstring is provided, configure whether to raise ValueError on invalid Google Style docstrings. - kwargs: Any (default: {}) Additional arguments to pass through. These are documented in the StructuredTool.from_function reference page for langchain_core. [1] Top parameters you’ll most often care about: - func: provide your callable [1] - name / description: optional overrides [1] - args_schema / infer_schema: input schema behavior [1] - return_direct: output handling [1] - response_format: whether you return only content or (content, artifact) [1]
Citations:
- 1: https://reference.langchain.com/python/langchain-core/tools/structured/StructuredTool/from_function
🌐 Web query:
langchain BaseTool return_direct tags callbacks metadata fields
💡 Result:
In LangChain (langchain_core.tools.BaseTool), the relevant fields are: 1) return_direct Whether the tool’s output should be returned directly. When set to True, the AgentExecutor stops looping after the tool is called. [1] 2) tags BaseTool has an optional tags attribute (list[str] | None). These tags are associated with each tool call and are passed as arguments to the callback handlers configured for the run. [2] 3) callbacks BaseTool exposes a callbacks property (and also allows passing callbacks into run/arun/invoke). This is where you provide callback handlers that get called during tool execution. [2] 4) metadata BaseTool has an optional metadata attribute (dict[str, Any] | None). This metadata is associated with each tool call and passed as arguments to the handlers defined in callbacks. It can be used to identify a specific instance of a tool / use case. [3] Notes / caveats - Tool call identifiers: there is an open/archived issue noting that tool_call_id may not be forwarded to callbacks in BaseTool.run for on_tool_start/on_tool_end (depending on LangChain version), because tool_call_id is extracted but not passed in the kwargs to callback_manager.on_tool_start in BaseTool.run. [4] If you want, tell me your LangChain version (and whether you’re using invoke/run/arun or AgentExecutor). I can map exactly how tags/metadata/callbacks appear in emitted callback events for that version.
Citations:
- 1: https://reference.langchain.com/python/langchain-core/tools/base/BaseTool/return_direct
- 2: https://reference.langchain.com/python/langchain-core/tools/base/BaseTool
- 3: https://reference.langchain.com/python/langchain-core/tools/base/BaseTool/metadata
- 4: tool_call_id not passed to on_tool_start callback in BaseTool.run() langchain-ai/langchain#34168
🏁 Script executed:
# Let's check if the MCP tools from langchain_mcp_adapters are likely to have these fields set
rg "load_mcp_tools" src/cuga/backend/cuga_graph/nodes/chat/chat_agent/ -B 5 -A 10Repository: cuga-project/cuga-agent
Length of output: 3709
🏁 Script executed:
# Check if there are any tests that verify MCP tool wrapping behavior
fd -name "*mcp*test*" -o -name "*test*mcp*" 2>/dev/null | xargs ls -la 2>/dev/nullRepository: cuga-project/cuga-agent
Length of output: 2440
🏁 Script executed:
# Look for any evidence of return_direct being used in the MCP context
rg "return_direct" src/cuga/backend/cuga_graph/nodes/chat/chat_agent/ --type py -B 3 -A 3Repository: cuga-project/cuga-agent
Length of output: 49
🏁 Script executed:
# Check if there's a copy_update method or similar on BaseTool
rg "copy\|model_copy" src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py --type pyRepository: cuga-project/cuga-agent
Length of output: 49
🏁 Script executed:
# Search for how tool copying is done in the codebase
rg "tool\.copy\|model_copy\|\.model_copy" src/cuga/backend/cuga_graph/ --type py -B 2 -A 2Repository: cuga-project/cuga-agent
Length of output: 49
🏁 Script executed:
# Check if there are other examples of tool wrapping in the codebase to see best practices
rg "BaseTool\|StructuredTool" src/cuga/backend/cuga_graph/nodes/chat/chat_agent/ --type py -B 5 -A 5 | head -100Repository: cuga-project/cuga-agent
Length of output: 49
Preserve return_direct and other BaseTool metadata when wrapping filesystem tools.
StructuredTool.from_function(...) reconstructs a fresh tool object, losing return_direct, tags, callbacks, and metadata from the original MCP tool. This causes wrapped tools to behave differently from the unwrapped ones—particularly if return_direct=True is set. At minimum, copy return_direct from the original tool:
Example fix
wrapped = StructuredTool.from_function(
func=_invoke,
coroutine=_ainvoke,
name=tool.name,
description=tool.description or "",
args_schema=getattr(tool, "args_schema", None),
return_direct=getattr(tool, "return_direct", False),
)Alternatively, check BaseTool.copy() or model_copy() in langchain-core 1.2.22 to wrap invoke/ainvoke without reconstructing the tool.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/cuga/backend/cuga_graph/nodes/chat/chat_agent/mcp_filesystem_wrapper.py`
around lines 88 - 94, The wrapped StructuredTool created in
StructuredTool.from_function loses BaseTool metadata (return_direct, tags,
callbacks, metadata) from the original MCP tool, causing behavioral changes;
update the wrapping in mcp_filesystem_wrapper by preserving those fields—pass
return_direct=getattr(tool, "return_direct", False) (and tags/callbacks/metadata
as appropriate) into StructuredTool.from_function or instead use the original
tool's copy/model_copy (e.g., BaseTool.copy() / model_copy()) and replace only
the invoke/ainvoke implementations (_invoke/_ainvoke) so
name/description/args_schema and all metadata stay intact.
| # Nothing leaked into /tmp/cuga (the legacy location) | ||
| assert not (Path("/tmp") / "cuga").exists() or not any(Path("/tmp/cuga").iterdir()), ( | ||
| "no files should be created under the legacy /tmp/cuga path" | ||
| ) |
There was a problem hiding this comment.
This legacy-path assertion is flaky against a dirty /tmp.
It fails whenever another test or a developer machine already has files under /tmp/cuga, even if this test never touched that path. Snapshot the directory before the command and assert it did not gain this test's output, rather than requiring the whole tree to be absent or empty.
🧰 Tools
🪛 Ruff (0.15.12)
[error] 92-92: Probable insecure usage of temporary file or directory: "/tmp"
(S108)
[error] 92-92: Probable insecure usage of temporary file or directory: "/tmp/cuga"
(S108)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_local_sandbox_skills.py`
around lines 91 - 94, The flaky assertion that requires /tmp/cuga to be empty
should be replaced with a snapshot comparison: before running the code under
test (in test_local_sandbox_skills), record the set of paths under
Path("/tmp/cuga") (empty set if the directory doesn't exist), run the operation,
then capture the set of paths again and assert that the after-set is exactly the
before-set (or that no new paths exist by asserting
after_set.issubset(before_set) is False), i.e. verify the test did not add any
new files; update the assertion that currently references Path("/tmp") / "cuga"
to compare these two sets instead.
…rfacing Replace the ad-hoc MCP filesystem demo and chat wrapper with a shared workspace filesystem layer under cuga_lite (workspace_fs, paths, backends) and route local, native, and opensandbox executors through it. Demo and manager flows no longer rely on the removed file_system MCP package; managed MCP and CLI presets are updated accordingly. Saving enable_filesystem_tools only under advanced_features now correctly hydrates feature_flags when loading manage config, so the UI matches demo defaults. Manager integration tests point the MCP filesystem server at ./cuga_workspace to match the seeded workspace layout. Includes frontend Manage/ToolsConfig updates and a rebuilt demo bundle.
Feature
Related issues
native/localmode. This PR introduces thesandbox_workspaceplumbing and the workspace-aware prompt rendering. The host-processchdirportion is still being evaluated against CI evidence (MCP filesystem allowed-dirs interaction) and may be tightened in a follow-up before final close-out.Summary
Two coherent themes land together in this PR:
1. Sandbox-aware execution
The executor now matches its working directory and prompt guidance to the active
sandbox_workspace, eliminating the recurring "is the CWD.or/workspace?" confusion.code_executorchdirs Python execution into the per-thread workspace for local/native sandbox modes so generated code shares a CWD withrun_command.code_wrapperthreadsworkspace_rootthroughwrap_codeso the prependedchdirlives inside the wrappedasync main.cuga_lite_graphpassessandbox_workspaceinto the reflection prompt context so strategic guidance matches the executor's actual CWD.reflection_system,reflection_user, andmcp_promptJinja templates replace "explain every case" guidance with a single branch that renders concrete path rules for the actualsandbox_workspacevalue (.,/workspace, or other). One workspace → one set of rules.2. Carbon side panel refresh
ChatLanding's right-side agent panel now uses idiomatic Carbon primitives:Accordion+AccordionItemfor the app configuration section, withContainedListfor the per-app tool lists (replaces a custom expand/collapse implementation).Layerwrapping the scrollable content area for proper Carbon elevation.Tilefor skill cards with truncated descriptions and a Show more / Show less expander (descriptions over 100 chars).border-bottomor pseudo-elements.Plus an incidental fix in
customSendMessage.ts: thethinkingIntervalvariable was scoped insidetry, but cleared insidefinally— now lifted to function scope. The half-finished thinking-indicator feature has been removed and will return in a follow-up PR with the streaming-display work.Build / infra: postinstall patch for express's
path-to-regexpv8 import shape, pinned webpack versions, refreshedpnpm-lock.yaml, regenerateddist/bundle, and.claude/settings.local.jsonadded to.gitignore.Testing
pnpm dev:realSummary by CodeRabbit
New Features
UI
Bug Fixes
Infrastructure