[hackathon]: cuga loops (work in progress PR) #198

Draft
anupamamurthi wants to merge 30 commits into cuga-project:main from anupamamurthi:feat/cuga-loops

Conversation


@anupamamurthi anupamamurthi commented May 7, 2026

A new self-scheduling primitive that lets CUGA agents schedule themselves to wake up later.

CUGA app that uses CUGA Loops: https://github.com/cuga-project/cuga-apps/pull/1/changes

Summary by CodeRabbit

  • New Features

    • Added agent skills system for reusable execution workflows with automatic dependency management.
    • Added loop scheduling capabilities for recurring and one-shot background tasks with a management UI.
    • Added OpenSandbox as an alternative code execution backend.
    • Thread-aware workspace operations for multi-session file management.
  • Documentation

    • Substantially updated README with new features overview, advanced configuration guidance, and agent skills documentation.
  • Configuration

    • Added settings for skills enablement, loop scheduling, and OpenSandbox integration.
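As a rough sketch of the self-scheduling idea (the walkthrough below notes the real implementation is APScheduler-based with a SQLite registry; everything here is a hypothetical stdlib stand-in, not CUGA's actual API):

```python
import sched
import time

# Hypothetical illustration of "schedule yourself to wake up later":
# an agent registers a future wakeup, and a recurring loop would
# re-register itself inside the callback.
scheduler = sched.scheduler(time.monotonic, time.sleep)
fired = []

def agent_wakeup(task: str) -> None:
    fired.append(task)
    # A recurring loop would re-enter itself here:
    # scheduler.enter(delay_seconds, 1, agent_wakeup, (task,))

# One-shot "delay" trigger: fire once after 0.01s, then the queue drains.
scheduler.enter(0.01, 1, agent_wakeup, ("check inbox",))
scheduler.run()  # blocks until all scheduled events have fired
print(fired)  # → ['check inbox']
```

The PR's loops additionally persist registrations (SQLite) so wakeups survive restarts, which a purely in-process scheduler like this cannot do.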

sami-marreed and others added 30 commits April 4, 2026 01:05
Contributor

coderabbitai Bot commented May 7, 2026

📝 Walkthrough


Adds skills and loops systems, OpenSandbox executor/tools with sandbox-backed workspace APIs, extended CugaLite binding/prompts, server endpoints, CLI demo_skills, frontend thread-aware workspace and skills UI, config/docs updates, and tests.

Changes

Unified Skills, Loops, Sandbox, and Threaded Workspace

| Layer / File(s) | Summary |
| --- | --- |
| **Docs, config, and LLM model flags**<br>`.gitignore`, `.secrets.baseline`, `README.md`, `pyproject.toml`, `src/cuga/config.py`, `src/cuga/settings.toml`, `src/cuga/backend/llm/models.py` | Documentation reflow with skills; ignore/baseline tweaks; adds opensandbox extra; new validators/settings; expands reasoning-model prefixes. |
| **Skills subsystem (loader, registry, tools, exports)**<br>`src/cuga/backend/skills/*` | Adds SKILL.md discovery, registry, load_skill tool, and exports. |
| **OpenSandbox executor and sandbox-backed workspace**<br>`src/cuga/backend/.../opensandbox/*`, `.../code_executor.py`, `.../call_api_helper.py`, `.../variable_utils.py`, `src/cuga/backend/server/workspace_sandbox.py`, `src/cuga/backend/server/main.py` | Adds OpenSandbox executor/tools; switches remote call helper; variable filtering; implements sandbox-backed workspace tree/file/download with thread scoping and skills listing. |
| **CugaLite bind-tools, todos, and prompts**<br>`src/cuga/backend/cuga_graph/nodes/cuga_lite/*` | Adds bind-tools by tool names and overlays, shared todos store, skills/shell prompt wiring, and reflection/template changes. |
| **Loops models/service/API/tools and SDK/supervisor wiring**<br>`src/cuga/backend/loops/*`, `src/cuga/backend/cuga_supervisor/*`, `src/cuga/sdk.py`, `src/cuga/backend/server/main.py` | Introduces APScheduler-based loops with SQLite registry, FastAPI router + UI, tools, and SDK/supervisor integration; server mounts/shutdown handling. |
| **Frontend thread-scoped workspace and skills UI**<br>`src/frontend_workspaces/...`, `src/frontend_workspaces/frontend/...` | Initial thread id; thread-aware workspace API usage; ManagePage skills panel; styles; API functions accept optional threadId. |
| **CLI demo_skills and manage preset**<br>`src/cuga/cli/main.py`, `src/cuga/backend/server/demo_manage_setup.py` | Adds demo_skills mode with OpenSandbox checks and skills starters; manage preset updates. |
| **Tests**<br>`tests/unit/*`, `src/cuga/sdk_core/tests/*` | New tests for bind-tools, skills, sandbox mapping; SDK summarization assertions adjusted. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Frontend
  participant Server
  participant Skills as SkillsRegistry
  participant Sandbox
  participant Loops as LoopsService
  Frontend->>Server: GET /api/agent/context, /api/skills
  Server->>Skills: discover_skills()
  Skills-->>Server: skills list
  Frontend->>Server: GET /api/workspace/* (thread_id)
  Server->>Sandbox: list/read/write via tools
  Sandbox-->>Server: tree/bytes
  Frontend->>Server: POST /cuga/loops/api/create
  Server->>Loops: register_loop()
  Loops-->>Server: scheduled fire invokes agent
```

🎯 Review effort: 4 (Complex) | ⏱️ ~60 minutes


Suggested reviewers

  • sami-marreed
  • gjt-prog

Poem

A rabbit packs its tidy kit,
Skills to load and loops to fit;
Threads weave trees in sandbox snow,
Tools bound neatly, prompts aglow.
Tap-tap hops—deploy, commit—
CUGA hums: it’s shipping lit. 🐇✨



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 20

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
src/frontend_workspaces/agentic_chat/src/workspaceService.ts (2)

122-130: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Returning cached data on error can serve another thread's tree.

In fetchWorkspaceData's catch (Line 126), this.cachedData may have been written by a previous request for a different thread that interleaved with the thread reset at Lines 67-73. Once the keying fix above is in, also gate this fallback on cachedForThreadId === (threadId ?? '') before returning.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/frontend_workspaces/agentic_chat/src/workspaceService.ts` around lines
122 - 130, In fetchWorkspaceData's catch block, avoid returning cachedData
unguarded because it may belong to another thread; instead, only fall back to
this.cachedData when this.cachedForThreadId === (threadId ?? '') so the cached
entry matches the requested threadId (refer to fetchWorkspaceData,
this.cachedData, this.cachedForThreadId, and threadId). If the guard fails,
return the empty tree { tree: [], timestamp: Date.now() } as currently
implemented.

67-106: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Race: a threadId change while a request is in flight returns and caches the wrong thread's data.

The order of operations bites here:

  1. Caller A invokes getWorkspaceTree(false, "A") → resets cache, cachedForThreadId = "A", pendingRequest = fetchWorkspaceData("A"), awaits.
  2. Caller B invokes getWorkspaceTree(false, "B") before A resolves → block at Lines 67-73 resets cachedData = null, sets cachedForThreadId = "B", lastFetchTime = 0. Then Lines 84-86 short-circuit: return this.pendingRequest — but pendingRequest is still A's fetch.
  3. When A's fetch resolves, the try block in A's call (Lines 97-105) writes this.cachedData = data and this.lastFetchTime = Date.now(). cachedForThreadId is still "B", so A's data is now cached as if it belonged to thread B. Caller B also resolves with A's data.

Two consequences: B receives the wrong thread's tree, and subsequent reads for thread B within MIN_INTERVAL_MS will keep serving A's tree from cache.

🛠️ Suggested fix — key the pending request by threadId, drop short-circuit on mismatch
```diff
 class WorkspaceService {
   private static instance: WorkspaceService;
   private lastFetchTime: number = 0;
   private cachedData: WorkspaceData | null = null;
   private cachedForThreadId: string | undefined = undefined;
   private pendingRequest: Promise<WorkspaceData> | null = null;
+  private pendingForThreadId: string | undefined = undefined;
@@
   async getWorkspaceTree(forceRefresh: boolean = false, threadId?: string): Promise<WorkspaceData> {
     const tidKey = threadId ?? '';
     if (this.cachedForThreadId !== tidKey) {
       this.cachedData = null;
       this.cachedForThreadId = tidKey;
       this.lastFetchTime = 0;
     }
@@
-    // If there's already a pending request, wait for it instead of making a new one
-    if (this.pendingRequest) {
-      return this.pendingRequest;
-    }
+    // Only reuse a pending request if it's for the same thread.
+    if (this.pendingRequest && this.pendingForThreadId === tidKey) {
+      return this.pendingRequest;
+    }
@@
     this.pendingRequest = this.fetchWorkspaceData(threadId);
+    this.pendingForThreadId = tidKey;

     try {
       const data = await this.pendingRequest;
-      this.cachedData = data;
-      this.lastFetchTime = Date.now();
-      this.notifyListeners(data);
+      // Only commit results if the caller's thread is still current.
+      if (this.cachedForThreadId === tidKey) {
+        this.cachedData = data;
+        this.lastFetchTime = Date.now();
+        this.notifyListeners(data);
+      }
       return data;
     } finally {
-      this.pendingRequest = null;
+      if (this.pendingForThreadId === tidKey) {
+        this.pendingRequest = null;
+        this.pendingForThreadId = undefined;
+      }
     }
   }
```

Also clear pendingForThreadId in clearCache().

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/frontend_workspaces/agentic_chat/src/workspaceService.ts` around lines 67
- 106, getWorkspaceTree races when threadId changes because a single shared
pendingRequest and cachedForThreadId allow an in-flight fetch for thread A to be
returned and cached for thread B; change the logic to key pending requests by
thread id (e.g., pendingRequests: Map<string, Promise<WorkspaceData>>) and only
return or await a pending request when its key matches tidKey, ensure when
resolving a fetch from fetchWorkspaceData(threadId) you verify the resolved data
is being stored only if tidKey still matches cachedForThreadId (or store cache
per-thread), and update clearCache to clear the pending request entry for the
affected thread (clear any pendingForThreadId/pendingRequests entries) so we
never cache or return a different thread's data.
src/frontend_workspaces/agentic_chat/src/FileAutocomplete.tsx (1)

380-392: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Mock-fallback is one-shot; thread switches that fail won't retry real data.

usedMockRef.current latches true after the first error and prevents useMockData() from running again, but the real fetch also won't retry beyond the existing 15s interval. If the very first getWorkspaceTree call for a new threadId errors (e.g. backend not yet ready for a freshly created sandbox), the user sees stale mock entries until the interval ticks. Consider clearing usedMockRef.current whenever threadId changes (or whenever the next interval succeeds) so the UI escapes the mock state.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/frontend_workspaces/agentic_chat/src/FileAutocomplete.tsx` around lines
380 - 392, The mock-fallback latches via usedMockRef.current in
loadWorkspaceFiles, causing mock data to persist across threadId changes; update
the logic so that when a real fetch succeeds you clear the latch (set
usedMockRef.current = false) and also add a threadId change handler (e.g., in
the effect that triggers loadWorkspaceFiles) to reset usedMockRef.current when
threadId changes so a fresh getWorkspaceTree attempt will not be blocked from
using real data; reference functions/variables: loadWorkspaceFiles, usedMockRef,
useMockData, getWorkspaceTree, threadId, and setAllFiles.
src/frontend_workspaces/agentic_chat/src/CustomChat.tsx (1)

228-244: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Ignore stale workspace refreshes after a thread switch.

A slower getWorkspaceTree(..., tid) from the previous thread can still resolve after threadId changes and overwrite allFiles with the wrong workspace. That makes autocomplete/file hover point at another thread’s files until the next refresh.

Suggested fix
```diff
 useEffect(() => {
+  let cancelled = false;
   const loadFiles = async () => {
+    const tid = (threadIdRef.current || threadId || "").trim() || undefined;
     try {
       const { workspaceService } = await import('./workspaceService');
-      const tid = (threadIdRef.current || threadId || "").trim() || undefined;
       const data = await workspaceService.getWorkspaceTree(false, tid);
       const files = extractFiles(data.tree || []);
+      const currentTid = (threadIdRef.current || "").trim() || undefined;
+      if (cancelled || tid !== currentTid) return;
       setAllFiles(files);
     } catch (error) {
       console.error('Error loading files:', error);
     }
   };

   loadFiles();
   const interval = setInterval(loadFiles, 15000);
-  return () => clearInterval(interval);
+  return () => {
+    cancelled = true;
+    clearInterval(interval);
+  };
 }, [threadId]);
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/frontend_workspaces/agentic_chat/src/CustomChat.tsx` around lines 228 -
244, The async loader (loadFiles) can finish after threadId changes and
overwrite allFiles with stale data; capture the request's thread id up-front
(e.g., const requestedTid = (threadIdRef.current || threadId || "").trim() ||
undefined) and after awaiting workspaceService.getWorkspaceTree(...) verify the
current thread id still matches requestedTid (or check
threadIdRef.current/threadId) before calling setAllFiles; alternatively add a
per-load abort token/counter checked before setAllFiles to ignore results from
previous loads (refer to loadFiles, threadIdRef, threadId, getWorkspaceTree, and
setAllFiles).
README.md (1)

1082-1083: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Broken markdown link: text and URL are swapped.

In [./docs/examples/cuga_with_runtime_tools/README.md](Adding Tools), the link target is Adding Tools (which renders as Adding%20Tools) and the visible text is the path. The next bullet also uses the path as its visible text, though its target is at least a real path. Restore [text](url) ordering for the runtime-tools example.

📝 Proposed fix
```diff
-- **Comprehensive example with different tools + MCP**: [./docs/examples/cuga_with_runtime_tools/README.md](Adding Tools)
-- **CUGA as MCP**: [./docs/examples/cuga_as_mcp/README.md](docs/examples/cuga_as_mcp)
+- **Comprehensive example with different tools + MCP**: [Adding Tools](./docs/examples/cuga_with_runtime_tools/README.md)
+- **CUGA as MCP**: [docs/examples/cuga_as_mcp](./docs/examples/cuga_as_mcp/README.md)
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` around lines 1082 - 1083, The markdown link in README.md has the
link text and URL swapped for the runtime-tools example; update the bullet to
use the correct [text](url) ordering by making the visible text "Adding Tools"
and the href "./docs/examples/cuga_with_runtime_tools/README.md" (i.e., change
`[./docs/examples/cuga_with_runtime_tools/README.md](Adding Tools)` to `Adding
Tools` as the link text pointing to that path) and ensure the URL remains the
file path (no percent-encoding).
🧹 Nitpick comments (15)
src/cuga/backend/llm/models.py (1)

566-567: 💤 Low value

'gpt-5.5' prefix is redundant — already covered by 'gpt-5'.

str.startswith checks each prefix in order and short-circuits on the first match. Any model name beginning with gpt-5.5 already starts with gpt-5, so the 'gpt-5.5' entry can never be the distinguishing match and is dead code. The 'azure/gpt-5.5' entry is correct and necessary (for litellm routes).

🧹 Proposed cleanup
```diff
-        reasoning_prefixes = ('o1', 'o3', 'gpt-5', 'gpt-5.5', 'azure/gpt-5.5')
+        reasoning_prefixes = ('o1', 'o3', 'gpt-5', 'azure/gpt-5.5')
```
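The short-circuit behavior is easy to verify in isolation (a standalone check, not code from the PR):

```python
# str.startswith with a tuple returns True on the first matching prefix,
# so any name beginning with 'gpt-5.5' is already caught by 'gpt-5' and
# the 'gpt-5.5' entry can never change the result.
with_redundant = ('o1', 'o3', 'gpt-5', 'gpt-5.5', 'azure/gpt-5.5')
without = ('o1', 'o3', 'gpt-5', 'azure/gpt-5.5')

for name in ('gpt-5.5-preview', 'gpt-5-mini', 'o3-mini', 'azure/gpt-5.5', 'claude-3'):
    assert name.startswith(with_redundant) == name.startswith(without)
```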
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/llm/models.py` around lines 566 - 567, The tuple
reasoning_prefixes in the function that returns
model_name.startswith(reasoning_prefixes) contains a redundant entry 'gpt-5.5'
because any name starting with 'gpt-5.5' already starts with 'gpt-5'; remove
'gpt-5.5' from the reasoning_prefixes tuple and keep 'gpt-5' and 'azure/gpt-5.5'
as needed so the startswith check remains correct and non-redundant.
src/cuga/sdk_core/tests/test_context_summarization_sdk.py (1)

373-378: ⚡ Quick win

// 3 cap may be too permissive to reliably detect missed summarization

With KEEP_LAST_N_MESSAGES = 5 and a working summarizer, the checkpoint should contain roughly ~6 messages (1 summary + 5 recent). The cap max(35, len(history) // 3) permits up to 46 messages for the 138-message large history — roughly 7-8× the expected post-summarization count. This means a severely degraded summarizer (e.g., one that compresses only partially) could still pass the assertion, weakening the regression signal.

A tighter bound — say max_expected_after = max(keep_last_n + 10, len(history) // 10) — would catch partial-compression failures while still giving headroom for tool-call messages added during the invoke. If the intent is only to rule out "no compression at all", the current formula is fine, but a comment noting that would clarify the design choice.

💡 Optional tighter cap example
```diff
-            max_expected_after = max(35, len(large_history) // 3)
+            # keep_last_n=5, +1 summary, +some tool messages per invoke turn
+            keep_last_n = int(os.environ.get("DYNACONF_CONTEXT_SUMMARIZATION__KEEP_LAST_N_MESSAGES", "5"))
+            max_expected_after = max(keep_last_n + 20, len(large_history) // 10)
             assert message_count_before < max_expected_after, (
```

And similarly at the second changed site (line 820):

```diff
-            max_expected_after = max(35, len(all_messages) // 3)
+            keep_last_n = int(os.environ.get("DYNACONF_CONTEXT_SUMMARIZATION__KEEP_LAST_N_MESSAGES", "5"))
+            max_expected_after = max(keep_last_n + 20, len(all_messages) // 10)
```

Also applies to: 820-825
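The gap between the two bounds is concrete for the 138-message history the test uses (a standalone calculation, not test code; the "+10" tighter bound is one possible choice):

```python
history_len, keep_last_n = 138, 5
expected_after = 1 + keep_last_n          # 1 summary + 5 recent = 6

current_cap = max(35, history_len // 3)   # permissive cap in the test today
tighter_cap = max(keep_last_n + 10, history_len // 10)

assert current_cap == 46                  # ~7-8x the expected ~6 messages
assert tighter_cap == 15                  # still leaves tool-call headroom
assert current_cap > 7 * expected_after
```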

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/sdk_core/tests/test_context_summarization_sdk.py` around lines 373 -
378, The assertion uses a permissive cap max(35, len(large_history) // 3) which
allows far too many messages post-summarization; update the cap calculation in
the test (both the occurrence around message_count_before and the similar check
near lines 820-825) to a tighter bound such as max(KEEP_LAST_N_MESSAGES + 10,
len(large_history) // 10) (or max(keep_last_n + 10, len(history)//10) if
different variable names) so the test will fail for partial summarizers, and add
a short inline comment referencing KEEP_LAST_N_MESSAGES to explain the headroom
for tool-call messages; alternatively, if the intent is only to detect "no
compression", add a comment clarifying that and leave the original formula.
pyproject.toml (1)

68-68: ⚡ Quick win

No version pins on opensandbox / opensandbox-code-interpreter.

Unpinned extras make the build non-reproducible and may silently pull in a breaking upstream release. At minimum, add a floor version once the compatible version is known.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pyproject.toml` at line 68, The dependency extras "opensandbox" and
"opensandbox-code-interpreter" in pyproject.toml are unpinned and make builds
non-reproducible; update the opensandbox extras list to include explicit version
constraints (at minimum a floor, e.g., "opensandbox>=X.Y.Z" and
"opensandbox-code-interpreter>=A.B.C") or pinned versions once compatibility is
determined, replacing the current unversioned entries so the project uses a
stable, reproducible release.
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/variable_utils.py (1)

24-30: 💤 Low value

strip_tools_output_var uses a plain substring match for find_tools.

"find_tools" in code will match occurrences inside comments, string literals, or variable names (e.g. # call find_tools later), causing tools_output to be incorrectly stripped in those cases. For this narrowly-scoped internal helper the risk is low, but a call-site pattern such as find_tools( would reduce false positives with minimal effort.

🛠️ Suggested tightening
```diff
-    if "find_tools" not in code:
+    if "find_tools(" not in code:
         return new_vars
```
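The AI-agent prompt below suggests an even stricter regex form; a quick standalone check of that pattern (not the PR's code):

```python
import re

# Word-boundary call-site pattern: matches actual invocations of
# find_tools, not mentions in comments or longer identifiers.
CALL_RE = re.compile(r"\bfind_tools\s*\(")

assert CALL_RE.search("results = find_tools(query)")
assert CALL_RE.search("find_tools (query)")            # space before paren
assert not CALL_RE.search("# call find_tools later")   # mention, no call
assert not CALL_RE.search("my_find_toolset(query)")    # longer identifier
```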
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/variable_utils.py`
around lines 24 - 30, The current strip_tools_output_var uses a substring check
("find_tools" in code) that can match comments or strings; change the detection
in strip_tools_output_var to look specifically for a call-site pattern (e.g. use
a regex like r'\bfind_tools\s*\(') so it only strips tools_output when
find_tools is actually invoked; update the function to import and use
re.search(r'\bfind_tools\s*\(', code) instead of the plain substring check to
reduce false positives.
src/cuga/backend/skills/tools.py (1)

15-29: 💤 Low value

Consider an async load_skill_impl if the registry ever does I/O.

Today registry.load_skill(name) is in-memory, so a sync function is fine. If the registry later reads/refreshes from disk on miss, switching load_skill_impl to async def (and using StructuredTool.from_function(coroutine=...)) avoids blocking the agent's event loop. Not blocking right now — flagging for when the registry grows.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/skills/tools.py` around lines 15 - 29, The current
load_skill_impl inside create_skill_tools calls registry.load_skill
synchronously which will block if registry.load_skill later performs I/O; change
load_skill_impl to an async function (async def load_skill_impl(name: str) ->
str) and switch the StructuredTool creation to pass the coroutine variant (use
StructuredTool.from_function for coroutine/coroutine=... or the library's
coroutine argument) so StructuredTool wraps an async callable; update any typing
to reflect async return and keep args_schema=LoadSkillInput intact.
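A minimal shape for the async variant (pure-asyncio sketch; the StructuredTool coroutine wiring described in the comment is omitted, and the registry call is a stand-in):

```python
import asyncio

async def load_skill_impl(name: str) -> str:
    # Stand-in for a registry lookup that might hit disk on a cache miss;
    # awaiting here keeps the agent's event loop responsive instead of
    # blocking it on synchronous I/O.
    await asyncio.sleep(0)
    return f"Loaded skill: {name}"

print(asyncio.run(load_skill_impl("demo")))
```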
src/frontend_workspaces/frontend/src/ManagePage.tsx (1)

1561-1572: 💤 Low value

Use an if/else instead of a ternary for the expand toggle.

isExpanded ? next.delete(skill.name) : next.add(skill.name); evaluates a ternary purely for side effects; some linters (no-unused-expressions) will flag it, and it's harder to read than the equivalent control-flow form.

♻️ Suggested change
-                                onClick={() => setExpandedSkills((prev) => {
-                                  const next = new Set(prev);
-                                  isExpanded ? next.delete(skill.name) : next.add(skill.name);
-                                  return next;
-                                })}
+                                onClick={() => setExpandedSkills((prev) => {
+                                  const next = new Set(prev);
+                                  if (isExpanded) {
+                                    next.delete(skill.name);
+                                  } else {
+                                    next.add(skill.name);
+                                  }
+                                  return next;
+                                })}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/frontend_workspaces/frontend/src/ManagePage.tsx` around lines 1561 -
1572, The onClick handler for the expand toggle uses a ternary purely for side
effects (isExpanded ? next.delete(skill.name) : next.add(skill.name)) which can
trigger linter errors; in the setExpandedSkills updater inside the button
onClick, replace the ternary with an explicit if/else: create next as new
Set(prev), then if (isExpanded) { next.delete(skill.name); } else {
next.add(skill.name); } and finally return next. This change should be made in
the inline handler passed to setExpandedSkills (the function referencing
isExpanded and skill.name).
src/frontend_workspaces/agentic_chat/src/FileAutocomplete.tsx (1)

175-309: 🏗️ Heavy lift

Split the workspace-fetch effect from the Carbon listener setup; the latter shouldn't re-run on threadId.

Adding threadId to the dep array (Line 309) is correct for the polling-fetch part, but it now also tears down and rebuilds all the Carbon-chat / textarea listener wiring every time threadId changes. A few concrete issues that surface as a result:

  • The retry path setTimeout(setupCarbonListeners, 500) (Line 300) and the deferred attachment loop setTimeout(() => { textareas.forEach(...) }, 1000) (Lines 259-272) don't capture timer IDs and aren't cleared on cleanup.
  • Listeners attached inside the 1s setTimeout to textareas (with data-autocomplete-listener) are never removed; the attribute prevents duplicate attachment to the same element, but each effect run creates a fresh closure that can attach to newly discovered nodes, and unmount/threadId change doesn't detach any of them.
  • The cleanup returned at Lines 289-298 only runs from the synchronous branch, so a threadId change during the retry-loop period skips cleanup entirely.

Recommend extracting two effects: one keyed on [disabled, threadId] that just runs loadWorkspaceFiles + setInterval, and a second keyed on [disabled] for the listener wiring. Track every setTimeout/setInterval ID and addEventListener call so the cleanup symmetrically removes them.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/frontend_workspaces/agentic_chat/src/FileAutocomplete.tsx` around lines
175 - 309, Split the big useEffect into two effects: one (deps [disabled,
threadId]) that only runs loadWorkspaceFiles and manages fileInterval (capture
the interval ID and clear it on cleanup), and a second effect (deps [disabled])
that encapsulates setupCarbonListeners and all DOM wiring; inside that listener
effect capture and clear any setTimeout IDs, store references to added listeners
(e.g., handleCarbonInput, handleDocumentInput) and to elements you attach
per-element listeners to (those marked with data-autocomplete-listener) so you
can removeEventListener for each on cleanup, and ensure the retry setTimeout
created by setupCarbonListeners is cleared if the effect unmounts; update
references to loadWorkspaceFiles, setupCarbonListeners, handleCarbonInput,
handleDocumentInput and fileInterval in the respective effects.
src/frontend_workspaces/frontend/src/ChatLanding.tsx (1)

681-685: 💤 Low value

2.5s polling interval is below the workspace service's 3s throttle.

workspaceService.getWorkspaceTree enforces a 3000ms minimum between fetches; a 2500ms interval here means every other tick is served from cache, but it also means cache-eviction edges (e.g. just after a thread reset) can stall the next request by up to ~500ms while the throttle drains. If the intent is "refresh as fast as the throttle allows", align the interval (e.g. 3000ms) or pass forceRefresh: true and rely on the service's internal wait.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/frontend_workspaces/frontend/src/ChatLanding.tsx` around lines 681 - 685,
The polling interval in ChatLanding.tsx's useEffect currently calls
fetchWorkspaceTree every 2500ms which is below
workspaceService.getWorkspaceTree's 3000ms throttle; update the effect to either
raise the interval to 3000ms (setInterval(fetchWorkspaceTree, 3000)) or modify
fetchWorkspaceTree to call workspaceService.getWorkspaceTree({ forceRefresh:
true }) so requests are not stalled by the service throttle; ensure you only
change the interval or add the forceRefresh flag and keep the existing
clearInterval cleanup.
src/cuga/backend/loops/runner.py (2)

64-85: ⚡ Quick win

Post-run bookkeeping is one fragile chain — a registry hiccup leaves loops in inconsistent states.

record_fire, insert_run, then update_status/update_next_fire run as one sequential happy path. If any of them raise (transient DB blip, serialization error on the run summary, etc.), one-shot DELAY loops won't be marked CANCELLED and recurring loops won't refresh next_fire_at, so the UI keeps showing the previous values until the next fire. Consider wrapping each step in its own try/except (logging on failure) so a write failure on the run record doesn't strand the loop's scheduling state, and vice versa.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/loops/runner.py` around lines 64 - 85, The post-run registry
calls (svc.registry.record_fire, svc.registry.insert_run,
svc.registry.update_status, svc.registry.update_next_fire) are executed as one
sequential happy path and a failure in any call can leave loop state
inconsistent; change the code so each registry operation is wrapped in its own
try/except that logs the exception (including loop_id and error) and continues,
and ensure the scheduling updates (calling svc._next_fire_ts and then
svc.registry.update_next_fire or update_status for TriggerKind.DELAY) run
regardless of failures in record_fire/insert_run so one-shot DELAY loops are
always marked CANCELLED and recurring loops always have next_fire_at refreshed
even if run-record writes fail.
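One way to decouple the steps (a hypothetical sketch; the svc/registry call names in the comments mirror the review comment, not the runner's exact code):

```python
import logging

log = logging.getLogger("loops.runner")

def run_step(name, fn, *args):
    """Run one post-run bookkeeping step; log failures instead of raising,
    so a registry hiccup in one step can't strand the others."""
    try:
        fn(*args)
        return True
    except Exception:
        log.exception("post-run step %r failed", name)
        return False

# Each write is now independent: a failing insert_run no longer blocks the
# status / next_fire_at updates the UI depends on. Hypothetical usage:
# run_step("record_fire", svc.registry.record_fire, loop_id)
# run_step("insert_run", svc.registry.insert_run, loop_id, run_summary)
# run_step("update_next_fire", svc.registry.update_next_fire, loop_id, nxt)
```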

80-85: 💤 Low value

Reaching into svc._next_fire_ts from the runner couples to a private API.

_next_fire_ts is a name-prefix-private method; calling it from runner.py is the same scope but readers of either file have to remember the contract. Promote it to a public helper (e.g. LoopsService.next_fire_ts(loop_id)) so the dependency is documented and won't quietly break if the underscore version is renamed during a refactor.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/loops/runner.py` around lines 80 - 85, The runner is calling
the private helper svc._next_fire_ts, coupling to a name-prefixed private API;
add a public method on the LoopsService class (e.g.,
LoopsService.next_fire_ts(loop_id)) that wraps or implements the existing logic
and then update runner.py to call svc.next_fire_ts(loop_id) instead of
svc._next_fire_ts(loop_id); ensure the new method preserves the existing
behavior used by runner (used in the branch handling non-TriggerKind.DELAY where
nxt is set and passed to registry.update_next_fire) and update any imports/tests
that referenced the private name.
src/frontend_workspaces/frontend/src/api.ts (1)

266-285: 💤 Low value

LGTM — new skills and thread-aware workspace endpoints look correct.

One minor style note: getWorkspaceTree (line 271) hand-rolls encodeURIComponent while getWorkspaceFile/getWorkspaceDownload use URLSearchParams. Consider aligning getWorkspaceTree to use URLSearchParams for consistency.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/frontend_workspaces/frontend/src/api.ts` around lines 266 - 285,
getWorkspaceTree currently builds its querystring with encodeURIComponent
manually; update getWorkspaceTree to use URLSearchParams (like getWorkspaceFile
and getWorkspaceDownload) to construct the thread_id parameter for consistency
and correctness—modify the getWorkspaceTree function to create a new
URLSearchParams({ thread_id: threadId }) when threadId is provided and append
its toString() to `/api/workspace/tree` accordingly.
src/cuga/backend/skills/loader.py (1)

104-110: 💤 Low value

A set in the requirements value gives non-deterministic install ordering.

Line 104 includes set as a supported sequence type for requirements. Sets are unordered, so the resulting tuple[str, ...] can vary across runs. If install order matters (e.g., one package depends on another), this can cause intermittent failures.

🛡️ Proposed fix
-    elif isinstance(value, (list, tuple, set)):
+    elif isinstance(value, (list, tuple)):
         candidates = value
+    elif isinstance(value, set):
+        candidates = sorted(value)  # deterministic ordering
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/skills/loader.py` around lines 104 - 110, The current branch
treats sets the same as lists/tuples causing non-deterministic ordering for
requirements (variable value -> candidates -> return tuple(...)); change the
conditional to handle set separately by converting a set into a
deterministically ordered sequence (e.g., sorted(value)) while keeping
lists/tuples as-is, so replace the isinstance(value, (list, tuple, set))
handling with separate branches: one for list/tuple that assigns candidates =
value and one for set that assigns candidates = sorted(value), then proceed to
the existing tuple(str(item).strip() ...) return to ensure stable, deterministic
install order.
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py (1)

310-317: ⚡ Quick win

Collapse duplicate if/else branches — both arms are identical.

The _skills_enabled() check guards two byte-identical bodies; the conditional adds noise without changing behavior. Since mode == 'opensandbox' is already remapped to 'local' on line 303-304 above, this block can be flattened.

♻️ Proposed simplification
-        if _skills_enabled():
-            if mode in ('e2b', 'docker'):
-                return await cls._execute_remotely_for_code_agent(wrapped_code, state, mode)
-        else:
-            if mode in ('e2b', 'docker'):
-                return await cls._execute_remotely_for_code_agent(wrapped_code, state, mode)
-        context_locals = cls._prepare_locals_for_code_agent(state)
-        return await cls._execute_locally_for_code_agent(wrapped_code, context_locals)
+        if mode in ('e2b', 'docker'):
+            return await cls._execute_remotely_for_code_agent(wrapped_code, state, mode)
+        context_locals = cls._prepare_locals_for_code_agent(state)
+        return await cls._execute_locally_for_code_agent(wrapped_code, context_locals)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py`
around lines 310 - 317, The duplicate conditional branches around
_skills_enabled() are redundant because both arms call the same remote execution
path for mode in ('e2b', 'docker'); simplify by removing the _skills_enabled()
check and directly check if mode in ('e2b', 'docker') to call
cls._execute_remotely_for_code_agent(wrapped_code, state, mode), otherwise
prepare locals via cls._prepare_locals_for_code_agent(state) and call
cls._execute_locally_for_code_agent(wrapped_code, context_locals); keep
references to wrapped_code, state, mode and the three methods intact.
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py (1)

1413-1435: ⚡ Quick win

Sandbox tools lack make_tool_awaitable wrapping used for all other tools — add for parity and safety.

All other tools injected into tools_context_dict (find_tools, todos, execution tools, skill tools) are wrapped with make_tool_awaitable() before assignment. The sandbox tools loop extracts the callable and assigns it directly without wrapping. While all current sandbox tools from OpenSandboxExecutor.create_sandbox_tools() are async (all created with coroutine= parameter), this inconsistency creates a maintenance risk: if a future sandbox tool is sync-only, it will fail at runtime when the agent awaits it. Wrap sandbox tools for consistency.

♻️ Suggested fix
                for st in sandbox_tools:
-                    # Prefer coroutine for async execution context
-                    fn = st.coroutine or st.func
+                    fn = getattr(st, "coroutine", None) or getattr(st, "func", None)
                    if fn:
-                        tools_context_dict[st.name] = fn
+                        tools_context_dict[st.name] = make_tool_awaitable(fn)
+                    else:
+                        logger.warning(f"Sandbox tool '{st.name}' has no callable, skipping")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` around lines
1413 - 1435, The sandbox tool injection loop in cuga_lite_graph currently
assigns st.coroutine or st.func directly into tools_context_dict without using
make_tool_awaitable, creating inconsistency with other tool injections; update
the loop that iterates over sandbox_tools (created via
CodeExecutor._get_opensandbox_executor().create_sandbox_tools) to wrap the
chosen callable (st.coroutine or st.func) with make_tool_awaitable() before
assigning to tools_context_dict[st.name], mirroring how other tools (find_tools,
todos, execution/skill tools) are handled so both async and sync sandbox tools
are safe to await.
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py (1)

113-113: ⚡ Quick win

Mutable class-level _sandboxes is shared across all instances

_sandboxes: dict[str, Any] = {} is a class attribute; every OpenSandboxExecutor instance reads and writes the same dict. This silently collapses all per-thread sandbox caches into one global map, which breaks isolation between independently constructed executors (e.g., in tests or if multiple executor instances are ever used concurrently).

♻️ Proposed fix – move to instance attribute
 class OpenSandboxExecutor(RemoteExecutor):
-    # Interpreter cache: thread_id -> CodeInterpreter instance
-    _sandboxes: dict[str, Any] = {}
+
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        # Per-instance interpreter cache: thread_id -> CodeInterpreter
+        self._sandboxes: dict[str, Any] = {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py`
at line 113, The class-level mutable dictionary _sandboxes causes all
OpenSandboxExecutor instances to share state; change _sandboxes from a class
attribute to an instance attribute by removing the top-level "_sandboxes:
dict[str, Any] = {}" and initializing self._sandboxes = {} inside
OpenSandboxExecutor.__init__, then update any references in methods of
OpenSandboxExecutor to use self._sandboxes so each instance maintains its own
sandbox cache.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pyproject.toml`:
- Line 21: Update the APScheduler dependency constraint in pyproject.toml to
prevent automatic upgrade to breaking 4.x by changing the current package spec
"apscheduler>=3.10" to a bounded range that excludes 4.x (e.g.,
"apscheduler>=3.10,<4"); this ensures existing usages of AsyncIOScheduler and
scheduler.add_job() remain compatible until you intentionally migrate to the
APScheduler v4 API.

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/call_api_helper.py`:
- Around line 194-198: Inside the async function call_api, replace
asyncio.get_event_loop() with asyncio.get_running_loop() to avoid deprecated
implicit loop creation; obtain the running loop into the loop variable used for
loop.run_in_executor(None, _do_request), keeping the _do_request closure that
calls urllib.request.urlopen(req, timeout={timeout_seconds}) intact so the
blocking I/O is still executed in the executor.
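A minimal sketch of the recommended pattern, with `_do_request` standing in for the blocking urllib call:

```python
import asyncio

def _do_request():
    # Stand-in for the blocking urllib.request.urlopen(...) call in call_api.
    return "response-body"

async def call_api():
    # get_running_loop() must be called from a coroutine and never creates a
    # loop implicitly, unlike the deprecated get_event_loop() behavior.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _do_request)
```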

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py`:
- Around line 282-285: The download_file flow uses the caller-supplied filename
directly which allows path traversal; change dest_name to a sanitized basename
(e.g. dest_name = Path(filename).name or run a secure filename sanitizer) and if
that yields an empty or unsafe name fall back to Path(sandbox_path).name; then
compute target = (_workspace_dir() / dest_name).resolve() and validate that
target is inside _workspace_dir().resolve() (use Path.is_relative_to or compare
commonpath), rejecting the request or raising an error if the check fails before
calling local_path.write_bytes(data); reference symbols: filename, dest_name,
_workspace_dir(), sandbox_path, local_path, and
interpreter.sandbox.files.read_bytes.
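The sanitization above can be sketched like this; `safe_download_target` and its parameters are illustrative names, not the executor's actual API:

```python
from pathlib import Path

def safe_download_target(workspace: Path, filename: str, sandbox_path: str) -> Path:
    # Keep only the basename, falling back to the sandbox file's own name.
    dest_name = Path(filename).name or Path(sandbox_path).name
    root = workspace.resolve()
    target = (root / dest_name).resolve()
    # Belt-and-braces containment check (Path.is_relative_to is 3.9+).
    if not target.is_relative_to(root):
        raise ValueError(f"refusing to write outside workspace: {filename!r}")
    return target
```

Taking the basename already defeats `../`-style traversal; the `is_relative_to` check guards against any remaining resolution surprises.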
- Around line 214-223: The release_sandbox implementation only pops "_default"
when thread_id is None and silently swallows errors; update release_sandbox to,
if thread_id is None, iterate over and remove all entries from self._sandboxes
(e.g., copy keys or items, pop each), await interpreter.sandbox.kill() and
interpreter.sandbox.close() for each cached interpreter, and on exceptions log
the error instead of using a bare except (use the class/module logger or
self.logger) while continuing to attempt cleanup for remaining sandboxes.
- Around line 353-362: The directory-detection heuristic in the list
comprehension that builds FileEntry from entries_raw is wrong: is_dir=(e.size ==
0 and str(oct(e.mode)).startswith("0o7")) misclassifies directories because
FileInfo.mode contains full st_mode (e.g., 0o40755). Replace that heuristic in
the FileEntry construction with a proper POSIX mode test (use the stat helpers
against e.mode, e.g., stat.S_ISDIR(e.mode) or compare e.mode & stat.S_IFMT to
stat.S_IFDIR) so FileEntry.is_dir correctly reflects directories when creating
ListFilesResult(sandbox_path=..., entries=...).
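The proper mode test is a one-liner; `is_dir_mode` is an illustrative wrapper. `stat.S_ISDIR` inspects the file-type bits of a full st_mode value, which the string-prefix heuristic gets wrong:

```python
import stat

def is_dir_mode(mode: int) -> bool:
    # 0o40755 is a typical directory st_mode; 0o100644 a regular file.
    return stat.S_ISDIR(mode)
```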

In `@src/cuga/backend/loops/api.py`:
- Around line 53-57: The expires_in_days field currently accepts negative values
and treats them as None (permanent); add explicit validation on the
expires_in_days field(s) to reject negative integers (raise a validation error)
so only 0 is allowed to mean “never” and positive ints represent days; implement
this via a Pydantic validator (e.g., a `@validator` or `@root_validator`) for the
model(s) that declare expires_in_days (refer to the expires_in_days Field in the
model in src/cuga/backend/loops/api.py and replicate the same validation for the
other model declared around lines 72-76) to throw on values < 0 with a clear
error message.
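A plain-Python sketch of the rule the Pydantic validator would enforce (the function name is illustrative):

```python
def validate_expires_in_days(value):
    # None or 0 mean "never expires"; positive ints are days; negatives error
    # instead of silently being treated as permanent.
    if value is None:
        return None
    if value < 0:
        raise ValueError("expires_in_days must be >= 0 (0 means never expire)")
    return value
```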
- Around line 129-134: The handler list_loops currently converts status with
LoopStatus(status) which raises on invalid values; change the FastAPI endpoint
signature to accept status: Optional[LoopStatus] (import LoopStatus) so FastAPI
performs validation and returns a 4xx for bad values, remove the manual
conversion line (status_enum = LoopStatus(status) if status else None) and pass
the validated status directly to svc.registry.list_all (references: list_loops,
LoopStatus, svc.registry.list_all, router.get("/api/list")).

In `@src/cuga/backend/loops/cron_parser.py`:
- Around line 113-121: The code currently wraps both int(spec) and the minimum
check in a single try/except so the "delay must be at least 5 seconds"
ValueError gets caught and replaced by the misleading "unrecognized delay..."
message; fix this by only catching conversion errors: move the n = int(spec)
into its own try/except that raises the "unrecognized delay..." message on
ValueError, then perform the if n < 5 check after that and raise
ValueError("delay must be at least 5 seconds") directly if needed (references:
variable spec and n, and the current error messages).
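The separated error handling can be sketched as follows; `parse_delay_seconds` is an illustrative name for the parser branch in question:

```python
def parse_delay_seconds(spec: str) -> int:
    # Only int() conversion failures map to the "unrecognized" message.
    try:
        n = int(spec)
    except ValueError:
        raise ValueError(f"unrecognized delay spec: {spec!r}") from None
    # The minimum check now raises its own, accurate error.
    if n < 5:
        raise ValueError("delay must be at least 5 seconds")
    return n
```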

In `@src/cuga/backend/loops/service.py`:
- Around line 130-140: The discard of asyncio.create_task(...) allows garbage
collection of fire-and-forget tasks; fix by storing the returned Task in a
strong-reference set on the service (e.g., self._background_tasks), add a done
callback to remove the task from that set when complete, and replace both raw
asyncio.create_task(...) calls (the one calling
self.registry.update_status(loop.id, LoopStatus.EXPIRED) and the one triggered
by run_now() that leads to fire_loop()) with this pattern so tasks are retained
until finished (use the unique symbols self.registry.update_status, run_now(),
and fire_loop() to locate the sites and implement the set and cleanup callback).
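The strong-reference pattern looks like this; `LoopsServiceSketch` and `spawn` are illustrative, only the task-retention mechanics mirror the fix:

```python
import asyncio

class LoopsServiceSketch:
    def __init__(self):
        self._background_tasks: set = set()

    def spawn(self, coro) -> asyncio.Task:
        task = asyncio.create_task(coro)
        # Hold a strong reference until the task completes; the event loop
        # only keeps a weak reference, so bare create_task() results can be
        # garbage-collected mid-flight.
        self._background_tasks.add(task)
        task.add_done_callback(self._background_tasks.discard)
        return task
```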

In `@src/cuga/backend/loops/tools.py`:
- Around line 138-153: The tool currently lists loops by calling
svc.registry.list_all(thread_id=thread) after retrieving agent, thread from
_identity(), which exposes loops from other agents on the same thread; change
the query to scope by both agent and thread (e.g., pass agent or agent.id /
agent_name alongside thread_id or call a registry method like
list_by_agent_and_thread) so the returned loops are filtered to the current
agent; update the call site in the function that uses _identity() and
svc.registry.list_all to include the agent filter and adjust any mapping logic
if the registry method name or parameters differ.

In `@src/cuga/backend/loops/ui.html`:
- Around line 142-157: The auto-refresh in refresh() races with act()/del() and
the periodic setInterval, so implement request-versioning or use an
AbortController to cancel/ignore stale refresh responses: add a module-scoped
inFlightToken (or controller) that refresh() sets before fetch, have act()/del()
either await refresh() or bump the token (or abort the controller) after
performing their mutations, and have refresh() ignore any response whose
token/controller no longer matches; additionally consider pausing the periodic
setInterval while a modal/panel is open by checking the panel state before
starting a new refresh to avoid clobbering newer UI state.
- Around line 180-198: The template builds unsafe HTML in tr.innerHTML and uses
inline onclick strings with l.id allowing XSS; update the rendering to escape
every interpolated field (status, app_name, agent_name, trigger_spec,
trigger_kind, thread_id, last_error, outcome, fire_count/error_count etc.) by
calling escapeHtml before inserting, and validate l.status against an allowlist
(e.g., active|paused|cancelled|expired|orphaned) before using it in the b-${...}
class; stop using inline onclick="..." and instead create buttons via
document.createElement and attach event handlers with addEventListener that call
act(del) with the safely-escaped l.id (or pass the raw id as data-* and never
interpolate into HTML); apply the same escaping and status validation in the
detail view (references: tr.innerHTML, escapeHtml, act, del, tbody.appendChild,
and loop.* fields in the detail panel).

In `@src/cuga/backend/server/demo_manage_setup.py`:
- Around line 363-372: The demo_skills preset currently returns "filesystem":
False which conflicts with the starter prompt referenced by
DEMO_SKILLS_STARTERS[2] ("What is in my workspace .cuga skills folder?"); either
set "filesystem": True in the demo_skills preset return inside
demo_manage_setup.py (the dict block that returns
crm/email/digital_sales/docs/filesystem/oak_health/knowledge) so the filesystem
tool is available, or remove/replace DEMO_SKILLS_STARTERS[2] with a starter that
does not require filesystem access to avoid the tool-not-found error.

In `@src/cuga/backend/server/main.py`:
- Around line 96-110: The path check currently compares a resolved
workspace_path to a syntactic candidate (built with os.path.abspath), allowing
symlink escape; in _resolve_path_under_cuga_workspace replace the candidate
creation with a real resolution (e.g., candidate =
Path(resolved).resolve(strict=False)) so symlinks are dereferenced the same way
as workspace_path before calling candidate.relative_to(workspace_path); use
resolve(strict=False) to avoid raising for non-existent targets.
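A sketch of the symlink-safe containment check described above (illustrative function name):

```python
from pathlib import Path

def resolve_under_workspace(workspace: Path, relative: str) -> Path:
    # Resolve BOTH sides so symlinks are dereferenced identically;
    # strict=False tolerates targets that don't exist yet.
    root = workspace.resolve(strict=False)
    candidate = (root / relative).resolve(strict=False)
    candidate.relative_to(root)  # raises ValueError if outside the workspace
    return candidate
```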

In `@src/cuga/backend/skills/registry.py`:
- Around line 37-43: The setup lines embed pip_pkgs and npm_pkgs directly into
shell strings passed to run_command, allowing shell injection from
SkillEntry.requirements; validate and/or sanitize each package name before using
them (e.g., allow only safe characters and version specifiers, reject or strip
values containing shell metacharacters), and prefer building commands as
argument lists rather than interpolated shell strings. Specifically, in
registry.py where setup_lines are appended (the blocks that reference pip_pkgs,
npm_pkgs and call run_command), filter/validate each requirement entry from
SkillEntry.requirements, replace any unsafe entries with a safe-fail behavior,
and change the run_command usage to pass arguments safely (or ensure proper
escaping) so no raw requirement string can inject shell commands.
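The allowlist validation might be sketched like this; the regex is a deliberately conservative approximation, not a full PEP 508 grammar:

```python
import re

# Bare name, optional extras, optional version pin; anything else is rejected
# so shell metacharacters never reach an interpolated command string.
_SAFE_REQ = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*"                        # package name
    r"(\[[A-Za-z0-9,._ -]*\])?"                           # optional extras
    r"(\s*(==|>=|<=|~=|!=|<|>)\s*[A-Za-z0-9.*+!_-]+)?$"   # optional version
)

def validate_requirement(req: str) -> str:
    req = req.strip()
    if not _SAFE_REQ.match(req):
        raise ValueError(f"unsafe requirement rejected: {req!r}")
    return req
```

Even with validation, passing the packages to the installer as an argument list (rather than an interpolated shell string) remains the stronger defense.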

In `@src/cuga/sdk.py`:
- Around line 1834-1836: stream() currently doesn't seed the loops/context vars
so tools calling _identity() fail; mirror the invoke() behavior by ensuring
stream() computes the same thread_id and calls set_session_attribute(thread_id)
and self._set_loops_context(thread_id) before yielding/dispatching tool calls
(the same calls used in invoke()), so
current_agent_name/current_thread_id/current_app_name are set for any scheduling
tool; apply the same change to the other stream() occurrence noted near the
second _set_loops_context usage.
- Around line 2403-2405: The guard in _set_loops_context currently uses the raw
self._enable_loops which can be None; instead use the resolved flag produced by
_ensure_loops_initialized (call or property set by it) so auto-mode is handled
correctly — update _set_loops_context (and the similar gating around graph
compilation at the other site) to call _ensure_loops_initialized() first (if not
already initialized) and then test the resolved boolean (the attribute set by
_ensure_loops_initialized or its return value) rather than self._enable_loops so
loop context binding works in auto (None) mode.

In `@src/cuga/settings.toml`:
- Around line 217-223: Investigate the implemented behavior for the [loops]
master toggle and then update the comment for the enabled field to match it
exactly: confirm whether enabled = false globally prevents injection/starting of
the scheduler or whether a per-agent enable_loops=True can override it, and then
rewrite the sentence to state that behavior unambiguously (referencing enabled,
enable_loops, and the tools schedule_recurring / schedule_wakeup so readers know
which features are affected); ensure the comment no longer contains the
contradictory "even if" clause and clearly states the precedence rule (master
toggle vs per-agent flag).
- Line 39: The config default enables OpenSandbox by setting opensandbox_sandbox
= true which will break installs without Docker/opensandbox; change the default
back to false (match previous behavior like e2b_sandbox) and/or add a runtime
guard around the OpenSandbox executor import/initialization (where OpenSandbox
executor or related class is referenced) so that if the opensandbox_sandbox flag
is true but the opensandbox package or Docker connection is missing you raise a
clear, actionable error (or fall back) instead of letting
ModuleNotFoundError/ConnectionError propagate.

In `@src/frontend_workspaces/frontend/src/ChatLanding.tsx`:
- Around line 687-708: The current reset of workspaceTreeDirPathsPrevRef in the
effect that runs on effectiveChatThreadId causes every directory to be treated
as "new" and auto-expanded on the next workspaceTree update; change the logic so
we don't seed prev ref to an empty set on thread switch but instead detect the
first successful tree load and initialize workspaceTreeDirPathsPrevRef.current =
valid without adding those paths to setWorkspaceExpandedDirs (or use a
hasInitialTree flag to skip adding new valid paths on the first non-empty
payload). Update the effects that reference workspaceTreeDirPathsPrevRef,
workspaceTree, setWorkspaceExpandedDirs and effectiveChatThreadId accordingly so
only directories genuinely added after the initial load are auto-expanded.

---

Outside diff comments:
In `@README.md`:
- Around line 1082-1083: The markdown link in README.md has the link text and
URL swapped for the runtime-tools example; update the bullet to use the correct
[text](url) ordering, i.e., change
`[./docs/examples/cuga_with_runtime_tools/README.md](Adding Tools)` to
`[Adding Tools](./docs/examples/cuga_with_runtime_tools/README.md)`, and ensure
the URL remains the plain file path (no percent-encoding).

In `@src/frontend_workspaces/agentic_chat/src/CustomChat.tsx`:
- Around line 228-244: The async loader (loadFiles) can finish after threadId
changes and overwrite allFiles with stale data; capture the request's thread id
up-front (e.g., const requestedTid = (threadIdRef.current || threadId ||
"").trim() || undefined) and after awaiting
workspaceService.getWorkspaceTree(...) verify the current thread id still
matches requestedTid (or check threadIdRef.current/threadId) before calling
setAllFiles; alternatively add a per-load abort token/counter checked before
setAllFiles to ignore results from previous loads (refer to loadFiles,
threadIdRef, threadId, getWorkspaceTree, and setAllFiles).

In `@src/frontend_workspaces/agentic_chat/src/FileAutocomplete.tsx`:
- Around line 380-392: The mock-fallback latches via usedMockRef.current in
loadWorkspaceFiles, causing mock data to persist across threadId changes; update
the logic so that when a real fetch succeeds you clear the latch (set
usedMockRef.current = false) and also add a threadId change handler (e.g., in
the effect that triggers loadWorkspaceFiles) to reset usedMockRef.current when
threadId changes so a fresh getWorkspaceTree attempt will not be blocked from
using real data; reference functions/variables: loadWorkspaceFiles, usedMockRef,
useMockData, getWorkspaceTree, threadId, and setAllFiles.

In `@src/frontend_workspaces/agentic_chat/src/workspaceService.ts`:
- Around line 122-130: In fetchWorkspaceData's catch block, avoid returning
cachedData unguarded because it may belong to another thread; instead, only fall
back to this.cachedData when this.cachedForThreadId === (threadId ?? '') so the
cached entry matches the requested threadId (refer to fetchWorkspaceData,
this.cachedData, this.cachedForThreadId, and threadId). If the guard fails,
return the empty tree { tree: [], timestamp: Date.now() } as currently
implemented.
- Around line 67-106: getWorkspaceTree races when threadId changes because a
single shared pendingRequest and cachedForThreadId allow an in-flight fetch for
thread A to be returned and cached for thread B; change the logic to key pending
requests by thread id (e.g., pendingRequests: Map<string,
Promise<WorkspaceData>>) and only return or await a pending request when its key
matches tidKey, ensure when resolving a fetch from fetchWorkspaceData(threadId)
you verify the resolved data is being stored only if tidKey still matches
cachedForThreadId (or store cache per-thread), and update clearCache to clear
the pending request entry for the affected thread (clear any
pendingForThreadId/pendingRequests entries) so we never cache or return a
different thread's data.

---

Nitpick comments:
In `@pyproject.toml`:
- Line 68: The dependency extras "opensandbox" and
"opensandbox-code-interpreter" in pyproject.toml are unpinned and make builds
non-reproducible; update the opensandbox extras list to include explicit version
constraints (at minimum a floor, e.g., "opensandbox>=X.Y.Z" and
"opensandbox-code-interpreter>=A.B.C") or pinned versions once compatibility is
determined, replacing the current unversioned entries so the project uses a
stable, reproducible release.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 1413-1435: The sandbox tool injection loop in cuga_lite_graph
currently assigns st.coroutine or st.func directly into tools_context_dict
without using make_tool_awaitable, creating inconsistency with other tool
injections; update the loop that iterates over sandbox_tools (created via
CodeExecutor._get_opensandbox_executor().create_sandbox_tools) to wrap the
chosen callable (st.coroutine or st.func) with make_tool_awaitable() before
assigning to tools_context_dict[st.name], mirroring how other tools (find_tools,
todos, execution/skill tools) are handled so both async and sync sandbox tools
are safe to await.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py`:
- Around line 310-317: The duplicate conditional branches around
_skills_enabled() are redundant because both arms call the same remote execution
path for mode in ('e2b', 'docker'); simplify by removing the _skills_enabled()
check and directly check if mode in ('e2b', 'docker') to call
cls._execute_remotely_for_code_agent(wrapped_code, state, mode), otherwise
prepare locals via cls._prepare_locals_for_code_agent(state) and call
cls._execute_locally_for_code_agent(wrapped_code, context_locals); keep
references to wrapped_code, state, mode and the three methods intact.

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/variable_utils.py`:
- Around line 24-30: The current strip_tools_output_var uses a substring check
("find_tools" in code) that can match comments or strings; change the detection
in strip_tools_output_var to look specifically for a call-site pattern (e.g. use
a regex like r'\bfind_tools\s*\(') so it only strips tools_output when
find_tools is actually invoked; update the function to import and use
re.search(r'\bfind_tools\s*\(', code) instead of the plain substring check to
reduce false positives.
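The call-site detection can be sketched as follows; `mentions_find_tools_call` is an illustrative wrapper around the suggested regex:

```python
import re

# Match an actual invocation, not any substring mention of the name.
FIND_TOOLS_CALL = re.compile(r"\bfind_tools\s*\(")

def mentions_find_tools_call(code: str) -> bool:
    return bool(FIND_TOOLS_CALL.search(code))
```

The `\b` word boundary also stops matches inside longer identifiers such as `my_find_tools`.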

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py`:
- Line 113: The class-level mutable dictionary _sandboxes causes all
OpenSandboxExecutor instances to share state; change _sandboxes from a class
attribute to an instance attribute by removing the top-level "_sandboxes:
dict[str, Any] = {}" and initializing self._sandboxes = {} inside
OpenSandboxExecutor.__init__, then update any references in methods of
OpenSandboxExecutor to use self._sandboxes so each instance maintains its own
sandbox cache.

In `@src/cuga/backend/llm/models.py`:
- Around line 566-567: The tuple reasoning_prefixes in the function that returns
model_name.startswith(reasoning_prefixes) contains a redundant entry 'gpt-5.5'
because any name starting with 'gpt-5.5' already starts with 'gpt-5'; remove
'gpt-5.5' from the reasoning_prefixes tuple and keep 'gpt-5' and 'azure/gpt-5.5'
as needed so the startswith check remains correct and non-redundant.

In `@src/cuga/backend/loops/runner.py`:
- Around line 64-85: The post-run registry calls (svc.registry.record_fire,
svc.registry.insert_run, svc.registry.update_status,
svc.registry.update_next_fire) are executed as one sequential happy path and a
failure in any call can leave loop state inconsistent; change the code so each
registry operation is wrapped in its own try/except that logs the exception
(including loop_id and error) and continues, and ensure the scheduling updates
(calling svc._next_fire_ts and then svc.registry.update_next_fire or
update_status for TriggerKind.DELAY) run regardless of failures in
record_fire/insert_run so one-shot DELAY loops are always marked CANCELLED and
recurring loops always have next_fire_at refreshed even if run-record writes
fail.
- Around line 80-85: The runner is calling the private helper svc._next_fire_ts,
coupling to a name-prefixed private API; add a public method on the LoopsService
class (e.g., LoopsService.next_fire_ts(loop_id)) that wraps or implements the
existing logic and then update runner.py to call svc.next_fire_ts(loop_id)
instead of svc._next_fire_ts(loop_id); ensure the new method preserves the
existing behavior used by runner (used in the branch handling
non-TriggerKind.DELAY where nxt is set and passed to registry.update_next_fire)
and update any imports/tests that referenced the private name.

In `@src/cuga/backend/skills/loader.py`:
- Around line 104-110: The current branch treats sets the same as lists/tuples
causing non-deterministic ordering for requirements (variable value ->
candidates -> return tuple(...)); change the conditional to handle set
separately by converting a set into a deterministically ordered sequence (e.g.,
sorted(value)) while keeping lists/tuples as-is, so replace the
isinstance(value, (list, tuple, set)) handling with separate branches: one for
list/tuple that assigns candidates = value and one for set that assigns
candidates = sorted(value), then proceed to the existing tuple(str(item).strip()
...) return to ensure stable, deterministic install order.

In `@src/cuga/backend/skills/tools.py`:
- Around line 15-29: The current load_skill_impl inside create_skill_tools calls
registry.load_skill synchronously which will block if registry.load_skill later
performs I/O; change load_skill_impl to an async function (async def
load_skill_impl(name: str) -> str) and switch the StructuredTool creation to
pass the coroutine variant (use StructuredTool.from_function for
coroutine/coroutine=... or the library's coroutine argument) so StructuredTool
wraps an async callable; update any typing to reflect async return and keep
args_schema=LoadSkillInput intact.

In `@src/cuga/sdk_core/tests/test_context_summarization_sdk.py`:
- Around line 373-378: The assertion uses a permissive cap max(35,
len(large_history) // 3) which allows far too many messages post-summarization;
update the cap calculation in the test (both the occurrence around
message_count_before and the similar check near lines 820-825) to a tighter
bound such as max(KEEP_LAST_N_MESSAGES + 10, len(large_history) // 10) (or
max(keep_last_n + 10, len(history)//10) if different variable names) so the test
will fail for partial summarizers, and add a short inline comment referencing
KEEP_LAST_N_MESSAGES to explain the headroom for tool-call messages;
alternatively, if the intent is only to detect "no compression", add a comment
clarifying that and leave the original formula.

In `@src/frontend_workspaces/agentic_chat/src/FileAutocomplete.tsx`:
- Around line 175-309: Split the big useEffect into two effects: one (deps
[disabled, threadId]) that only runs loadWorkspaceFiles and manages fileInterval
(capture the interval ID and clear it on cleanup), and a second effect (deps
[disabled]) that encapsulates setupCarbonListeners and all DOM wiring; inside
that listener effect capture and clear any setTimeout IDs, store references to
added listeners (e.g., handleCarbonInput, handleDocumentInput) and to elements
you attach per-element listeners to (those marked with
data-autocomplete-listener) so you can removeEventListener for each on cleanup,
and ensure the retry setTimeout created by setupCarbonListeners is cleared if
the effect unmounts; update references to loadWorkspaceFiles,
setupCarbonListeners, handleCarbonInput, handleDocumentInput and fileInterval in
the respective effects.

In `@src/frontend_workspaces/frontend/src/api.ts`:
- Around line 266-285: getWorkspaceTree currently builds its querystring with
encodeURIComponent manually; update getWorkspaceTree to use URLSearchParams
(like getWorkspaceFile and getWorkspaceDownload) to construct the thread_id
parameter for consistency and correctness—modify the getWorkspaceTree function
to create a new URLSearchParams({ thread_id: threadId }) when threadId is
provided and append its toString() to `/api/workspace/tree` accordingly.

In `@src/frontend_workspaces/frontend/src/ChatLanding.tsx`:
- Around line 681-685: The polling interval in ChatLanding.tsx's useEffect
currently calls fetchWorkspaceTree every 2500ms which is below
workspaceService.getWorkspaceTree's 3000ms throttle; update the effect to either
raise the interval to 3000ms (setInterval(fetchWorkspaceTree, 3000)) or modify
fetchWorkspaceTree to call workspaceService.getWorkspaceTree({ forceRefresh:
true }) so requests are not stalled by the service throttle; ensure you only
change the interval or add the forceRefresh flag and keep the existing
clearInterval cleanup.

In `@src/frontend_workspaces/frontend/src/ManagePage.tsx`:
- Around line 1561-1572: The onClick handler for the expand toggle uses a
ternary purely for side effects (isExpanded ? next.delete(skill.name) :
next.add(skill.name)) which can trigger linter errors; in the setExpandedSkills
updater inside the button onClick, replace the ternary with an explicit if/else:
create next as new Set(prev), then if (isExpanded) { next.delete(skill.name); }
else { next.add(skill.name); } and finally return next. This change should be
made in the inline handler passed to setExpandedSkills (the function referencing
isExpanded and skill.name).

Comment thread pyproject.toml
"markdownify",
"uvicorn",
"fastapi[standard]",
"apscheduler>=3.10",

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# First, find where APScheduler is used in the codebase
rg -i "apscheduler|from.*scheduler|import.*scheduler" --type py -A 3 -B 1 | head -100

Repository: cuga-project/cuga-agent

Length of output: 3235


🌐 Web query:

APScheduler 4.0 release breaking changes migration guide

💡 Result:

APScheduler 4.0 is still in pre-release (latest 4.0.0a6, released April 27, 2025), explicitly marked as not for production use due to potential backwards-incompatible changes without migration pathways [1][2][3][4]. The stable version remains 3.11.2 (Dec 22, 2025) [3][4][5]. Official migration guide from 3.x to 4.0 (https://apscheduler.readthedocs.io/en/master/migration.html) [6]:

- APScheduler 4.0 is a partial rewrite [6][7].
- Time zone support revamped to use zoneinfo (Python 3.9+), dropping pytz [6][1].
- add_job renamed to add_schedule; add_job is now for one-off runs [6][7].
- BlockingScheduler and BackgroundScheduler combined into Scheduler with run_until_stopped (replaces BlockingScheduler) and start_in_background (replaces BackgroundScheduler) [6].
- AsyncIOScheduler replaced by the generic AsyncScheduler (AnyIO-based, supports asyncio/Trio); used as an async context manager [6].
- Other schedulers dropped (Qt, Twisted, etc.); some may return before final [6].
- No multiple data stores per scheduler [6].
- configure method removed; pass kwargs to the scheduler constructor [6].
- No automatic import from 3.x job stores yet; planned before final release [6][7]. Progress tracking mentions a v3-to-v4 migration tool in development [8].

Key breaking changes from alpha releases (not exhaustive, as pre-release) [9][1][10][11][12]:

- Data store methods refactored (e.g., extend_acquired_schedule_leases, cleanup changes, release_schedules takes ScheduleResult) [9][1].
- Task/Schedule attributes read-only; metadata field added [1][10].
- Removed job_executor/max_running_jobs from add_schedule; use configure_task or the @task decorator; task_defaults replaces default_job_executor [1][10].
- AsyncpgEventBroker takes a connection string directly [9].
- lock_expiration_delay renamed to lease_duration [1].
- CronTrigger: timezone arg to from_crontab; Sunday=0 [1].
- Dropped Python <3.8 and some job stores (Redis, etc.) [1].

Full version history: https://apscheduler.readthedocs.io/en/master/versionhistory.html [1]. GitHub releases: https://github.com/agronholm/apscheduler/releases [10]. Do not use 4.0 in production [1][2].


🌐 Web query:

APScheduler 4.0 release date stable version 2025 2026

💡 Result:

There is no stable release of APScheduler 4.0 as of 2026-05-07. The latest release is 4.0.0a6 (alpha), published on 2025-04-27 [1][2]. Official documentation explicitly warns that the v4.0 series is a pre-release not suitable for production, with potential backwards-incompatible changes and no migration path [3][4]. The current stable version is APScheduler 3.11.2, released on 2025-12-22 [5]. Development on 4.0 continues slowly; the maintainer intends to finish it, including a v3-to-v4 migration tool required for beta, but no beta or stable 4.0 has been released yet. The 4.0 milestone remains open with some issues unresolved [6][7].


🏁 Script executed:

# Verify actual add_job usage in codebase
rg "add_job|add_schedule" --type py -B 2 -A 2

Repository: cuga-project/cuga-agent

Length of output: 442


Pin APScheduler to <4 to prevent a future breaking upgrade.

APScheduler 4.x (currently in alpha at 4.0.0a6) is a ground-up rewrite with breaking changes to core APIs used in this codebase. Specifically, AsyncIOScheduler is being removed and replaced with AsyncScheduler (which uses async context managers), and add_job() is being renamed to add_schedule(). The codebase directly uses both APIs: importing AsyncIOScheduler from apscheduler.schedulers.asyncio and calling scheduler.add_job() in the loops service. The current constraint >=3.10 will silently upgrade to 4.x once its first stable release lands, causing runtime failures.

🛠️ Proposed fix
-    "apscheduler>=3.10",
+    "apscheduler>=3.10,<4",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pyproject.toml` at line 21, Update the APScheduler dependency constraint in
pyproject.toml to prevent automatic upgrade to breaking 4.x by changing the
current package spec "apscheduler>=3.10" to a bounded range that excludes 4.x
(e.g., "apscheduler>=3.10,<4"); this ensures existing usages of AsyncIOScheduler
and scheduler.add_job() remain compatible until you intentionally migrate to the
APScheduler v4 API.

Comment on lines +194 to +198
loop = asyncio.get_event_loop()
def _do_request():
    with urllib.request.urlopen(req, timeout={timeout_seconds}) as resp:
        return resp.read().decode()
response_data = await loop.run_in_executor(None, _do_request)

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# First, check if the file exists and examine the relevant lines
if [ -f "src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/call_api_helper.py" ]; then
  echo "=== File found, examining lines around 194-198 ==="
  sed -n '180,220p' "src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/call_api_helper.py" | cat -n
else
  echo "File not found at expected path"
  find . -name "call_api_helper.py" -type f 2>/dev/null | head -5
fi

Repository: cuga-project/cuga-agent

Length of output: 1911


🏁 Script executed:

# Check Python version constraints in the project
if [ -f "pyproject.toml" ]; then
  echo "=== pyproject.toml Python version ==="
  grep -A 5 "python" pyproject.toml | head -15
fi

if [ -f "setup.py" ]; then
  echo "=== setup.py Python version ==="
  grep -i "python_requires" setup.py
fi

# Check for settings.toml mentioned in the scratchpad
if [ -f "settings.toml" ]; then
  cat settings.toml
fi

Repository: cuga-project/cuga-agent

Length of output: 482


🏁 Script executed:

# Search for the call_api function definition to verify async context
rg "def call_api" --context 10 "src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/"

Repository: cuga-project/cuga-agent

Length of output: 4851


🌐 Web query:

asyncio.get_event_loop() deprecation Python 3.10 async context

💡 Result:

asyncio.get_event_loop() is deprecated in Python 3.10+ when called without a running event loop, emitting a DeprecationWarning [1][2][3]. In Python 3.14, it raises RuntimeError if no current event loop is set (default policy change) [2][4][5]. The policy system is deprecated in 3.14 and slated for removal in 3.16 [6][5]. In async contexts (coroutines/callbacks), prefer asyncio.get_running_loop(), which requires a running loop and has simpler behavior [1][7][2]. For top-level synchronous code, use the high-level asyncio.run(coro) instead of manually getting/creating loops [1][8][9][10]. Example:

    async def main():
        pass  # async code here

    asyncio.run(main())

For multiple calls in the same loop/context (Python 3.12+), use asyncio.Runner [11][10][12]:

    with asyncio.Runner() as runner:
        runner.run(coro1())
        runner.run(coro2())

If a loop reference is needed before running (rare), use asyncio.new_event_loop() then loop.run_until_complete(), but Runner or run() handle shutdown better [13][14][9]. As of 2026-05-07 (post-3.14), avoid get_event_loop() in new code; migrate to run()/Runner/get_running_loop() as appropriate [15][16][9].


🌐 Web query:

asyncio.get_running_loop() vs get_event_loop() inside coroutine Python

💡 Result:

Inside a coroutine, use asyncio.get_running_loop() instead of asyncio.get_event_loop(). The official Python documentation explicitly recommends this: "Because this function has rather complex behavior (especially when custom event loop policies are in use), using the get_running_loop() function is preferred to get_event_loop() in coroutines and callbacks." [1]

Key differences:

- asyncio.get_running_loop(): returns the running event loop in the current OS thread; raises RuntimeError if no event loop is running; can only be called from a coroutine or callback [1][2].
- asyncio.get_event_loop(): gets the current event loop; when called from a coroutine it returns the running loop (like get_running_loop()), but outside one it may create/set a new loop based on policy. Deprecated since Python 3.10 with warnings; in 3.14 it raises RuntimeError without a current loop; the policy system is deprecated, with removal planned for 3.16 [1][2][3].

This makes get_running_loop() more predictable and safer inside coroutines, as it fails explicitly if no loop is running, preventing subtle bugs from implicit loop creation [1][4]. Example:

    async def my_coroutine():
        loop = asyncio.get_running_loop()  # Preferred [1]
        # vs
        loop = asyncio.get_event_loop()    # Deprecated/complex [1]


Use asyncio.get_running_loop() instead of asyncio.get_event_loop() inside the async function.

The generated code calls get_event_loop() within an async def call_api() coroutine. Since Python 3.10, get_event_loop() is deprecated when called without an active event loop and will emit a DeprecationWarning in your project's supported version range. Inside a coroutine, get_running_loop() is the correct idiom — it explicitly returns the running loop and raises RuntimeError if called outside a running context, preventing subtle bugs from policy-based implicit loop creation.

🛠️ Proposed fix
-    loop = asyncio.get_event_loop()
+    loop = asyncio.get_running_loop()
     def _do_request():
         with urllib.request.urlopen(req, timeout={timeout_seconds}) as resp:
             return resp.read().decode()
     response_data = await loop.run_in_executor(None, _do_request)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/call_api_helper.py`
around lines 194 - 198, Inside the async function call_api, replace
asyncio.get_event_loop() with asyncio.get_running_loop() to avoid deprecated
implicit loop creation; obtain the running loop into the loop variable used for
loop.run_in_executor(None, _do_request), keeping the _do_request closure that
calls urllib.request.urlopen(req, timeout={timeout_seconds}) intact so the
blocking I/O is still executed in the executor.
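
As a standalone illustration of the recommended pattern, here is a minimal sketch with a stub standing in for the real urllib call (blocking_fetch and call_api are illustrative names):

```python
import asyncio

def blocking_fetch() -> str:
    # Stand-in for the blocking urllib.request.urlopen(...) call.
    return "response-body"

async def call_api() -> str:
    # get_running_loop() is the correct idiom inside a coroutine:
    # it returns the active loop, or raises RuntimeError if none is running.
    loop = asyncio.get_running_loop()
    # Run the blocking call in the default thread-pool executor.
    return await loop.run_in_executor(None, blocking_fetch)

print(asyncio.run(call_api()))  # → response-body
```

Because get_running_loop() only works inside a running loop, calling call_api() via asyncio.run() is required; there is no implicit, policy-based loop creation to mask a misuse.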

Comment on lines +214 to +223
async def release_sandbox(self, thread_id: Optional[str] = None) -> None:
    """Kill and remove the cached interpreter/sandbox for a thread."""
    key = thread_id or "_default"
    interpreter = self._sandboxes.pop(key, None)
    if interpreter:
        try:
            await interpreter.sandbox.kill()
            await interpreter.sandbox.close()
        except Exception:
            pass

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

release_sandbox(None) only removes the "_default" entry, not all sandboxes

key = thread_id or "_default"
interpreter = self._sandboxes.pop(key, None)

When called without arguments (e.g., during application shutdown), this only pops the single "_default" key; all per-thread sandboxes remain cached and their resources are leaked. Additionally, the except Exception: pass silently swallows kill/close errors with no log output.

🛠️ Proposed fix
 async def release_sandbox(self, thread_id: Optional[str] = None) -> None:
     """Kill and remove the cached interpreter/sandbox for a thread.
-    Pass thread_id=None to release the default sandbox.
+    Pass thread_id=None to release ALL cached sandboxes.
     """
-    key = thread_id or "_default"
-    interpreter = self._sandboxes.pop(key, None)
-    if interpreter:
+    keys = [thread_id or "_default"] if thread_id is not None else list(self._sandboxes.keys())
+    for key in keys:
+        interpreter = self._sandboxes.pop(key, None)
+        if not interpreter:
+            continue
         try:
             await interpreter.sandbox.kill()
             await interpreter.sandbox.close()
         except Exception:
-            pass
+            logger.debug(f"[OpenSandboxExecutor] Error during sandbox release for key={key!r}")
🧰 Tools
🪛 Ruff (0.15.12)

[error] 222-223: try-except-pass detected, consider logging the exception

(S110)


[warning] 222-222: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py`
around lines 214 - 223, The release_sandbox implementation only pops "_default"
when thread_id is None and silently swallows errors; update release_sandbox to,
if thread_id is None, iterate over and remove all entries from self._sandboxes
(e.g., copy keys or items, pop each), await interpreter.sandbox.kill() and
interpreter.sandbox.close() for each cached interpreter, and on exceptions log
the error instead of using a bare except (use the class/module logger or
self.logger) while continuing to attempt cleanup for remaining sandboxes.
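
The release-all behavior can be sketched with fake sandbox objects (FakeSandbox and Executor here are illustrative stand-ins, not the real classes):

```python
import asyncio
import logging

logger = logging.getLogger("opensandbox")

class FakeSandbox:
    # Illustrative stand-in for the real sandbox handle.
    def __init__(self):
        self.released = False
    async def kill(self):
        self.released = True
    async def close(self):
        pass

class Executor:
    def __init__(self):
        self._sandboxes = {}

    async def release_sandbox(self, thread_id=None):
        # thread_id=None releases ALL cached sandboxes, not just "_default".
        keys = [thread_id] if thread_id is not None else list(self._sandboxes)
        for key in keys:
            sandbox = self._sandboxes.pop(key, None)
            if sandbox is None:
                continue
            try:
                await sandbox.kill()
                await sandbox.close()
            except Exception:
                # Log instead of silently swallowing kill/close failures,
                # and keep cleaning up the remaining sandboxes.
                logger.debug("error releasing sandbox %r", key)

ex = Executor()
ex._sandboxes = {"_default": FakeSandbox(), "thread-1": FakeSandbox()}
asyncio.run(ex.release_sandbox())
print(len(ex._sandboxes))  # → 0
```

Snapshotting the keys before popping avoids mutating the dict while iterating over it, and per-key error handling ensures one failing sandbox does not leak the rest.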

Comment on lines +282 to +285
data: bytes = await interpreter.sandbox.files.read_bytes(sandbox_path)
dest_name = filename or Path(sandbox_path).name
local_path = _workspace_dir() / dest_name
local_path.write_bytes(data)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Path traversal in download_file via unsanitised filename parameter

dest_name = filename or Path(sandbox_path).name
local_path = _workspace_dir() / dest_name
local_path.write_bytes(data)

Path(sandbox_path).name is safe (returns basename only). However, the caller-supplied filename is not sanitised: a value like "../../../home/user/.ssh/authorized_keys" causes _workspace_dir() / filename to resolve outside cuga_workspace/, writing arbitrary files on the host filesystem.

🔒 Proposed fix
-            dest_name = filename or Path(sandbox_path).name
+            # Strip any directory components to prevent path traversal
+            raw_name = filename or Path(sandbox_path).name
+            dest_name = Path(raw_name).name  # basename only
+            if not dest_name or dest_name in (".", ".."):
+                return f"[download_file error] Invalid filename: {raw_name!r}"
             local_path = _workspace_dir() / dest_name
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py`
around lines 282 - 285, The download_file flow uses the caller-supplied filename
directly which allows path traversal; change dest_name to a sanitized basename
(e.g. dest_name = Path(filename).name or run a secure filename sanitizer) and if
that yields an empty or unsafe name fall back to Path(sandbox_path).name; then
compute target = (_workspace_dir() / dest_name).resolve() and validate that
target is inside _workspace_dir().resolve() (use Path.is_relative_to or compare
commonpath), rejecting the request or raising an error if the check fails before
calling local_path.write_bytes(data); reference symbols: filename, dest_name,
_workspace_dir(), sandbox_path, local_path, and
interpreter.sandbox.files.read_bytes.
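
A minimal sketch of the sanitization described above (WORKSPACE and safe_dest are illustrative names; the real code derives the root from _workspace_dir()):

```python
from pathlib import Path

# Illustrative workspace root for this sketch.
WORKSPACE = Path("/tmp/cuga_workspace")

def safe_dest(filename, sandbox_path):
    # Keep only the basename, discarding any "../" segments from the caller.
    raw = filename or Path(sandbox_path).name
    dest = Path(raw).name
    if not dest or dest in (".", ".."):
        raise ValueError(f"invalid filename: {raw!r}")
    target = (WORKSPACE / dest).resolve()
    # Belt-and-braces: confirm the resolved path stays inside the workspace.
    if not target.is_relative_to(WORKSPACE.resolve()):
        raise ValueError("destination escapes workspace")
    return target

print(safe_dest("../../home/user/.ssh/authorized_keys", "/sandbox/a.csv").name)
# → authorized_keys  (written inside the workspace, not to ~/.ssh)
```

Taking Path(raw).name reduces any traversal attempt to a bare filename, and the is_relative_to check (Python 3.9+) catches anything that still resolves outside the root.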

Comment on lines +353 to +362
entries = [
    FileEntry(
        name=Path(e.path).name,
        path=e.path,
        is_dir=(e.size == 0 and str(oct(e.mode)).startswith("0o7")),
        size_bytes=e.size,
    )
    for e in entries_raw
]
result = ListFilesResult(sandbox_path=sandbox_path, entries=entries)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "opensandbox_executor.py" | head -5

Repository: cuga-project/cuga-agent

Length of output: 159


🏁 Script executed:

wc -l ./src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py

Repository: cuga-project/cuga-agent

Length of output: 163


🏁 Script executed:

cat -n ./src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py | sed -n '340,375p'

Repository: cuga-project/cuga-agent

Length of output: 1817


🏁 Script executed:

grep -r "opensandbox" ./src --include="*.py" | grep -E "(import|from)" | head -20

Repository: cuga-project/cuga-agent

Length of output: 1442


🏁 Script executed:

find . -type f \( -name "pyproject.toml" -o -name "requirements.txt" -o -name "setup.py" \) -exec grep -l "opensandbox" {} \;

Repository: cuga-project/cuga-agent

Length of output: 84


🏁 Script executed:

cat ./pyproject.toml | grep -A 5 -B 5 "opensandbox"

Repository: cuga-project/cuga-agent

Length of output: 697


🏁 Script executed:

grep -r "SearchEntry\|e.mode" ./src --include="*.py" -B 3 -A 3 | head -50

Repository: cuga-project/cuga-agent

Length of output: 8145


🏁 Script executed:

cat -n ./src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py | sed -n '1,50p'

Repository: cuga-project/cuga-agent

Length of output: 2138


🏁 Script executed:

grep -r "is_dir\|stat\|mode" ./src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py -B 2 -A 2

Repository: cuga-project/cuga-agent

Length of output: 2365


🏁 Script executed:

find . -type f -name "*.py" -path "*/test*" -o -name "*test*.py" | xargs grep -l "SearchEntry\|opensandbox" 2>/dev/null | head -5

Repository: cuga-project/cuga-agent

Length of output: 49


🏁 Script executed:

rg "SearchEntry" --type py -B 5 -A 10

Repository: cuga-project/cuga-agent

Length of output: 2694


🌐 Web query:

OpenSandbox SearchEntry e.mode st_mode filesystem API

💡 Result:

OpenSandbox is a secure sandbox runtime platform by Alibaba for AI agents, featuring a filesystem API for file operations within sandboxes. SearchEntry is a Python SDK model used for file search operations. In the OpenSandbox Python SDK, SearchEntry defines parameters for the /files/search endpoint, which searches for files matching a glob pattern in a directory tree. Example usage:

    from opensandbox.models.filesystem import SearchEntry
    files = await sandbox.files.search(SearchEntry(path="/tmp", pattern="*.txt"))

The search returns an array of FileInfo objects, where each FileInfo includes a 'mode' field representing Unix file permissions in octal format (e.g., 644 for rw-r--r--), equivalent to the st_mode field from Unix stat without the S_IFMT type bits. Example FileInfo:

    {"path": "/workspace/main.py", "size": 1024, "modified_at": "2025-11-16T14:30:45Z", "owner": "admin", "group": "admin", "mode": 644}

[1][2] The 'mode' (e.mode in object access) directly corresponds to st_mode & ~S_IFMT, providing permission bits (e.g., 644, 755). This is used in file metadata from search, info, and write operations (WriteEntry also takes mode) [3][4]. The API endpoint is GET /files/search with path and pattern query parameters, returning FileInfo[] with mode [1][5].


🌐 Web query:

OpenSandbox FileInfo directory detection size mode 0o40000

💡 Result:

In OpenSandbox, the FileInfo object returned by the /files/info API endpoint (via GET request to execd on port 44772) contains file metadata including the 'mode' field, which is the Unix file permissions in octal format (e.g., 644, 755) [1][2]. This mode value is obtained from os.Stat or syscall.Stat_t in the Go implementation within components/execd/pkg/web/controller/utils.go's GetFileInfo function [3]. Directories are detected by checking if the full st_mode has the directory bit set, which is 0o40000 (octal 040000, Linux S_IFDIR). This is the standard Unix way: mode & 0o170000 == 0o40000 indicates a directory [4][5]. For directories, the reported 'mode' includes this bit (e.g., 040755 for drwxr-xr-x), unlike examples showing regular files without it [1]. The size for directories is typically 0x4000 (16384 bytes) in some Unix stat implementations due to internal representation, but OpenSandbox reports actual size from stat [3].


🏁 Script executed:

python3 << 'EOF'
# Verify the logic of the heuristic with example modes
import stat
import os

# Example directory mode (0o40755 = drwxr-xr-x)
dir_mode = 0o40755
print(f"Directory mode: {oct(dir_mode)}")
print(f"  Current heuristic check (size==0 and str(oct(mode)).startswith('0o7')): {str(oct(dir_mode)).startswith('0o7')}")
print(f"  Correct check stat.S_ISDIR(): {stat.S_ISDIR(dir_mode)}")
print(f"  Correct check (mode & 0o170000) == 0o040000: {(dir_mode & 0o170000) == 0o040000}")
print()

# Example regular file mode (0o100644 = -rw-r--r--)
file_mode = 0o100644
print(f"Regular file mode: {oct(file_mode)}")
print(f"  Current heuristic check (size==0 and str(oct(mode)).startswith('0o7')): {str(oct(file_mode)).startswith('0o7')}")
print(f"  Correct check stat.S_ISDIR(): {stat.S_ISDIR(file_mode)}")
print(f"  Correct check (mode & 0o170000) == 0o040000: {(file_mode & 0o170000) == 0o040000}")
print()

# Example executable file (0o100755 = -rwxr-xr-x)
exec_mode = 0o100755
print(f"Executable file mode: {oct(exec_mode)}")
print(f"  Current heuristic check (size==0 and str(oct(mode)).startswith('0o7')): {str(oct(exec_mode)).startswith('0o7')}")
print(f"  Correct check stat.S_ISDIR(): {stat.S_ISDIR(exec_mode)}")
print(f"  Correct check (mode & 0o170000) == 0o040000: {(exec_mode & 0o170000) == 0o040000}")
EOF

Repository: cuga-project/cuga-agent

Length of output: 663


🏁 Script executed:

grep -n "^import\|^from" ./src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py | head -20

Repository: cuga-project/cuga-agent

Length of output: 448


Fix directory detection logic—current heuristic misclassifies all directories as files

The current check is_dir=(e.size == 0 and str(oct(e.mode)).startswith("0o7")) fails for OpenSandbox's FileInfo.mode, which includes the full Unix st_mode with directory bit (0o40000). For a typical directory with mode 0o40755:

  • oct(0o40755) yields '0o40755', and '0o40755'.startswith("0o7") is False

This causes all directories to be incorrectly reported as files. Replace with:

Fix
+import stat
 ...
             entries = [
                 FileEntry(
                     name=Path(e.path).name,
                     path=e.path,
-                    is_dir=(e.size == 0 and str(oct(e.mode)).startswith("0o7")),
+                    is_dir=stat.S_ISDIR(e.mode),
                     size_bytes=e.size,
                 )
                 for e in entries_raw
             ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py`
around lines 353 - 362, The directory-detection heuristic in the list
comprehension that builds FileEntry from entries_raw is wrong: is_dir=(e.size ==
0 and str(oct(e.mode)).startswith("0o7")) misclassifies directories because
FileInfo.mode contains full st_mode (e.g., 0o40755). Replace that heuristic in
the FileEntry construction with a proper POSIX mode test (use the stat helpers
against e.mode, e.g., stat.S_ISDIR(e.mode) or compare e.mode & stat.S_IFMT to
stat.S_IFDIR) so FileEntry.is_dir correctly reflects directories when creating
ListFilesResult(sandbox_path=..., entries=...).

Comment thread src/cuga/sdk.py
Comment on lines 1834 to +1836
# Set session.id for OpenLit observability (if enabled)
set_session_attribute(thread_id)
self._set_loops_context(thread_id)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Seed loops context in stream() too, not only invoke().

The new tools rely on current_agent_name/current_thread_id/current_app_name, but those contextvars are only set in invoke(). A scheduling tool called while using agent.stream(...) will hit _identity() and fail even though the tools were auto-injected successfully.

Suggested fix
 # In both stream() branches, right after set_session_attribute(thread_id)
 set_session_attribute(thread_id)
+self._set_loops_context(thread_id)

Also applies to: 1969-1971

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/sdk.py` around lines 1834 - 1836, stream() currently doesn't seed
the loops/context vars so tools calling _identity() fail; mirror the invoke()
behavior by ensuring stream() computes the same thread_id and calls
set_session_attribute(thread_id) and self._set_loops_context(thread_id) before
yielding/dispatching tool calls (the same calls used in invoke()), so
current_agent_name/current_thread_id/current_app_name are set for any scheduling
tool; apply the same change to the other stream() occurrence noted near the
second _set_loops_context usage.
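
The failure mode described above can be sketched with a single contextvar (current_thread_id here is illustrative; the SDK seeds current_agent_name, current_thread_id, and current_app_name):

```python
import contextvars

# Illustrative contextvar standing in for the SDK's loops context.
current_thread_id = contextvars.ContextVar("current_thread_id", default=None)

def set_loops_context(thread_id):
    # What invoke() does (and stream() should also do) per request.
    current_thread_id.set(thread_id)

def identity():
    tid = current_thread_id.get()
    if tid is None:
        # This is the error a scheduling tool hits when stream()
        # skips the seeding step.
        raise RuntimeError("loops context not seeded")
    return tid

set_loops_context("thread-42")
print(identity())  # → thread-42
```

Because contextvars default to None until explicitly set, any entry point that dispatches tools must perform the same seeding, or tools reading the context fail at call time.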

Comment thread src/cuga/sdk.py
Comment on lines +2403 to +2405
def _set_loops_context(self, thread_id: str) -> None:
    if not self._loops_initialized or not self._enable_loops:
        return

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

enable_loops=None currently disables supervisor loop tooling.

_ensure_loops_initialized() resolves the settings-backed default, but graph and _set_loops_context() still gate on raw self._enable_loops. In auto mode (None), the supervisor may register with the loops service while the graph is compiled with enable_loops=False and loop context never binds.

Suggested fix
+def _loops_enabled(self) -> bool:
+    if self._enable_loops is not None:
+        return self._enable_loops
+    from cuga.config import settings as _cuga_settings
+    return bool(getattr(getattr(_cuga_settings, "loops", None), "enabled", False))
+
 def _set_loops_context(self, thread_id: str) -> None:
-    if not self._loops_initialized or not self._enable_loops:
+    if not self._loops_initialized or not self._loops_enabled():
         return

 # ...
 supervisor_subgraph = create_cuga_supervisor_graph(
     supervisor_model=self._model,
     agents=self._agents,
-    enable_loops=bool(self._enable_loops),
+    enable_loops=self._loops_enabled(),
 )

Also applies to: 2454-2458

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/sdk.py` around lines 2403 - 2405, The guard in _set_loops_context
currently uses the raw self._enable_loops which can be None; instead use the
resolved flag produced by _ensure_loops_initialized (call or property set by it)
so auto-mode is handled correctly — update _set_loops_context (and the similar
gating around graph compilation at the other site) to call
_ensure_loops_initialized() first (if not already initialized) and then test the
resolved boolean (the attribute set by _ensure_loops_initialized or its return
value) rather than self._enable_loops so loop context binding works in auto
(None) mode.
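
The tri-state resolution can be sketched as a small helper (resolve_enabled is an illustrative name, not the SDK's API):

```python
def resolve_enabled(explicit, settings_default):
    # None means "auto": defer to the settings-backed default;
    # an explicit True/False always wins.
    return bool(settings_default) if explicit is None else bool(explicit)

print(resolve_enabled(None, True),   # auto + settings on  → True
      resolve_enabled(False, True),  # explicit off        → False
      resolve_enabled(True, False))  # explicit on         → True
```

Gating every consumer (graph compilation, context seeding) on the resolved value rather than the raw flag keeps auto mode consistent across all sites.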

Comment thread src/cuga/settings.toml
 decomposition_strategy = "flexible" # "exact" = one subtask per app, "flexible" = allows multiple subtasks per app
-e2b_sandbox = false # use e2b for sandbox:
+e2b_sandbox = false # use E2B cloud sandbox for code execution
+opensandbox_sandbox = true # use OpenSandbox (Docker) for code execution; requires Docker + opensandbox SDK

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

opensandbox_sandbox = true as the default will break existing installs that lack Docker/OpenSandbox.

Any user who upgrades without Docker installed — or without the opensandbox extra — will get a hard runtime failure the first time code execution is invoked. The previous default was false.

Consider keeping the default false and requiring explicit opt-in (as the e2b_sandbox flag does), or at least guard the executor import so a clear, actionable error is surfaced instead of a raw ModuleNotFoundError/ConnectionError.

🛠️ Suggested change
-opensandbox_sandbox = true    # use OpenSandbox (Docker) for code execution; requires Docker + opensandbox SDK
+opensandbox_sandbox = false   # use OpenSandbox (Docker) for code execution; requires Docker + opensandbox SDK
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/settings.toml` at line 39, the config enables OpenSandbox by
default (opensandbox_sandbox = true), which will break installs without Docker
or the opensandbox package. Change the default back to false (matching the
previous behavior and the e2b_sandbox flag), and/or add a runtime guard around
the OpenSandbox executor import/initialization so that, if the flag is true but
the opensandbox package or the Docker connection is missing, a clear,
actionable error is raised (or a fallback is used) instead of letting a raw
ModuleNotFoundError/ConnectionError propagate.
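A guard of the kind the prompt asks for might look like the sketch below. The function name, the settings flag, and the install-extra spelling (`pip install cuga[opensandbox]`) are assumptions for illustration, not the project's actual API.

```python
def get_opensandbox_executor(opensandbox_enabled: bool):
    """Hypothetical guard: surface an actionable error instead of a raw import failure."""
    if not opensandbox_enabled:
        return None  # caller falls through to the default executor

    try:
        import opensandbox  # noqa: F401  -- optional extra, not always installed
    except ModuleNotFoundError as exc:
        raise RuntimeError(
            "opensandbox_sandbox = true, but the 'opensandbox' package is not "
            "installed. Install the extra (e.g. pip install cuga[opensandbox]) "
            "and make sure Docker is running, or set opensandbox_sandbox = false."
        ) from exc

    return opensandbox
```

With this shape, a misconfigured install fails at startup with a message naming the flag to flip, rather than deep inside the first code-execution call.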

Comment thread src/cuga/settings.toml
Comment on lines +217 to +223
[loops]
enabled = false # Master toggle for CUGA loops (agent self-scheduling).
# When false, the schedule_recurring / schedule_wakeup tools
# are not injected and the scheduler is not started, even if
# an agent passes enable_loops=True (the explicit per-agent
# flag overrides this default).
default_expires_days = 7 # Default expiry for scheduled loops if the agent does not specify.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Self-contradictory comment in the [loops] master toggle.

The comment says: "When false, the tools are not injected and the scheduler is not started, even if an agent passes enable_loops=True (the explicit per-agent flag overrides this default)." Those two clauses contradict each other — if enabled = false always suppresses injection, then no per-agent flag overrides it; if a per-agent flag can override it, then the "even if" clause is wrong. Please clarify which semantic is actually implemented.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/settings.toml` around lines 217 - 223, investigate the
implemented behavior of the [loops] master toggle, then update the comment on
the enabled field to match it exactly: confirm whether enabled = false globally
prevents injecting the tools and starting the scheduler, or whether a
per-agent enable_loops=True can override it. Rewrite the sentence to state
that behavior unambiguously, referencing enabled, enable_loops, and the
schedule_recurring / schedule_wakeup tools so readers know which features are
affected; remove the contradictory "even if" clause and clearly state the
precedence rule (master toggle vs. per-agent flag).
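If the implemented semantic is that an explicit per-agent flag takes precedence over the master default (one of the two readings the comment currently mixes), the config comment could be reworded unambiguously along these lines:

```toml
[loops]
enabled = false          # Default for agents that do not set enable_loops themselves.
                         # Only auto mode (enable_loops=None) inherits this value;
                         # an agent constructed with an explicit enable_loops=True
                         # still gets the schedule_recurring / schedule_wakeup tools
                         # and starts the scheduler.
default_expires_days = 7 # Default expiry for scheduled loops if the agent does not specify.
```

If the opposite semantic is implemented (master toggle always wins), the comment should instead drop the parenthetical about the per-agent flag overriding it.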

Comment on lines +687 to +708
useEffect(() => {
  workspaceTreeDirPathsPrevRef.current = new Set();
  setWorkspaceExpandedDirs(new Set());
  setWorkspaceTree([]);
  setWorkspaceTreeLoading(true);
}, [effectiveChatThreadId]);

useEffect(() => {
  const valid = collectDirectoryPaths(workspaceTree);
  const prevValid = workspaceTreeDirPathsPrevRef.current;
  setWorkspaceExpandedDirs((expanded) => {
    const next = new Set<string>();
    for (const p of expanded) {
      if (valid.has(p)) next.add(p);
    }
    for (const p of valid) {
      if (!prevValid.has(p)) next.add(p);
    }
    return next;
  });
  workspaceTreeDirPathsPrevRef.current = valid;
}, [workspaceTree]);

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Resetting workspaceTreeDirPathsPrevRef to empty causes every directory to auto-expand on the next tree update.

The pruning effect at Lines 694-708 treats anything in valid but not in prevValid as a newly added directory and adds it to the expanded set. After a thread switch (or first mount) the prev ref is the empty set, so the very next tree fetch marks every directory as "new" and expands all of them, including deeply nested ones. That defeats the user's collapsed-by-default expectation and can be heavy on large workspaces.

Consider only auto-expanding directories that appear after the first successful tree load for a given thread, e.g. by tracking a hasInitialTree flag or seeding workspaceTreeDirPathsPrevRef.current = valid on the first non-empty payload without expanding any of them.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/frontend_workspaces/frontend/src/ChatLanding.tsx` around lines 687 -
708, resetting workspaceTreeDirPathsPrevRef to an empty set in the effect keyed
on effectiveChatThreadId causes every directory to be treated as "new" and
auto-expanded on the next workspaceTree update. Instead of seeding the prev ref
with an empty set on thread switch, detect the first successful tree load and
initialize workspaceTreeDirPathsPrevRef.current = valid without adding those
paths to setWorkspaceExpandedDirs (or use a hasInitialTree flag to skip
auto-expansion on the first non-empty payload). Update the effects that
reference workspaceTreeDirPathsPrevRef, workspaceTree,
setWorkspaceExpandedDirs, and effectiveChatThreadId accordingly, so that only
directories genuinely added after the initial load are auto-expanded.
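The gating the comment proposes can be isolated as a pure helper, which keeps the set logic testable outside the component. This TypeScript sketch uses hypothetical names (nextExpandedDirs, hasInitialTree); it is not the actual ChatLanding.tsx code.

```typescript
// Compute the next expanded-directory set from the current tree.
// Auto-expansion of "new" directories is suppressed until the first
// non-empty tree for a thread has seeded prevValid (hasInitialTree = true).
function nextExpandedDirs(
  expanded: Set<string>,
  valid: Set<string>,
  prevValid: Set<string>,
  hasInitialTree: boolean,
): Set<string> {
  const next = new Set<string>();
  for (const p of expanded) {
    if (valid.has(p)) next.add(p); // prune directories that no longer exist
  }
  if (hasInitialTree) {
    for (const p of valid) {
      if (!prevValid.has(p)) next.add(p); // expand only genuinely new directories
    }
  }
  return next;
}
```

In the effect, the caller would set the hasInitialTree flag (and seed the prev ref with `valid`) on the first non-empty payload after a thread switch, so the initial load renders collapsed by default.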

@sami-marreed sami-marreed marked this pull request as draft May 10, 2026 20:09