[hackathon]: cuga loops (work in progress PR) #198
Conversation
📝 Walkthrough

Adds skills and loops systems, OpenSandbox executor/tools with sandbox-backed workspace APIs, extended CugaLite binding/prompts, server endpoints, CLI demo_skills, frontend thread-aware workspace and skills UI, config/docs updates, and tests.

Changes: Unified Skills, Loops, Sandbox, and Threaded Workspace
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Frontend
    participant Server
    participant Skills as SkillsRegistry
    participant Sandbox
    participant Loops as LoopsService
    Frontend->>Server: GET /api/agent/context, /api/skills
    Server->>Skills: discover_skills()
    Skills-->>Server: skills list
    Frontend->>Server: GET /api/workspace/* (thread_id)
    Server->>Sandbox: list/read/write via tools
    Sandbox-->>Server: tree/bytes
    Frontend->>Server: POST /cuga/loops/api/create
    Server->>Loops: register_loop()
    Loops-->>Server: scheduled fire invokes agent
```
Review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 20
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
src/frontend_workspaces/agentic_chat/src/workspaceService.ts (2)
122-130: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win — Returning cached data on error can serve another thread's tree.

In `fetchWorkspaceData`'s catch (Line 126), `this.cachedData` may have been written by a previous request for a different thread that interleaved with the thread reset at Lines 67-73. Once the keying fix from the 67-106 comment is in, also gate this fallback on `cachedForThreadId === (threadId ?? '')` before returning.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/frontend_workspaces/agentic_chat/src/workspaceService.ts` around lines 122 - 130, In fetchWorkspaceData's catch block, avoid returning cachedData unguarded because it may belong to another thread; instead, only fall back to this.cachedData when this.cachedForThreadId === (threadId ?? '') so the cached entry matches the requested threadId (refer to fetchWorkspaceData, this.cachedData, this.cachedForThreadId, and threadId). If the guard fails, return the empty tree { tree: [], timestamp: Date.now() } as currently implemented.
67-106: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win — Race: a `threadId` change while a request is in flight returns and caches the wrong thread's data.

The order of operations bites here:

- Caller A invokes `getWorkspaceTree(false, "A")` → resets cache, `cachedForThreadId = "A"`, `pendingRequest = fetchWorkspaceData("A")`, awaits.
- Caller B invokes `getWorkspaceTree(false, "B")` before A resolves → the block at Lines 67-73 resets `cachedData = null`, sets `cachedForThreadId = "B"`, `lastFetchTime = 0`. Then Lines 84-86 short-circuit: `return this.pendingRequest` — but `pendingRequest` is still A's fetch.
- When A's fetch resolves, the `try` block in A's call (Lines 97-105) writes `this.cachedData = data` and `this.lastFetchTime = Date.now()`. `cachedForThreadId` is still `"B"`, so A's data is now cached as if it belonged to thread B. Caller B also resolves with A's data.

Two consequences: B receives the wrong thread's tree, and subsequent reads for thread B within `MIN_INTERVAL_MS` will keep serving A's tree from cache.

🛠️ Suggested fix — key the pending request by threadId, drop short-circuit on mismatch
```diff
 class WorkspaceService {
   private static instance: WorkspaceService;
   private lastFetchTime: number = 0;
   private cachedData: WorkspaceData | null = null;
   private cachedForThreadId: string | undefined = undefined;
-  private pendingRequest: Promise<WorkspaceData> | null = null;
+  private pendingRequest: Promise<WorkspaceData> | null = null;
+  private pendingForThreadId: string | undefined = undefined;
@@
   async getWorkspaceTree(forceRefresh: boolean = false, threadId?: string): Promise<WorkspaceData> {
     const tidKey = threadId ?? '';
     if (this.cachedForThreadId !== tidKey) {
       this.cachedData = null;
       this.cachedForThreadId = tidKey;
       this.lastFetchTime = 0;
     }
@@
-    // If there's already a pending request, wait for it instead of making a new one
-    if (this.pendingRequest) {
-      return this.pendingRequest;
-    }
+    // Only reuse a pending request if it's for the same thread.
+    if (this.pendingRequest && this.pendingForThreadId === tidKey) {
+      return this.pendingRequest;
+    }
@@
-    this.pendingRequest = this.fetchWorkspaceData(threadId);
+    this.pendingRequest = this.fetchWorkspaceData(threadId);
+    this.pendingForThreadId = tidKey;
     try {
       const data = await this.pendingRequest;
-      this.cachedData = data;
-      this.lastFetchTime = Date.now();
-      this.notifyListeners(data);
+      // Only commit results if the caller's thread is still current.
+      if (this.cachedForThreadId === tidKey) {
+        this.cachedData = data;
+        this.lastFetchTime = Date.now();
+        this.notifyListeners(data);
+      }
       return data;
     } finally {
-      this.pendingRequest = null;
+      if (this.pendingForThreadId === tidKey) {
+        this.pendingRequest = null;
+        this.pendingForThreadId = undefined;
+      }
     }
   }
```

Also clear `pendingForThreadId` in `clearCache()`.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/frontend_workspaces/agentic_chat/src/workspaceService.ts` around lines 67 - 106, getWorkspaceTree races when threadId changes because a single shared pendingRequest and cachedForThreadId allow an in-flight fetch for thread A to be returned and cached for thread B; change the logic to key pending requests by thread id (e.g., pendingRequests: Map<string, Promise<WorkspaceData>>) and only return or await a pending request when its key matches tidKey, ensure when resolving a fetch from fetchWorkspaceData(threadId) you verify the resolved data is being stored only if tidKey still matches cachedForThreadId (or store cache per-thread), and update clearCache to clear the pending request entry for the affected thread (clear any pendingForThreadId/pendingRequests entries) so we never cache or return a different thread's data.

src/frontend_workspaces/agentic_chat/src/FileAutocomplete.tsx (1)
380-392: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win — Mock-fallback is one-shot; thread switches that fail won't retry real data.

`usedMockRef.current` latches `true` after the first error and prevents `useMockData()` from running again, but the real fetch also won't retry beyond the existing 15s interval. If the very first `getWorkspaceTree` call for a new `threadId` errors (e.g. backend not yet ready for a freshly created sandbox), the user sees stale mock entries until the interval ticks. Consider clearing `usedMockRef.current` whenever `threadId` changes (or whenever the next interval succeeds) so the UI escapes the mock state.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/frontend_workspaces/agentic_chat/src/FileAutocomplete.tsx` around lines 380 - 392, The mock-fallback latches via usedMockRef.current in loadWorkspaceFiles, causing mock data to persist across threadId changes; update the logic so that when a real fetch succeeds you clear the latch (set usedMockRef.current = false) and also add a threadId change handler (e.g., in the effect that triggers loadWorkspaceFiles) to reset usedMockRef.current when threadId changes so a fresh getWorkspaceTree attempt will not be blocked from using real data; reference functions/variables: loadWorkspaceFiles, usedMockRef, useMockData, getWorkspaceTree, threadId, and setAllFiles.

src/frontend_workspaces/agentic_chat/src/CustomChat.tsx (1)
228-244: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win — Ignore stale workspace refreshes after a thread switch.

A slower `getWorkspaceTree(..., tid)` from the previous thread can still resolve after `threadId` changes and overwrite `allFiles` with the wrong workspace. That makes autocomplete/file hover point at another thread's files until the next refresh.

Suggested fix
```diff
   useEffect(() => {
+    let cancelled = false;
     const loadFiles = async () => {
+      const tid = (threadIdRef.current || threadId || "").trim() || undefined;
       try {
         const { workspaceService } = await import('./workspaceService');
-        const tid = (threadIdRef.current || threadId || "").trim() || undefined;
         const data = await workspaceService.getWorkspaceTree(false, tid);
         const files = extractFiles(data.tree || []);
+        const currentTid = (threadIdRef.current || "").trim() || undefined;
+        if (cancelled || tid !== currentTid) return;
         setAllFiles(files);
       } catch (error) {
         console.error('Error loading files:', error);
       }
     };
     loadFiles();
     const interval = setInterval(loadFiles, 15000);
-    return () => clearInterval(interval);
+    return () => {
+      cancelled = true;
+      clearInterval(interval);
+    };
   }, [threadId]);
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/frontend_workspaces/agentic_chat/src/CustomChat.tsx` around lines 228 - 244, The async loader (loadFiles) can finish after threadId changes and overwrite allFiles with stale data; capture the request's thread id up-front (e.g., const requestedTid = (threadIdRef.current || threadId || "").trim() || undefined) and after awaiting workspaceService.getWorkspaceTree(...) verify the current thread id still matches requestedTid (or check threadIdRef.current/threadId) before calling setAllFiles; alternatively add a per-load abort token/counter checked before setAllFiles to ignore results from previous loads (refer to loadFiles, threadIdRef, threadId, getWorkspaceTree, and setAllFiles).

README.md (1)
1082-1083: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win — Broken markdown link: text and URL appear swapped.

In `[./docs/examples/cuga_with_runtime_tools/README.md](Adding Tools)`, the link target is `Adding Tools` (which becomes `Adding%20Tools`) and the visible text is the path. The next bullet has the same path-as-text shape, so at least it is consistent. Restore `[text](url)` ordering for the runtime-tools example.

📝 Proposed fix
```diff
-- **Comprehensive example with different tools + MCP**: [./docs/examples/cuga_with_runtime_tools/README.md](Adding Tools)
-- **CUGA as MCP**: [./docs/examples/cuga_as_mcp/README.md](docs/examples/cuga_as_mcp)
+- **Comprehensive example with different tools + MCP**: [Adding Tools](./docs/examples/cuga_with_runtime_tools/README.md)
+- **CUGA as MCP**: [docs/examples/cuga_as_mcp](./docs/examples/cuga_as_mcp/README.md)
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README.md` around lines 1082 - 1083, The markdown link in README.md has the link text and URL swapped for the runtime-tools example; update the bullet to use the correct [text](url) ordering by making the visible text "Adding Tools" and the href "./docs/examples/cuga_with_runtime_tools/README.md" (i.e., change `[./docs/examples/cuga_with_runtime_tools/README.md](Adding Tools)` to `Adding Tools` as the link text pointing to that path) and ensure the URL remains the file path (no percent-encoding).
🧹 Nitpick comments (15)
src/cuga/backend/llm/models.py (1)
566-567: 💤 Low value — `'gpt-5.5'` prefix is redundant — already covered by `'gpt-5'`.

`str.startswith` checks each prefix in order and short-circuits on the first match. Any model name beginning with `gpt-5.5` already starts with `gpt-5`, so the `'gpt-5.5'` entry can never be the distinguishing match and is dead code. The `'azure/gpt-5.5'` entry is correct and necessary (for litellm routes).

🧹 Proposed cleanup
```diff
-    reasoning_prefixes = ('o1', 'o3', 'gpt-5', 'gpt-5.5', 'azure/gpt-5.5')
+    reasoning_prefixes = ('o1', 'o3', 'gpt-5', 'azure/gpt-5.5')
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cuga/backend/llm/models.py` around lines 566 - 567, The tuple reasoning_prefixes in the function that returns model_name.startswith(reasoning_prefixes) contains a redundant entry 'gpt-5.5' because any name starting with 'gpt-5.5' already starts with 'gpt-5'; remove 'gpt-5.5' from the reasoning_prefixes tuple and keep 'gpt-5' and 'azure/gpt-5.5' as needed so the startswith check remains correct and non-redundant.

src/cuga/sdk_core/tests/test_context_summarization_sdk.py (1)
373-378: ⚡ Quick win — `// 3` cap may be too permissive to reliably detect missed summarization.

With `KEEP_LAST_N_MESSAGES = 5` and a working summarizer, the checkpoint should contain roughly ~6 messages (1 summary + 5 recent). The cap `max(35, len(history) // 3)` permits up to 46 messages for the 138-message large history — roughly 7-8× the expected post-summarization count. This means a severely degraded summarizer (e.g., one that compresses only partially) could still pass the assertion, weakening the regression signal.

A tighter bound — say `max_expected_after = max(keep_last_n + 10, len(history) // 10)` — would catch partial-compression failures while still giving headroom for tool-call messages added during the invoke. If the intent is only to rule out "no compression at all", the current formula is fine, but a comment noting that would clarify the design choice.

💡 Optional tighter cap example
```diff
-    max_expected_after = max(35, len(large_history) // 3)
+    # keep_last_n=5, +1 summary, +some tool messages per invoke turn
+    keep_last_n = int(os.environ.get("DYNACONF_CONTEXT_SUMMARIZATION__KEEP_LAST_N_MESSAGES", "5"))
+    max_expected_after = max(keep_last_n + 20, len(large_history) // 10)
     assert message_count_before < max_expected_after, (
```

And similarly at the second changed site (line 820):

```diff
-    max_expected_after = max(35, len(all_messages) // 3)
+    keep_last_n = int(os.environ.get("DYNACONF_CONTEXT_SUMMARIZATION__KEEP_LAST_N_MESSAGES", "5"))
+    max_expected_after = max(keep_last_n + 20, len(all_messages) // 10)
```

Also applies to: 820-825
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cuga/sdk_core/tests/test_context_summarization_sdk.py` around lines 373 - 378, The assertion uses a permissive cap max(35, len(large_history) // 3) which allows far too many messages post-summarization; update the cap calculation in the test (both the occurrence around message_count_before and the similar check near lines 820-825) to a tighter bound such as max(KEEP_LAST_N_MESSAGES + 10, len(large_history) // 10) (or max(keep_last_n + 10, len(history)//10) if different variable names) so the test will fail for partial summarizers, and add a short inline comment referencing KEEP_LAST_N_MESSAGES to explain the headroom for tool-call messages; alternatively, if the intent is only to detect "no compression", add a comment clarifying that and leave the original formula.

pyproject.toml (1)
68-68: ⚡ Quick win — No version pins on `opensandbox`/`opensandbox-code-interpreter`.

Unpinned extras make the build non-reproducible and may silently pull in a breaking upstream release. At minimum, add a floor version once the compatible version is known.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pyproject.toml` at line 68, The dependency extras "opensandbox" and "opensandbox-code-interpreter" in pyproject.toml are unpinned and make builds non-reproducible; update the opensandbox extras list to include explicit version constraints (at minimum a floor, e.g., "opensandbox>=X.Y.Z" and "opensandbox-code-interpreter>=A.B.C") or pinned versions once compatibility is determined, replacing the current unversioned entries so the project uses a stable, reproducible release.

src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/variable_utils.py (1)
24-30: 💤 Low value — `strip_tools_output_var` uses a plain substring match for `find_tools`.

`"find_tools" in code` will match occurrences inside comments, string literals, or variable names (e.g. `# call find_tools later`), causing `tools_output` to be incorrectly stripped in those cases. For this narrowly-scoped internal helper the risk is low, but a call-site pattern such as `find_tools(` would reduce false positives with minimal effort.

🛠️ Suggested tightening
```diff
-    if "find_tools" not in code:
+    if "find_tools(" not in code:
         return new_vars
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/variable_utils.py` around lines 24 - 30, The current strip_tools_output_var uses a substring check ("find_tools" in code) that can match comments or strings; change the detection in strip_tools_output_var to look specifically for a call-site pattern (e.g. use a regex like r'\bfind_tools\s*\(') so it only strips tools_output when find_tools is actually invoked; update the function to import and use re.search(r'\bfind_tools\s*\(', code) instead of the plain substring check to reduce false positives.

src/cuga/backend/skills/tools.py (1)
15-29: 💤 Low value — Consider an async `load_skill_impl` if the registry ever does I/O.

Today `registry.load_skill(name)` is in-memory, so a sync function is fine. If the registry later reads/refreshes from disk on miss, switching `load_skill_impl` to `async def` (and using `StructuredTool.from_function(coroutine=...)`) avoids blocking the agent's event loop. Not blocking right now — flagging for when the registry grows.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cuga/backend/skills/tools.py` around lines 15 - 29, The current load_skill_impl inside create_skill_tools calls registry.load_skill synchronously which will block if registry.load_skill later performs I/O; change load_skill_impl to an async function (async def load_skill_impl(name: str) -> str) and switch the StructuredTool creation to pass the coroutine variant (use StructuredTool.from_function for coroutine/coroutine=... or the library's coroutine argument) so StructuredTool wraps an async callable; update any typing to reflect async return and keep args_schema=LoadSkillInput intact.

src/frontend_workspaces/frontend/src/ManagePage.tsx (1)
1561-1572: 💤 Low value — Use an `if/else` instead of a ternary for the expand toggle.

`isExpanded ? next.delete(skill.name) : next.add(skill.name);` evaluates a ternary purely for side effects; some linters (`no-unused-expressions`) will flag it, and it's harder to read than the equivalent control-flow form.

♻️ Suggested change
```diff
-                onClick={() => setExpandedSkills((prev) => {
-                  const next = new Set(prev);
-                  isExpanded ? next.delete(skill.name) : next.add(skill.name);
-                  return next;
-                })}
+                onClick={() => setExpandedSkills((prev) => {
+                  const next = new Set(prev);
+                  if (isExpanded) {
+                    next.delete(skill.name);
+                  } else {
+                    next.add(skill.name);
+                  }
+                  return next;
+                })}
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/frontend_workspaces/frontend/src/ManagePage.tsx` around lines 1561 - 1572, The onClick handler for the expand toggle uses a ternary purely for side effects (isExpanded ? next.delete(skill.name) : next.add(skill.name)) which can trigger linter errors; in the setExpandedSkills updater inside the button onClick, replace the ternary with an explicit if/else: create next as new Set(prev), then if (isExpanded) { next.delete(skill.name); } else { next.add(skill.name); } and finally return next. This change should be made in the inline handler passed to setExpandedSkills (the function referencing isExpanded and skill.name).

src/frontend_workspaces/agentic_chat/src/FileAutocomplete.tsx (1)
175-309: 🏗️ Heavy lift — Split the workspace-fetch effect from the Carbon listener setup; the latter shouldn't re-run on `threadId`.

Adding `threadId` to the dep array (Line 309) is correct for the polling-fetch part, but it now also tears down and rebuilds all the Carbon-chat / textarea listener wiring every time `threadId` changes. A few concrete issues that surface as a result:

- The retry path `setTimeout(setupCarbonListeners, 500)` (Line 300) and the deferred attachment loop `setTimeout(() => { textareas.forEach(...) }, 1000)` (Lines 259-272) don't capture timer IDs and aren't cleared on cleanup.
- Listeners attached inside the 1s setTimeout to textareas (with `data-autocomplete-listener`) are never removed; the attribute prevents duplicate attachment to the same element, but each effect run creates a fresh closure that can attach to newly discovered nodes, and unmount/`threadId` change doesn't detach any of them.
- The `cleanup` returned at Lines 289-298 only runs from the synchronous branch, so a `threadId` change during the retry-loop period skips cleanup entirely.

Recommend extracting two effects: one keyed on `[disabled, threadId]` that just runs `loadWorkspaceFiles` + `setInterval`, and a second keyed on `[disabled]` for the listener wiring. Track every `setTimeout`/`setInterval` ID and `addEventListener` call so the cleanup symmetrically removes them.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/frontend_workspaces/agentic_chat/src/FileAutocomplete.tsx` around lines 175 - 309, Split the big useEffect into two effects: one (deps [disabled, threadId]) that only runs loadWorkspaceFiles and manages fileInterval (capture the interval ID and clear it on cleanup), and a second effect (deps [disabled]) that encapsulates setupCarbonListeners and all DOM wiring; inside that listener effect capture and clear any setTimeout IDs, store references to added listeners (e.g., handleCarbonInput, handleDocumentInput) and to elements you attach per-element listeners to (those marked with data-autocomplete-listener) so you can removeEventListener for each on cleanup, and ensure the retry setTimeout created by setupCarbonListeners is cleared if the effect unmounts; update references to loadWorkspaceFiles, setupCarbonListeners, handleCarbonInput, handleDocumentInput and fileInterval in the respective effects.

src/frontend_workspaces/frontend/src/ChatLanding.tsx (1)
681-685: 💤 Low value — 2.5s polling interval is below the workspace service's 3s throttle.

`workspaceService.getWorkspaceTree` enforces a 3000ms minimum between fetches; a 2500ms interval here means every other tick is served from cache, but it also means cache-eviction edges (e.g. just after a thread reset) can stall the next request by up to ~500ms while the throttle drains. If the intent is "refresh as fast as the throttle allows", align the interval (e.g. 3000ms) or pass `forceRefresh: true` and rely on the service's internal wait.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/frontend_workspaces/frontend/src/ChatLanding.tsx` around lines 681 - 685, The polling interval in ChatLanding.tsx's useEffect currently calls fetchWorkspaceTree every 2500ms which is below workspaceService.getWorkspaceTree's 3000ms throttle; update the effect to either raise the interval to 3000ms (setInterval(fetchWorkspaceTree, 3000)) or modify fetchWorkspaceTree to call workspaceService.getWorkspaceTree({ forceRefresh: true }) so requests are not stalled by the service throttle; ensure you only change the interval or add the forceRefresh flag and keep the existing clearInterval cleanup.

src/cuga/backend/loops/runner.py (2)
64-85: ⚡ Quick win — Post-run bookkeeping is one fragile chain — a registry hiccup leaves loops in inconsistent states.

`record_fire`, `insert_run`, then `update_status`/`update_next_fire` run as one sequential happy path. If any of them raise (transient DB blip, serialization error on the run summary, etc.), one-shot `DELAY` loops won't be marked `CANCELLED` and recurring loops won't refresh `next_fire_at`, so the UI keeps showing the previous values until the next fire. Consider wrapping each step in its own try/except (logging on failure) so a write failure on the run record doesn't strand the loop's scheduling state, and vice versa.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cuga/backend/loops/runner.py` around lines 64 - 85, The post-run registry calls (svc.registry.record_fire, svc.registry.insert_run, svc.registry.update_status, svc.registry.update_next_fire) are executed as one sequential happy path and a failure in any call can leave loop state inconsistent; change the code so each registry operation is wrapped in its own try/except that logs the exception (including loop_id and error) and continues, and ensure the scheduling updates (calling svc._next_fire_ts and then svc.registry.update_next_fire or update_status for TriggerKind.DELAY) run regardless of failures in record_fire/insert_run so one-shot DELAY loops are always marked CANCELLED and recurring loops always have next_fire_at refreshed even if run-record writes fail.
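The per-step isolation suggested above can be sketched with the stdlib alone. The registry method names mirror the review; the `_safe` helper and `finish_run` signature are illustrative stand-ins, not the project's actual code:

```python
import logging

logger = logging.getLogger("loops.runner")

def _safe(step_name, fn, *args):
    """Run one post-run bookkeeping step; log and continue on failure."""
    try:
        fn(*args)
        return True
    except Exception:
        logger.exception("post-run step %s failed", step_name)
        return False

def finish_run(registry, loop_id, run_summary, next_fire_at):
    # Each write is isolated: a failed run-record insert no longer
    # prevents the scheduling state from being refreshed, and vice versa.
    _safe("record_fire", registry.record_fire, loop_id)
    _safe("insert_run", registry.insert_run, loop_id, run_summary)
    if next_fire_at is None:
        # One-shot DELAY loop: always mark it finished.
        _safe("update_status", registry.update_status, loop_id, "CANCELLED")
    else:
        # Recurring loop: always refresh the next fire time.
        _safe("update_next_fire", registry.update_next_fire, loop_id, next_fire_at)
```

With this shape, a raising `insert_run` is logged but `update_next_fire` still runs, so the loop's schedule never goes stale because of a run-record write failure.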
80-85: 💤 Low value — Reaching into `svc._next_fire_ts` from the runner couples to a private API.

`_next_fire_ts` is a name-prefix-private method; calling it from `runner.py` stays within the same package, but readers of either file have to remember the contract. Promote it to a public helper (e.g. `LoopsService.next_fire_ts(loop_id)`) so the dependency is documented and won't quietly break if the underscore version is renamed during a refactor.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cuga/backend/loops/runner.py` around lines 80 - 85, The runner is calling the private helper svc._next_fire_ts, coupling to a name-prefixed private API; add a public method on the LoopsService class (e.g., LoopsService.next_fire_ts(loop_id)) that wraps or implements the existing logic and then update runner.py to call svc.next_fire_ts(loop_id) instead of svc._next_fire_ts(loop_id); ensure the new method preserves the existing behavior used by runner (used in the branch handling non-TriggerKind.DELAY where nxt is set and passed to registry.update_next_fire) and update any imports/tests that referenced the private name.

src/frontend_workspaces/frontend/src/api.ts (1)
266-285: 💤 Low value — LGTM — new skills and thread-aware workspace endpoints look correct.

One minor style note: `getWorkspaceTree` (line 271) hand-rolls `encodeURIComponent` while `getWorkspaceFile`/`getWorkspaceDownload` use `URLSearchParams`. Consider aligning `getWorkspaceTree` to use `URLSearchParams` for consistency.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/frontend_workspaces/frontend/src/api.ts` around lines 266 - 285, getWorkspaceTree currently builds its querystring with encodeURIComponent manually; update getWorkspaceTree to use URLSearchParams (like getWorkspaceFile and getWorkspaceDownload) to construct the thread_id parameter for consistency and correctness—modify the getWorkspaceTree function to create a new URLSearchParams({ thread_id: threadId }) when threadId is provided and append its toString() to `/api/workspace/tree` accordingly.

src/cuga/backend/skills/loader.py (1)
104-110: 💤 Low value — `set` in requirements value gives non-deterministic install ordering.

Line 104 includes `set` as a supported sequence type for `requirements`. Sets are unordered, so the resulting `tuple[str, ...]` can vary across runs. If install order matters (e.g., one package depends on another), this can cause intermittent failures.

🛡️ Proposed fix
```diff
-    elif isinstance(value, (list, tuple, set)):
+    elif isinstance(value, (list, tuple)):
         candidates = value
+    elif isinstance(value, set):
+        candidates = sorted(value)  # deterministic ordering
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cuga/backend/skills/loader.py` around lines 104 - 110, The current branch treats sets the same as lists/tuples causing non-deterministic ordering for requirements (variable value -> candidates -> return tuple(...)); change the conditional to handle set separately by converting a set into a deterministically ordered sequence (e.g., sorted(value)) while keeping lists/tuples as-is, so replace the isinstance(value, (list, tuple, set)) handling with separate branches: one for list/tuple that assigns candidates = value and one for set that assigns candidates = sorted(value), then proceed to the existing tuple(str(item).strip() ...) return to ensure stable, deterministic install order.

src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py (1)
310-317: ⚡ Quick win — Collapse duplicate `if/else` branches — both arms are identical.

The `_skills_enabled()` check guards two byte-identical bodies; the conditional adds noise without changing behavior. Since `mode == 'opensandbox'` is already remapped to `'local'` on lines 303-304 above, this block can be flattened.

♻️ Proposed simplification
```diff
-        if _skills_enabled():
-            if mode in ('e2b', 'docker'):
-                return await cls._execute_remotely_for_code_agent(wrapped_code, state, mode)
-        else:
-            if mode in ('e2b', 'docker'):
-                return await cls._execute_remotely_for_code_agent(wrapped_code, state, mode)
-        context_locals = cls._prepare_locals_for_code_agent(state)
-        return await cls._execute_locally_for_code_agent(wrapped_code, context_locals)
+        if mode in ('e2b', 'docker'):
+            return await cls._execute_remotely_for_code_agent(wrapped_code, state, mode)
+        context_locals = cls._prepare_locals_for_code_agent(state)
+        return await cls._execute_locally_for_code_agent(wrapped_code, context_locals)
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py` around lines 310 - 317, The duplicate conditional branches around _skills_enabled() are redundant because both arms call the same remote execution path for mode in ('e2b', 'docker'); simplify by removing the _skills_enabled() check and directly check if mode in ('e2b', 'docker') to call cls._execute_remotely_for_code_agent(wrapped_code, state, mode), otherwise prepare locals via cls._prepare_locals_for_code_agent(state) and call cls._execute_locally_for_code_agent(wrapped_code, context_locals); keep references to wrapped_code, state, mode and the three methods intact.

src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py (1)
1413-1435: ⚡ Quick win — Sandbox tools lack the `make_tool_awaitable` wrapping used for all other tools — add it for parity and safety.

All other tools injected into `tools_context_dict` (find_tools, todos, execution tools, skill tools) are wrapped with `make_tool_awaitable()` before assignment. The sandbox tools loop extracts the callable and assigns it directly without wrapping. While all current sandbox tools from `OpenSandboxExecutor.create_sandbox_tools()` are async (all created with the `coroutine=` parameter), this inconsistency creates a maintenance risk: if a future sandbox tool is sync-only, it will fail at runtime when the agent awaits it. Wrap sandbox tools for consistency.

♻️ Suggested fix
```diff
         for st in sandbox_tools:
-            # Prefer coroutine for async execution context
-            fn = st.coroutine or st.func
+            fn = getattr(st, "coroutine", None) or getattr(st, "func", None)
             if fn:
-                tools_context_dict[st.name] = fn
+                tools_context_dict[st.name] = make_tool_awaitable(fn)
+            else:
+                logger.warning(f"Sandbox tool '{st.name}' has no callable, skipping")
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` around lines 1413 - 1435, The sandbox tool injection loop in cuga_lite_graph currently assigns st.coroutine or st.func directly into tools_context_dict without using make_tool_awaitable, creating inconsistency with other tool injections; update the loop that iterates over sandbox_tools (created via CodeExecutor._get_opensandbox_executor().create_sandbox_tools) to wrap the chosen callable (st.coroutine or st.func) with make_tool_awaitable() before assigning to tools_context_dict[st.name], mirroring how other tools (find_tools, todos, execution/skill tools) are handled so both async and sync sandbox tools are safe to await.src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py (1)
113-113: ⚡ Quick win
Mutable class-level _sandboxes is shared across all instances.
_sandboxes: dict[str, Any] = {} is a class attribute; every OpenSandboxExecutor instance reads and writes the same dict. This silently collapses all per-thread sandbox caches into one global map, which breaks isolation between independently constructed executors (e.g., in tests or if multiple executor instances are ever used concurrently).
♻️ Proposed fix – move to instance attribute

```diff
 class OpenSandboxExecutor(RemoteExecutor):
-    # Interpreter cache: thread_id -> CodeInterpreter instance
-    _sandboxes: dict[str, Any] = {}
+
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        # Per-instance interpreter cache: thread_id -> CodeInterpreter
+        self._sandboxes: dict[str, Any] = {}
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py` at line 113, The class-level mutable dictionary _sandboxes causes all OpenSandboxExecutor instances to share state; change _sandboxes from a class attribute to an instance attribute by removing the top-level "_sandboxes: dict[str, Any] = {}" and initializing self._sandboxes = {} inside OpenSandboxExecutor.__init__, then update any references in methods of OpenSandboxExecutor to use self._sandboxes so each instance maintains its own sandbox cache.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pyproject.toml`:
- Line 21: Update the APScheduler dependency constraint in pyproject.toml to
prevent automatic upgrade to breaking 4.x by changing the current package spec
"apscheduler>=3.10" to a bounded range that excludes 4.x (e.g.,
"apscheduler>=3.10,<4"); this ensures existing usages of AsyncIOScheduler and
scheduler.add_job() remain compatible until you intentionally migrate to the
APScheduler v4 API.
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/call_api_helper.py`:
- Around line 194-198: Inside the async function call_api, replace
asyncio.get_event_loop() with asyncio.get_running_loop() to avoid deprecated
implicit loop creation; obtain the running loop into the loop variable used for
loop.run_in_executor(None, _do_request), keeping the _do_request closure that
calls urllib.request.urlopen(req, timeout={timeout_seconds}) intact so the
blocking I/O is still executed in the executor.
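A runnable sketch of the suggested change (the URL/timeout handling around it is assumed; only the loop-acquisition line is the actual fix):

```python
import asyncio
import urllib.request


async def call_api(url: str, timeout_seconds: float = 10.0) -> bytes:
    """Hypothetical shape of call_api; the real signature may differ."""
    req = urllib.request.Request(url)

    def _do_request() -> bytes:
        # Blocking I/O still runs in the default thread-pool executor.
        with urllib.request.urlopen(req, timeout=timeout_seconds) as resp:
            return resp.read()

    # get_running_loop() must be called from inside a running coroutine and
    # never creates a loop implicitly, unlike the deprecated get_event_loop().
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _do_request)
```

The pattern is the same for any blocking call dispatched from async code: acquire the running loop, then hand the sync callable to `run_in_executor`.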
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py`:
- Around line 282-285: The download_file flow uses the caller-supplied filename
directly which allows path traversal; change dest_name to a sanitized basename
(e.g. dest_name = Path(filename).name or run a secure filename sanitizer) and if
that yields an empty or unsafe name fall back to Path(sandbox_path).name; then
compute target = (_workspace_dir() / dest_name).resolve() and validate that
target is inside _workspace_dir().resolve() (use Path.is_relative_to or compare
commonpath), rejecting the request or raising an error if the check fails before
calling local_path.write_bytes(data); reference symbols: filename, dest_name,
_workspace_dir(), sandbox_path, local_path, and
interpreter.sandbox.files.read_bytes.
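One way the sanitization could look (helper name and workspace layout are assumptions, not the project's actual API):

```python
from pathlib import Path


def safe_destination(workspace: Path, filename: str, sandbox_path: str) -> Path:
    """Resolve a download target under workspace, rejecting traversal names."""
    # Strip any directory components the caller supplied.
    dest_name = Path(filename).name
    if not dest_name:
        # Fall back to the sandbox-side basename for empty/unsafe names.
        dest_name = Path(sandbox_path).name
    target = (workspace / dest_name).resolve()
    # Confirm the resolved target still lives inside the workspace.
    if not target.is_relative_to(workspace.resolve()):
        raise ValueError(f"refusing to write outside workspace: {filename!r}")
    return target
```

Because `Path(filename).name` drops every parent component, inputs like `../../etc/passwd` collapse to `passwd` before the containment check ever runs.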
- Around line 214-223: The release_sandbox implementation only pops "_default"
when thread_id is None and silently swallows errors; update release_sandbox to,
if thread_id is None, iterate over and remove all entries from self._sandboxes
(e.g., copy keys or items, pop each), await interpreter.sandbox.kill() and
interpreter.sandbox.close() for each cached interpreter, and on exceptions log
the error instead of using a bare except (use the class/module logger or
self.logger) while continuing to attempt cleanup for remaining sandboxes.
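A sketch of that cleanup loop, assuming the cache shape (`self._sandboxes`) and the `kill()`/`close()` methods described above:

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


async def release_sandbox(self, thread_id: Optional[str] = None) -> None:
    """Kill and close cached interpreters; thread_id=None releases all."""
    keys = list(self._sandboxes) if thread_id is None else [thread_id]
    for key in keys:
        interpreter = self._sandboxes.pop(key, None)
        if interpreter is None:
            continue
        try:
            await interpreter.sandbox.kill()
            await interpreter.sandbox.close()
        except Exception as exc:
            # Log instead of a bare except, and keep cleaning up the rest.
            logger.error("failed to release sandbox %r: %s", key, exc)
```

Snapshotting the keys with `list(...)` before popping avoids mutating the dict while iterating over it.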
- Around line 353-362: The directory-detection heuristic in the list
comprehension that builds FileEntry from entries_raw is wrong: is_dir=(e.size ==
0 and str(oct(e.mode)).startswith("0o7")) misclassifies directories because
FileInfo.mode contains full st_mode (e.g., 0o40755). Replace that heuristic in
the FileEntry construction with a proper POSIX mode test (use the stat helpers
against e.mode, e.g., stat.S_ISDIR(e.mode) or compare e.mode & stat.S_IFMT to
stat.S_IFDIR) so FileEntry.is_dir correctly reflects directories when creating
ListFilesResult(sandbox_path=..., entries=...).
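The corrected check is a one-liner with the stdlib `stat` helpers; the sketch below also shows why the `oct(...)` heuristic never fires on a real directory mode:

```python
import stat


def is_directory(mode: int) -> bool:
    # st_mode stores the file type in its high bits; S_ISDIR masks them with
    # S_IFMT, so a directory mode like 0o40755 is detected correctly.
    return stat.S_ISDIR(mode)
```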
In `@src/cuga/backend/loops/api.py`:
- Around line 53-57: The expires_in_days field currently accepts negative values
and treats them as None (permanent); add explicit validation on the
expires_in_days field(s) to reject negative integers (raise a validation error)
so only 0 is allowed to mean “never” and positive ints represent days; implement
this via a Pydantic validator (e.g., a `@validator` or `@root_validator`) for the
model(s) that declare expires_in_days (refer to the expires_in_days Field in the
model in src/cuga/backend/loops/api.py and replicate the same validation for the
other model declared around lines 72-76) to throw on values < 0 with a clear
error message.
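A minimal sketch of the constraint, using a `Field(ge=0)` bound (works in both Pydantic v1 and v2) instead of a hand-written validator; the model and field names here are assumptions:

```python
from typing import Optional

from pydantic import BaseModel, Field


class CreateLoopRequest(BaseModel):
    # ge=0 rejects negatives at validation time; 0 means "never expires",
    # positive values are days, and None leaves the default behavior.
    expires_in_days: Optional[int] = Field(default=None, ge=0)
```

The same field declaration would be replicated on the second model mentioned around lines 72-76 so both request shapes reject negatives consistently.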
- Around line 129-134: The handler list_loops currently converts status with
LoopStatus(status) which raises on invalid values; change the FastAPI endpoint
signature to accept status: Optional[LoopStatus] (import LoopStatus) so FastAPI
performs validation and returns a 4xx for bad values, remove the manual
conversion line (status_enum = LoopStatus(status) if status else None) and pass
the validated status directly to svc.registry.list_all (references: list_loops,
LoopStatus, svc.registry.list_all, router.get("/api/list")).
In `@src/cuga/backend/loops/cron_parser.py`:
- Around line 113-121: The code currently wraps both int(spec) and the minimum
check in a single try/except so the "delay must be at least 5 seconds"
ValueError gets caught and replaced by the misleading "unrecognized delay..."
message; fix this by only catching conversion errors: move the n = int(spec)
into its own try/except that raises the "unrecognized delay..." message on
ValueError, then perform the if n < 5 check after that and raise
ValueError("delay must be at least 5 seconds") directly if needed (references:
variable spec and n, and the current error messages).
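Sketched out, the split looks like this (function name assumed; the two error messages are the ones quoted above):

```python
def parse_delay_seconds(spec: str) -> int:
    """Parse a delay spec like "30", enforcing the 5-second floor."""
    try:
        n = int(spec)
    except ValueError:
        # Only conversion failures get the "unrecognized" message.
        raise ValueError(f"unrecognized delay spec: {spec!r}") from None
    if n < 5:
        # The floor violation keeps its own, accurate message.
        raise ValueError("delay must be at least 5 seconds")
    return n
```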
In `@src/cuga/backend/loops/service.py`:
- Around line 130-140: The discard of asyncio.create_task(...) allows garbage
collection of fire-and-forget tasks; fix by storing the returned Task in a
strong-reference set on the service (e.g., self._background_tasks), add a done
callback to remove the task from that set when complete, and replace both raw
asyncio.create_task(...) calls (the one calling
self.registry.update_status(loop.id, LoopStatus.EXPIRED) and the one triggered
by run_now() that leads to fire_loop()) with this pattern so tasks are retained
until finished (use the unique symbols self.registry.update_status, run_now(),
and fire_loop() to locate the sites and implement the set and cleanup callback).
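The strong-reference pattern in isolation (class and attribute names assumed; the real service would route both `create_task` call sites through a helper like this):

```python
import asyncio


class LoopsServiceSketch:
    """Holds strong references to fire-and-forget tasks until they finish."""

    def __init__(self) -> None:
        self._background_tasks: set[asyncio.Task] = set()

    def _spawn(self, coro) -> asyncio.Task:
        task = asyncio.create_task(coro)
        # A discarded create_task() result can be garbage-collected mid-flight;
        # keeping it in a set and discarding on completion prevents that.
        self._background_tasks.add(task)
        task.add_done_callback(self._background_tasks.discard)
        return task
```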
In `@src/cuga/backend/loops/tools.py`:
- Around line 138-153: The tool currently lists loops by calling
svc.registry.list_all(thread_id=thread) after retrieving agent, thread from
_identity(), which exposes loops from other agents on the same thread; change
the query to scope by both agent and thread (e.g., pass agent or agent.id /
agent_name alongside thread_id or call a registry method like
list_by_agent_and_thread) so the returned loops are filtered to the current
agent; update the call site in the function that uses _identity() and
svc.registry.list_all to include the agent filter and adjust any mapping logic
if the registry method name or parameters differ.
In `@src/cuga/backend/loops/ui.html`:
- Around line 142-157: The auto-refresh in refresh() races with act()/del() and
the periodic setInterval, so implement request-versioning or use an
AbortController to cancel/ignore stale refresh responses: add a module-scoped
inFlightToken (or controller) that refresh() sets before fetch, have act()/del()
either await refresh() or bump the token (or abort the controller) after
performing their mutations, and have refresh() ignore any response whose
token/controller no longer matches; additionally consider pausing the periodic
setInterval while a modal/panel is open by checking the panel state before
starting a new refresh to avoid clobbering newer UI state.
- Around line 180-198: The template builds unsafe HTML in tr.innerHTML and uses
inline onclick strings with l.id allowing XSS; update the rendering to escape
every interpolated field (status, app_name, agent_name, trigger_spec,
trigger_kind, thread_id, last_error, outcome, fire_count/error_count etc.) by
calling escapeHtml before inserting, and validate l.status against an allowlist
(e.g., active|paused|cancelled|expired|orphaned) before using it in the b-${...}
class; stop using inline onclick="..." and instead create buttons via
document.createElement and attach event handlers with addEventListener that call
act(del) with the safely-escaped l.id (or pass the raw id as data-* and never
interpolate into HTML); apply the same escaping and status validation in the
detail view (references: tr.innerHTML, escapeHtml, act, del, tbody.appendChild,
and loop.* fields in the detail panel).
In `@src/cuga/backend/server/demo_manage_setup.py`:
- Around line 363-372: The demo_skills preset currently returns "filesystem":
False which conflicts with the starter prompt referenced by
DEMO_SKILLS_STARTERS[2] ("What is in my workspace .cuga skills folder?"); either
set "filesystem": True in the demo_skills preset return inside
demo_manage_setup.py (the dict block that returns
crm/email/digital_sales/docs/filesystem/oak_health/knowledge) so the filesystem
tool is available, or remove/replace DEMO_SKILLS_STARTERS[2] with a starter that
does not require filesystem access to avoid the tool-not-found error.
In `@src/cuga/backend/server/main.py`:
- Around line 96-110: The path check currently compares a resolved
workspace_path to a syntactic candidate (built with os.path.abspath), allowing
symlink escape; in _resolve_path_under_cuga_workspace replace the candidate
creation with a real resolution (e.g., candidate =
Path(resolved).resolve(strict=False)) so symlinks are dereferenced the same way
as workspace_path before calling candidate.relative_to(workspace_path); use
resolve(strict=False) to avoid raising for non-existent targets.
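A self-contained version of the hardened check (function name assumed); resolving both sides dereferences symlinks before the containment test:

```python
from pathlib import Path


def resolve_under_workspace(workspace: Path, user_path: str) -> Path:
    """Resolve symlinks on both sides before checking containment."""
    workspace = workspace.resolve()
    # strict=False keeps this working for not-yet-created targets while
    # still dereferencing any symlinks along the existing prefix.
    candidate = (workspace / user_path).resolve(strict=False)
    candidate.relative_to(workspace)  # raises ValueError on escape
    return candidate
```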
In `@src/cuga/backend/skills/registry.py`:
- Around line 37-43: The setup lines embed pip_pkgs and npm_pkgs directly into
shell strings passed to run_command, allowing shell injection from
SkillEntry.requirements; validate and/or sanitize each package name before using
them (e.g., allow only safe characters and version specifiers, reject or strip
values containing shell metacharacters), and prefer building commands as
argument lists rather than interpolated shell strings. Specifically, in
registry.py where setup_lines are appended (the blocks that reference pip_pkgs,
npm_pkgs and call run_command), filter/validate each requirement entry from
SkillEntry.requirements, replace any unsafe entries with a safe-fail behavior,
and change the run_command usage to pass arguments safely (or ensure proper
escaping) so no raw requirement string can inject shell commands.
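One possible allowlist-plus-argv approach (regex and helper names are illustrative, not the project's code; the character class is a conservative subset of PEP 508 names with PEP 440 specifiers):

```python
import re

# Letters, digits, dots, dashes, underscores, brackets/commas for extras,
# optionally followed by a single version specifier.
_SAFE_REQ = re.compile(
    r"^[A-Za-z0-9._\-\[\],]+(?:(?:==|>=|<=|~=|!=|>|<)[A-Za-z0-9._*+!\-]+)?$"
)


def validate_requirements(reqs: list[str]) -> list[str]:
    """Reject requirement strings containing shell metacharacters."""
    safe = []
    for req in reqs:
        req = req.strip()
        if not _SAFE_REQ.match(req):
            raise ValueError(f"unsafe requirement rejected: {req!r}")
        safe.append(req)
    return safe


def pip_install_command(reqs: list[str]) -> list[str]:
    # Building an argv list (not an interpolated shell string) means even a
    # validation gap cannot reach a shell interpreter.
    return ["pip", "install", *validate_requirements(reqs)]
```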
In `@src/cuga/sdk.py`:
- Around line 1834-1836: stream() currently doesn't seed the loops/context vars
so tools calling _identity() fail; mirror the invoke() behavior by ensuring
stream() computes the same thread_id and calls set_session_attribute(thread_id)
and self._set_loops_context(thread_id) before yielding/dispatching tool calls
(the same calls used in invoke()), so
current_agent_name/current_thread_id/current_app_name are set for any scheduling
tool; apply the same change to the other stream() occurrence noted near the
second _set_loops_context usage.
- Around line 2403-2405: The guard in _set_loops_context currently uses the raw
self._enable_loops which can be None; instead use the resolved flag produced by
_ensure_loops_initialized (call or property set by it) so auto-mode is handled
correctly — update _set_loops_context (and the similar gating around graph
compilation at the other site) to call _ensure_loops_initialized() first (if not
already initialized) and then test the resolved boolean (the attribute set by
_ensure_loops_initialized or its return value) rather than self._enable_loops so
loop context binding works in auto (None) mode.
In `@src/cuga/settings.toml`:
- Around line 217-223: Investigate the implemented behavior for the [loops]
master toggle and then update the comment for the enabled field to match it
exactly: confirm whether enabled = false globally prevents injection/starting of
the scheduler or whether a per-agent enable_loops=True can override it, and then
rewrite the sentence to state that behavior unambiguously (referencing enabled,
enable_loops, and the tools schedule_recurring / schedule_wakeup so readers know
which features are affected); ensure the comment no longer contains the
contradictory "even if" clause and clearly states the precedence rule (master
toggle vs per-agent flag).
- Line 39: The config default enables OpenSandbox by setting opensandbox_sandbox
= true which will break installs without Docker/opensandbox; change the default
back to false (match previous behavior like e2b_sandbox) and/or add a runtime
guard around the OpenSandbox executor import/initialization (where OpenSandbox
executor or related class is referenced) so that if the opensandbox_sandbox flag
is true but the opensandbox package or Docker connection is missing you raise a
clear, actionable error (or fall back) instead of letting
ModuleNotFoundError/ConnectionError propagate.
In `@src/frontend_workspaces/frontend/src/ChatLanding.tsx`:
- Around line 687-708: The current reset of workspaceTreeDirPathsPrevRef in the
effect that runs on effectiveChatThreadId causes every directory to be treated
as "new" and auto-expanded on the next workspaceTree update; change the logic so
we don't seed prev ref to an empty set on thread switch but instead detect the
first successful tree load and initialize workspaceTreeDirPathsPrevRef.current =
valid without adding those paths to setWorkspaceExpandedDirs (or use a
hasInitialTree flag to skip adding new valid paths on the first non-empty
payload). Update the effects that reference workspaceTreeDirPathsPrevRef,
workspaceTree, setWorkspaceExpandedDirs and effectiveChatThreadId accordingly so
only directories genuinely added after the initial load are auto-expanded.
---
Outside diff comments:
In `@README.md`:
- Around line 1082-1083: The markdown link in README.md has the link text and
URL swapped for the runtime-tools example; update the bullet to use the correct
[text](url) ordering by making the visible text "Adding Tools" and the href
"./docs/examples/cuga_with_runtime_tools/README.md" (i.e., change
`[./docs/examples/cuga_with_runtime_tools/README.md](Adding Tools)` to
`[Adding Tools](./docs/examples/cuga_with_runtime_tools/README.md)`) and ensure
the URL remains the file path (no percent-encoding).
In `@src/frontend_workspaces/agentic_chat/src/CustomChat.tsx`:
- Around line 228-244: The async loader (loadFiles) can finish after threadId
changes and overwrite allFiles with stale data; capture the request's thread id
up-front (e.g., const requestedTid = (threadIdRef.current || threadId ||
"").trim() || undefined) and after awaiting
workspaceService.getWorkspaceTree(...) verify the current thread id still
matches requestedTid (or check threadIdRef.current/threadId) before calling
setAllFiles; alternatively add a per-load abort token/counter checked before
setAllFiles to ignore results from previous loads (refer to loadFiles,
threadIdRef, threadId, getWorkspaceTree, and setAllFiles).
In `@src/frontend_workspaces/agentic_chat/src/FileAutocomplete.tsx`:
- Around line 380-392: The mock-fallback latches via usedMockRef.current in
loadWorkspaceFiles, causing mock data to persist across threadId changes; update
the logic so that when a real fetch succeeds you clear the latch (set
usedMockRef.current = false) and also add a threadId change handler (e.g., in
the effect that triggers loadWorkspaceFiles) to reset usedMockRef.current when
threadId changes so a fresh getWorkspaceTree attempt will not be blocked from
using real data; reference functions/variables: loadWorkspaceFiles, usedMockRef,
useMockData, getWorkspaceTree, threadId, and setAllFiles.
In `@src/frontend_workspaces/agentic_chat/src/workspaceService.ts`:
- Around line 122-130: In fetchWorkspaceData's catch block, avoid returning
cachedData unguarded because it may belong to another thread; instead, only fall
back to this.cachedData when this.cachedForThreadId === (threadId ?? '') so the
cached entry matches the requested threadId (refer to fetchWorkspaceData,
this.cachedData, this.cachedForThreadId, and threadId). If the guard fails,
return the empty tree { tree: [], timestamp: Date.now() } as currently
implemented.
- Around line 67-106: getWorkspaceTree races when threadId changes because a
single shared pendingRequest and cachedForThreadId allow an in-flight fetch for
thread A to be returned and cached for thread B; change the logic to key pending
requests by thread id (e.g., pendingRequests: Map<string,
Promise<WorkspaceData>>) and only return or await a pending request when its key
matches tidKey, ensure when resolving a fetch from fetchWorkspaceData(threadId)
you verify the resolved data is being stored only if tidKey still matches
cachedForThreadId (or store cache per-thread), and update clearCache to clear
the pending request entry for the affected thread (clear any
pendingForThreadId/pendingRequests entries) so we never cache or return a
different thread's data.
---
Nitpick comments:
In `@pyproject.toml`:
- Line 68: The dependency extras "opensandbox" and
"opensandbox-code-interpreter" in pyproject.toml are unpinned and make builds
non-reproducible; update the opensandbox extras list to include explicit version
constraints (at minimum a floor, e.g., "opensandbox>=X.Y.Z" and
"opensandbox-code-interpreter>=A.B.C") or pinned versions once compatibility is
determined, replacing the current unversioned entries so the project uses a
stable, reproducible release.
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 1413-1435: The sandbox tool injection loop in cuga_lite_graph
currently assigns st.coroutine or st.func directly into tools_context_dict
without using make_tool_awaitable, creating inconsistency with other tool
injections; update the loop that iterates over sandbox_tools (created via
CodeExecutor._get_opensandbox_executor().create_sandbox_tools) to wrap the
chosen callable (st.coroutine or st.func) with make_tool_awaitable() before
assigning to tools_context_dict[st.name], mirroring how other tools (find_tools,
todos, execution/skill tools) are handled so both async and sync sandbox tools
are safe to await.
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/code_executor.py`:
- Around line 310-317: The duplicate conditional branches around
_skills_enabled() are redundant because both arms call the same remote execution
path for mode in ('e2b', 'docker'); simplify by removing the _skills_enabled()
check and directly check if mode in ('e2b', 'docker') to call
cls._execute_remotely_for_code_agent(wrapped_code, state, mode), otherwise
prepare locals via cls._prepare_locals_for_code_agent(state) and call
cls._execute_locally_for_code_agent(wrapped_code, context_locals); keep
references to wrapped_code, state, mode and the three methods intact.
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/variable_utils.py`:
- Around line 24-30: The current strip_tools_output_var uses a substring check
("find_tools" in code) that can match comments or strings; change the detection
in strip_tools_output_var to look specifically for a call-site pattern (e.g. use
a regex like r'\bfind_tools\s*\(') so it only strips tools_output when
find_tools is actually invoked; update the function to import and use
re.search(r'\bfind_tools\s*\(', code) instead of the plain substring check to
reduce false positives.
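The call-site detection in isolation (helper name assumed; comments containing a literal `find_tools(...)` would still match, which is acceptable for this heuristic):

```python
import re

# \b prevents matching names like my_find_tools; \s*\( requires an actual call.
_FIND_TOOLS_CALL = re.compile(r"\bfind_tools\s*\(")


def calls_find_tools(code: str) -> bool:
    """True only when find_tools is invoked, not merely mentioned."""
    return bool(_FIND_TOOLS_CALL.search(code))
```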
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py`:
- Line 113: The class-level mutable dictionary _sandboxes causes all
OpenSandboxExecutor instances to share state; change _sandboxes from a class
attribute to an instance attribute by removing the top-level "_sandboxes:
dict[str, Any] = {}" and initializing self._sandboxes = {} inside
OpenSandboxExecutor.__init__, then update any references in methods of
OpenSandboxExecutor to use self._sandboxes so each instance maintains its own
sandbox cache.
In `@src/cuga/backend/llm/models.py`:
- Around line 566-567: The tuple reasoning_prefixes in the function that returns
model_name.startswith(reasoning_prefixes) contains a redundant entry 'gpt-5.5'
because any name starting with 'gpt-5.5' already starts with 'gpt-5'; remove
'gpt-5.5' from the reasoning_prefixes tuple and keep 'gpt-5' and 'azure/gpt-5.5'
as needed so the startswith check remains correct and non-redundant.
In `@src/cuga/backend/loops/runner.py`:
- Around line 64-85: The post-run registry calls (svc.registry.record_fire,
svc.registry.insert_run, svc.registry.update_status,
svc.registry.update_next_fire) are executed as one sequential happy path and a
failure in any call can leave loop state inconsistent; change the code so each
registry operation is wrapped in its own try/except that logs the exception
(including loop_id and error) and continues, and ensure the scheduling updates
(calling svc._next_fire_ts and then svc.registry.update_next_fire or
update_status for TriggerKind.DELAY) run regardless of failures in
record_fire/insert_run so one-shot DELAY loops are always marked CANCELLED and
recurring loops always have next_fire_at refreshed even if run-record writes
fail.
- Around line 80-85: The runner is calling the private helper svc._next_fire_ts,
coupling to a name-prefixed private API; add a public method on the LoopsService
class (e.g., LoopsService.next_fire_ts(loop_id)) that wraps or implements the
existing logic and then update runner.py to call svc.next_fire_ts(loop_id)
instead of svc._next_fire_ts(loop_id); ensure the new method preserves the
existing behavior used by runner (used in the branch handling
non-TriggerKind.DELAY where nxt is set and passed to registry.update_next_fire)
and update any imports/tests that referenced the private name.
In `@src/cuga/backend/skills/loader.py`:
- Around line 104-110: The current branch treats sets the same as lists/tuples
causing non-deterministic ordering for requirements (variable value ->
candidates -> return tuple(...)); change the conditional to handle set
separately by converting a set into a deterministically ordered sequence (e.g.,
sorted(value)) while keeping lists/tuples as-is, so replace the
isinstance(value, (list, tuple, set)) handling with separate branches: one for
list/tuple that assigns candidates = value and one for set that assigns
candidates = sorted(value), then proceed to the existing tuple(str(item).strip()
...) return to ensure stable, deterministic install order.
In `@src/cuga/backend/skills/tools.py`:
- Around line 15-29: The current load_skill_impl inside create_skill_tools calls
registry.load_skill synchronously which will block if registry.load_skill later
performs I/O; change load_skill_impl to an async function (async def
load_skill_impl(name: str) -> str) and switch the StructuredTool creation to
pass the coroutine variant (use StructuredTool.from_function for
coroutine/coroutine=... or the library's coroutine argument) so StructuredTool
wraps an async callable; update any typing to reflect async return and keep
args_schema=LoadSkillInput intact.
In `@src/cuga/sdk_core/tests/test_context_summarization_sdk.py`:
- Around line 373-378: The assertion uses a permissive cap max(35,
len(large_history) // 3) which allows far too many messages post-summarization;
update the cap calculation in the test (both the occurrence around
message_count_before and the similar check near lines 820-825) to a tighter
bound such as max(KEEP_LAST_N_MESSAGES + 10, len(large_history) // 10) (or
max(keep_last_n + 10, len(history)//10) if different variable names) so the test
will fail for partial summarizers, and add a short inline comment referencing
KEEP_LAST_N_MESSAGES to explain the headroom for tool-call messages;
alternatively, if the intent is only to detect "no compression", add a comment
clarifying that and leave the original formula.
In `@src/frontend_workspaces/agentic_chat/src/FileAutocomplete.tsx`:
- Around line 175-309: Split the big useEffect into two effects: one (deps
[disabled, threadId]) that only runs loadWorkspaceFiles and manages fileInterval
(capture the interval ID and clear it on cleanup), and a second effect (deps
[disabled]) that encapsulates setupCarbonListeners and all DOM wiring; inside
that listener effect capture and clear any setTimeout IDs, store references to
added listeners (e.g., handleCarbonInput, handleDocumentInput) and to elements
you attach per-element listeners to (those marked with
data-autocomplete-listener) so you can removeEventListener for each on cleanup,
and ensure the retry setTimeout created by setupCarbonListeners is cleared if
the effect unmounts; update references to loadWorkspaceFiles,
setupCarbonListeners, handleCarbonInput, handleDocumentInput and fileInterval in
the respective effects.
In `@src/frontend_workspaces/frontend/src/api.ts`:
- Around line 266-285: getWorkspaceTree currently builds its querystring with
encodeURIComponent manually; update getWorkspaceTree to use URLSearchParams
(like getWorkspaceFile and getWorkspaceDownload) to construct the thread_id
parameter for consistency and correctness—modify the getWorkspaceTree function
to create a new URLSearchParams({ thread_id: threadId }) when threadId is
provided and append its toString() to `/api/workspace/tree` accordingly.
In `@src/frontend_workspaces/frontend/src/ChatLanding.tsx`:
- Around line 681-685: The polling interval in ChatLanding.tsx's useEffect
currently calls fetchWorkspaceTree every 2500ms which is below
workspaceService.getWorkspaceTree's 3000ms throttle; update the effect to either
raise the interval to 3000ms (setInterval(fetchWorkspaceTree, 3000)) or modify
fetchWorkspaceTree to call workspaceService.getWorkspaceTree({ forceRefresh:
true }) so requests are not stalled by the service throttle; ensure you only
change the interval or add the forceRefresh flag and keep the existing
clearInterval cleanup.
In `@src/frontend_workspaces/frontend/src/ManagePage.tsx`:
- Around line 1561-1572: The onClick handler for the expand toggle uses a
ternary purely for side effects (isExpanded ? next.delete(skill.name) :
next.add(skill.name)) which can trigger linter errors; in the setExpandedSkills
updater inside the button onClick, replace the ternary with an explicit if/else:
create next as new Set(prev), then if (isExpanded) { next.delete(skill.name); }
else { next.add(skill.name); } and finally return next. This change should be
made in the inline handler passed to setExpandedSkills (the function referencing
isExpanded and skill.name).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
```toml
    "markdownify",
    "uvicorn",
    "fastapi[standard]",
    "apscheduler>=3.10",
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, find where APScheduler is used in the codebase
rg -i "apscheduler|from.*scheduler|import.*scheduler" --type py -A 3 -B 1 | head -100
```
Repository: cuga-project/cuga-agent
Length of output: 3235
🌐 Web query:
APScheduler 4.0 release breaking changes migration guide
💡 Result:
APScheduler 4.0 is still in pre-release (latest 4.0.0a6 released April 27, 2025), explicitly marked as not for production use due to potential backwards incompatible changes without migration pathways [1][2][3][4]. The stable version remains 3.11.2 (Dec 22, 2025) [3][4][5].

Official migration guide from 3.x to 4.0 (https://apscheduler.readthedocs.io/en/master/migration.html) [6]:
- APScheduler 4.0 is a partial rewrite [6][7].
- Time zone support revamped to use zoneinfo (Python 3.9+), drop pytz [6][1].
- add_job renamed to add_schedule; add_job now for one-off runs [6][7].
- BlockingScheduler and BackgroundScheduler combined into Scheduler with run_until_stopped (replaces BlockingScheduler) and start_in_background (replaces BackgroundScheduler) [6].
- AsyncIOScheduler replaced by generic AsyncScheduler (AnyIO-based, supports asyncio/Trio); use as async context manager [6].
- Dropped other schedulers (Qt, Twisted, etc.); some may return before final [6].
- No multiple data stores per scheduler [6].
- configure method removed; pass kwargs to scheduler constructor [6].
- No automatic import from 3.x job stores yet; planned before final release [6][7]. Progress tracking mentions v3->v4 migration tool in development [8].

Key breaking changes from alpha releases (not exhaustive, as pre-release) [9][1][10][11][12]:
- Data store methods refactored (e.g., extend_acquired_schedule_leases, cleanup changes, release_schedules takes ScheduleResult) [9][1].
- Task/Schedule attributes read-only; metadata field added [1][10].
- Removed job_executor/max_running_jobs from add_schedule; use configure_task or @task decorator; task_defaults replaces default_job_executor [1][10].
- AsyncpgEventBroker takes connection string directly [9].
- lock_expiration_delay -> lease_duration [1].
- CronTrigger: timezone arg to from_crontab; Sunday=0 [1].
- Dropped Python <3.8, some job stores (Redis, etc.) [1].
Full version history: https://apscheduler.readthedocs.io/en/master/versionhistory.html [1]. GitHub releases: https://github.com/agronholm/apscheduler/releases [10]. Do not use 4.0 in production [1][2].
Citations:
- 1: https://apscheduler.readthedocs.io/en/master/versionhistory.html
- 2: https://pypi.org/project/APScheduler/4.0.0a4/
- 3: https://pypi.python.org/pypi/APScheduler
- 4: https://pypi.org/project/APScheduler/
- 5: https://github.com/agronholm/apscheduler
- 6: https://apscheduler.readthedocs.io/en/master/migration.html
- 7: https://apscheduler.readthedocs.io/en/master/_sources/migration.rst.txt
- 8: APScheduler 4.0 progress tracking agronholm/apscheduler#465
- 9: https://github.com/agronholm/apscheduler/releases/tag/4.0.0a6
- 10: https://github.com/agronholm/apscheduler/releases
- 11: https://github.com/agronholm/apscheduler/releases/tag/4.0.0a3
- 12: https://github.com/agronholm/apscheduler/releases/tag/4.0.0a4
🌐 Web query:
APScheduler 4.0 release date stable version 2025 2026
💡 Result:
There is no stable release of APScheduler 4.0 as of 2026-05-07. The latest release is 4.0.0a6 (alpha), published on 2025-04-27 [1][2]. Official documentation explicitly warns that the v4.0 series is a pre-release not suitable for production, with potential backwards-incompatible changes and no migration path [3][4]. The current stable version is APScheduler 3.11.2, released on 2025-12-22 [5]. Development on 4.0 continues slowly; the maintainer intends to finish it, including a v3-to-v4 migration tool required for beta, but no beta or stable 4.0 has been released yet. The 4.0 milestone remains open with some issues unresolved [6][7].
Citations:
- 1: https://github.com/agronholm/apscheduler/releases/tag/4.0.0a6
- 2: https://newreleases.io/project/pypi/APScheduler/release/4.0.0a6
- 3: https://apscheduler.readthedocs.io/en/master/versionhistory.html
- 4: https://pypi.org/project/APScheduler/4.0.0a4/
- 5: https://pypi.org/project/APScheduler/
- 6: APScheduler 4.0 progress tracking agronholm/apscheduler#465
- 7: https://github.com/agronholm/apscheduler/milestone/1
🏁 Script executed:
# Verify actual add_job usage in codebase
rg "add_job|add_schedule" --type py -B 2 -A 2
Repository: cuga-project/cuga-agent
Length of output: 442
Pin APScheduler to <4 to prevent a future breaking upgrade.
APScheduler 4.x (currently in alpha at 4.0.0a6) is a ground-up rewrite with breaking changes to core APIs used in this codebase. Specifically, AsyncIOScheduler is being removed and replaced with AsyncScheduler (which uses async context managers), and add_job() is being renamed to add_schedule(). The codebase directly uses both APIs: importing AsyncIOScheduler from apscheduler.schedulers.asyncio and calling scheduler.add_job() in the loops service. The current constraint >=3.10 will silently upgrade to 4.x once its first stable release lands, causing runtime failures.
🛠️ Proposed fix
- "apscheduler>=3.10",
+ "apscheduler>=3.10,<4",

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pyproject.toml` at line 21, Update the APScheduler dependency constraint in
pyproject.toml to prevent automatic upgrade to breaking 4.x by changing the
current package spec "apscheduler>=3.10" to a bounded range that excludes 4.x
(e.g., "apscheduler>=3.10,<4"); this ensures existing usages of AsyncIOScheduler
and scheduler.add_job() remain compatible until you intentionally migrate to the
APScheduler v4 API.
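Beyond the pin, a cheap defence in depth is a startup guard that fails fast with an actionable message if a 4.x install ever slips through (e.g. via a lockfile regeneration). The sketch below is hypothetical — the function name and placement are illustrative, not part of the codebase; at a real call site you would feed it `importlib.metadata.version("apscheduler")`:

```python
def assert_apscheduler_v3(version: str) -> None:
    """Fail fast if an APScheduler 4.x install sneaks past the pin."""
    major = int(version.split(".")[0])
    if major >= 4:
        raise RuntimeError(
            f"APScheduler {version} detected: 4.x renames add_job to add_schedule "
            "and removes AsyncIOScheduler; pin apscheduler>=3.10,<4"
        )

# At import time of the loops service (hypothetical placement):
# assert_apscheduler_v3(importlib.metadata.version("apscheduler"))
assert_apscheduler_v3("3.11.2")  # current stable: passes silently
```

Note that pre-release versions like "4.0.0a6" still parse correctly here, since only the leading major component is inspected.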
loop = asyncio.get_event_loop()
def _do_request():
    with urllib.request.urlopen(req, timeout={timeout_seconds}) as resp:
        return resp.read().decode()
response_data = await loop.run_in_executor(None, _do_request)
🧩 Analysis chain
🏁 Script executed:
# First, check if the file exists and examine the relevant lines
if [ -f "src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/call_api_helper.py" ]; then
echo "=== File found, examining lines around 194-198 ==="
sed -n '180,220p' "src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/call_api_helper.py" | cat -n
else
echo "File not found at expected path"
find . -name "call_api_helper.py" -type f 2>/dev/null | head -5
fi
Repository: cuga-project/cuga-agent
Length of output: 1911
🏁 Script executed:
# Check Python version constraints in the project
if [ -f "pyproject.toml" ]; then
echo "=== pyproject.toml Python version ==="
grep -A 5 "python" pyproject.toml | head -15
fi
if [ -f "setup.py" ]; then
echo "=== setup.py Python version ==="
grep -i "python_requires" setup.py
fi
# Check for settings.toml mentioned in the scratchpad
if [ -f "settings.toml" ]; then
cat settings.toml
fi
Repository: cuga-project/cuga-agent
Length of output: 482
🏁 Script executed:
# Search for the call_api function definition to verify async context
rg "def call_api" --context 10 "src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/"
Repository: cuga-project/cuga-agent
Length of output: 4851
🌐 Web query:
asyncio.get_event_loop() deprecation Python 3.10 async context
💡 Result:
asyncio.get_event_loop() is deprecated in Python 3.10+ when called without a running event loop, emitting DeprecationWarning[1][2][3]. In Python 3.14, it raises RuntimeError if no current event loop is set (default policy change)[2][4][5]. The policy system is deprecated in 3.14 and slated for removal in 3.16[6][5]. In async contexts (coroutines/callbacks), prefer asyncio.get_running_loop(), which requires a running loop and has simpler behavior[1][7][2]. For top-level synchronous code, use the high-level asyncio.run(coro) instead of manually getting/creating loops[1][8][9][10]. Example:

async def main():
    ...  # async code here

asyncio.run(main())

For multiple calls in the same loop/context (Python 3.12+), use asyncio.Runner[11][10][12]:

with asyncio.Runner() as runner:
    runner.run(coro1)
    runner.run(coro2)

If a loop reference is needed before running (rare), use asyncio.new_event_loop() then loop.run_until_complete(), but Runner or run() handle shutdown better[13][14][9]. As of 2026-05-07 (post-3.14), avoid get_event_loop in new code; migrate to run/Runner/get_running_loop as appropriate[15][16][9].
Citations:
- 1: https://docs.python.org/3.10/library/asyncio-eventloop.html
- 2: https://docs.python.org/3/library/asyncio-eventloop.html
- 3: https://docs.python.org/3.12/library/asyncio-eventloop.html
- 4: https://docs.python.org/3.14/library/asyncio-eventloop.html
- 5: https://docs.python.org/3.14/library/asyncio-policy.html
- 6: Deprecate asyncio policy system python/cpython#127949
- 7: https://docs.python.org/3.11/library/asyncio-eventloop.html
- 8: https://blog.teclado.com/changes-to-async-event-loops-in-python-3-10/
- 9: recommend what to use instead of asyncio.get_event_loop in 3.14 what's new python/cpython#127174
- 10: https://docs.python.org/3.14/library/asyncio-runner.html
- 11: https://docs.python.org/library/asyncio-runner.html
- 12: https://docs.python.org/3.12/library/asyncio-runner.html
- 13: asyncio docs: Replace deprecated use of get_event_loop() with new_event_loop() python/cpython#96706
- 14: https://stackoverflow.com/questions/78577538/replacement-for-deprecated-asyncio-get-event-loop
- 15: Removal of implicit event loop creation in asyncio.get_event_loop() breaks batteries-included promise python/cpython#149160
- 16: Remove implicit creation of event loop from asyncio.get_event_loop() python/cpython#126353
🌐 Web query:
asyncio.get_running_loop() vs get_event_loop() inside coroutine Python
💡 Result:
Inside a coroutine, use asyncio.get_running_loop() instead of asyncio.get_event_loop(). The official Python documentation explicitly recommends this: "Because this function has rather complex behavior (especially when custom event loop policies are in use), using the get_running_loop() function is preferred to get_event_loop() in coroutines and callbacks."[1]

Key differences:
- asyncio.get_running_loop(): Returns the running event loop in the current OS thread. Raises RuntimeError if there is no running event loop. Can only be called from a coroutine or callback.[1][2]
- asyncio.get_event_loop(): Gets the current event loop; when called from a coroutine it returns the running loop (like get_running_loop()), but outside one it may create/set a new loop based on policy. Deprecated since Python 3.10 with warnings; in 3.14 it raises RuntimeError without a current loop, and the policy system is deprecated with removal planned in 3.16.[1][2][3]

This makes get_running_loop() more predictable and safer inside coroutines, as it fails explicitly if no loop is running, preventing subtle bugs from implicit loop creation.[1][4] Example:

async def my_coroutine():
    loop = asyncio.get_running_loop()  # Preferred[1]
    # vs
    loop = asyncio.get_event_loop()  # Deprecated/complex[1]
Citations:
- 1: https://docs.python.org/3/library/asyncio-eventloop.html
- 2: https://stackoverflow.com/questions/65206110/when-to-use-asyncio-get-running-loop-vs-asyncio-get-event-loop
- 3: Removal of implicit event loop creation in asyncio.get_event_loop() breaks batteries-included promise python/cpython#149160
- 4: https://runebook.dev/en/docs/python/library/asyncio-eventloop/asyncio.get_event_loop
Use asyncio.get_running_loop() instead of asyncio.get_event_loop() inside the async function.
The generated code calls get_event_loop() within an async def call_api() coroutine. Since Python 3.10, get_event_loop() is deprecated when called without an active event loop and will emit a DeprecationWarning in your project's supported version range. Inside a coroutine, get_running_loop() is the correct idiom — it explicitly returns the running loop and raises RuntimeError if called outside a running context, preventing subtle bugs from policy-based implicit loop creation.
🛠️ Proposed fix
- loop = asyncio.get_event_loop()
+ loop = asyncio.get_running_loop()
def _do_request():
with urllib.request.urlopen(req, timeout={timeout_seconds}) as resp:
return resp.read().decode()
response_data = await loop.run_in_executor(None, _do_request)

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-loop = asyncio.get_event_loop()
+loop = asyncio.get_running_loop()
 def _do_request():
     with urllib.request.urlopen(req, timeout={timeout_seconds}) as resp:
         return resp.read().decode()
 response_data = await loop.run_in_executor(None, _do_request)
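The recommended idiom can be demonstrated end to end with a minimal runnable sketch; `blocking_fetch` here is a stand-in for the real `urllib.request.urlopen(...)` call, not code from the PR:

```python
import asyncio

def blocking_fetch() -> str:
    # Stand-in for the blocking urllib.request.urlopen(...).read().decode() call
    return "response"

async def call_api() -> str:
    # get_running_loop() is safe here: we are inside a coroutine,
    # so a loop is guaranteed to be running
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_fetch)

print(asyncio.run(call_api()))  # → response
```

Had `call_api` used `asyncio.get_event_loop()`, the code would still run today but would emit a DeprecationWarning on 3.12+ and fail outright once the fallback is removed.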
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/common/call_api_helper.py`
around lines 194 - 198, Inside the async function call_api, replace
asyncio.get_event_loop() with asyncio.get_running_loop() to avoid deprecated
implicit loop creation; obtain the running loop into the loop variable used for
loop.run_in_executor(None, _do_request), keeping the _do_request closure that
calls urllib.request.urlopen(req, timeout={timeout_seconds}) intact so the
blocking I/O is still executed in the executor.
async def release_sandbox(self, thread_id: Optional[str] = None) -> None:
    """Kill and remove the cached interpreter/sandbox for a thread."""
    key = thread_id or "_default"
    interpreter = self._sandboxes.pop(key, None)
    if interpreter:
        try:
            await interpreter.sandbox.kill()
            await interpreter.sandbox.close()
        except Exception:
            pass
release_sandbox(None) only removes the "_default" entry, not all sandboxes
key = thread_id or "_default"
interpreter = self._sandboxes.pop(key, None)

When called without arguments (e.g., during application shutdown), this only pops the single "_default" key; all per-thread sandboxes remain cached and their resources are leaked. Additionally, the except Exception: pass silently swallows kill/close errors with no log output.
🛠️ Proposed fix
async def release_sandbox(self, thread_id: Optional[str] = None) -> None:
"""Kill and remove the cached interpreter/sandbox for a thread.
- Pass thread_id=None to release the default sandbox.
+ Pass thread_id=None to release ALL cached sandboxes.
"""
- key = thread_id or "_default"
- interpreter = self._sandboxes.pop(key, None)
- if interpreter:
+ keys = [thread_id or "_default"] if thread_id is not None else list(self._sandboxes.keys())
+ for key in keys:
+ interpreter = self._sandboxes.pop(key, None)
+ if not interpreter:
+ continue
try:
await interpreter.sandbox.kill()
await interpreter.sandbox.close()
except Exception:
- pass
+ logger.debug(f"[OpenSandboxExecutor] Error during sandbox release for key={key!r}")

🧰 Tools
🪛 Ruff (0.15.12)
[error] 222-223: try-except-pass detected, consider logging the exception
(S110)
[warning] 222-222: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py`
around lines 214 - 223, The release_sandbox implementation only pops "_default"
when thread_id is None and silently swallows errors; update release_sandbox to,
if thread_id is None, iterate over and remove all entries from self._sandboxes
(e.g., copy keys or items, pop each), await interpreter.sandbox.kill() and
interpreter.sandbox.close() for each cached interpreter, and on exceptions log
the error instead of using a bare except (use the class/module logger or
self.logger) while continuing to attempt cleanup for remaining sandboxes.
data: bytes = await interpreter.sandbox.files.read_bytes(sandbox_path)
dest_name = filename or Path(sandbox_path).name
local_path = _workspace_dir() / dest_name
local_path.write_bytes(data)
Path traversal in download_file via unsanitised filename parameter
dest_name = filename or Path(sandbox_path).name
local_path = _workspace_dir() / dest_name
local_path.write_bytes(data)

Path(sandbox_path).name is safe (returns basename only). However, the caller-supplied filename is not sanitised: a value like "../../../home/user/.ssh/authorized_keys" causes _workspace_dir() / filename to resolve outside cuga_workspace/, writing arbitrary files on the host filesystem.
🔒 Proposed fix
- dest_name = filename or Path(sandbox_path).name
+ # Strip any directory components to prevent path traversal
+ raw_name = filename or Path(sandbox_path).name
+ dest_name = Path(raw_name).name # basename only
+ if not dest_name or dest_name in (".", ".."):
+ return f"[download_file error] Invalid filename: {raw_name!r}"
local_path = _workspace_dir() / dest_name🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py`
around lines 282 - 285, The download_file flow uses the caller-supplied filename
directly which allows path traversal; change dest_name to a sanitized basename
(e.g. dest_name = Path(filename).name or run a secure filename sanitizer) and if
that yields an empty or unsafe name fall back to Path(sandbox_path).name; then
compute target = (_workspace_dir() / dest_name).resolve() and validate that
target is inside _workspace_dir().resolve() (use Path.is_relative_to or compare
commonpath), rejecting the request or raising an error if the check fails before
calling local_path.write_bytes(data); reference symbols: filename, dest_name,
_workspace_dir(), sandbox_path, local_path, and
interpreter.sandbox.files.read_bytes.
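The two-layer defence the prompt describes (basename stripping plus a containment check) can be sketched as a standalone helper; `safe_download_target` is a hypothetical name, and the containment check uses `Path.is_relative_to`, which requires Python 3.9+:

```python
from pathlib import Path

def safe_download_target(workspace: Path, filename: str, sandbox_path: str) -> Path:
    """Resolve a download destination that cannot escape the workspace."""
    # Layer 1: strip directory components, so "../../etc/passwd" -> "passwd"
    name = Path(filename or sandbox_path).name
    if not name or name in (".", ".."):
        raise ValueError(f"invalid filename: {filename!r}")
    target = (workspace / name).resolve()
    # Layer 2: belt-and-braces containment check (Python 3.9+)
    if not target.is_relative_to(workspace.resolve()):
        raise ValueError(f"path escapes workspace: {target}")
    return target

ws = Path("/tmp/cuga_workspace")
print(safe_download_target(ws, "../../etc/passwd", "/sandbox/a.txt").name)  # → passwd
```

With basename stripping in place the second check is almost always redundant, but it guards against future refactors that reintroduce directory components (e.g. allowing subfolder downloads).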
entries = [
    FileEntry(
        name=Path(e.path).name,
        path=e.path,
        is_dir=(e.size == 0 and str(oct(e.mode)).startswith("0o7")),
        size_bytes=e.size,
    )
    for e in entries_raw
]
result = ListFilesResult(sandbox_path=sandbox_path, entries=entries)
🧩 Analysis chain
🏁 Script executed:
find . -type f -name "opensandbox_executor.py" | head -5
Repository: cuga-project/cuga-agent
Length of output: 159
🏁 Script executed:
wc -l ./src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py
Repository: cuga-project/cuga-agent
Length of output: 163
🏁 Script executed:
cat -n ./src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py | sed -n '340,375p'
Repository: cuga-project/cuga-agent
Length of output: 1817
🏁 Script executed:
grep -r "opensandbox" ./src --include="*.py" | grep -E "(import|from)" | head -20
Repository: cuga-project/cuga-agent
Length of output: 1442
🏁 Script executed:
find . -type f \( -name "pyproject.toml" -o -name "requirements.txt" -o -name "setup.py" \) -exec grep -l "opensandbox" {} \;
Repository: cuga-project/cuga-agent
Length of output: 84
🏁 Script executed:
cat ./pyproject.toml | grep -A 5 -B 5 "opensandbox"
Repository: cuga-project/cuga-agent
Length of output: 697
🏁 Script executed:
grep -r "SearchEntry\|e.mode" ./src --include="*.py" -B 3 -A 3 | head -50
Repository: cuga-project/cuga-agent
Length of output: 8145
🏁 Script executed:
cat -n ./src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py | sed -n '1,50p'
Repository: cuga-project/cuga-agent
Length of output: 2138
🏁 Script executed:
grep -r "is_dir\|stat\|mode" ./src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py -B 2 -A 2
Repository: cuga-project/cuga-agent
Length of output: 2365
🏁 Script executed:
find . -type f -name "*.py" -path "*/test*" -o -name "*test*.py" | xargs grep -l "SearchEntry\|opensandbox" 2>/dev/null | head -5
Repository: cuga-project/cuga-agent
Length of output: 49
🏁 Script executed:
rg "SearchEntry" --type py -B 5 -A 10
Repository: cuga-project/cuga-agent
Length of output: 2694
🌐 Web query:
OpenSandbox SearchEntry e.mode st_mode filesystem API
💡 Result:
OpenSandbox is a secure sandbox runtime platform by Alibaba for AI agents, featuring a filesystem API for file operations within sandboxes. SearchEntry is a Python SDK model used for file search operations. In the OpenSandbox Python SDK, SearchEntry defines parameters for the /files/search endpoint, which searches for files matching a glob pattern in a directory tree. Example usage:

from opensandbox.models.filesystem import SearchEntry
files = await sandbox.files.search(SearchEntry(path="/tmp", pattern="*.txt"))

The search returns an array of FileInfo objects, where each FileInfo includes a 'mode' field representing Unix file permissions in octal format (e.g., 644 for rw-r--r--), equivalent to the st_mode field from Unix stat without the S_IFMT type bits. Example FileInfo:

{ "path": "/workspace/main.py", "size": 1024, "modified_at": "2025-11-16T14:30:45Z", "owner": "admin", "group": "admin", "mode": 644 }[1][2]

The 'mode' (e.mode in object access) directly corresponds to st_mode & ~S_IFMT, providing permission bits (e.g., 644, 755). This is used in file metadata from search, info, and write operations (WriteEntry also takes mode).[3][4] The API endpoint is GET /files/search?path=/workspace&pattern=**/*.py, returning FileInfo[] with mode.[1][5]
Citations:
- 1: https://www.mintlify.com/alibaba/OpenSandbox/api/execution/filesystem
- 2: https://open-sandbox.ai/sdks/sandbox/python/readme
- 3: https://mintlify.com/alibaba/OpenSandbox/examples/file-operations
- 4: https://github.com/alibaba/OpenSandbox/blob/main/docs/architecture.md
- 5: https://github.com/alibaba/OpenSandbox/blob/main/specs/README.md
🌐 Web query:
OpenSandbox FileInfo directory detection size mode 0o40000
💡 Result:
In OpenSandbox, the FileInfo object returned by the /files/info API endpoint (via GET request to execd on port 44772) contains file metadata including the 'mode' field, which is the Unix file permissions in octal format (e.g., 644, 755) [1][2]. This mode value is obtained from os.Stat or syscall.Stat_t in the Go implementation within components/execd/pkg/web/controller/utils.go's GetFileInfo function [3]. Directories are detected by checking if the full st_mode has the directory bit set, which is 0o40000 (octal 040000, Linux S_IFDIR). This is the standard Unix way: mode & 0o170000 == 0o40000 indicates a directory [4][5]. For directories, the reported 'mode' includes this bit (e.g., 040755 for drwxr-xr-x), unlike examples showing regular files without it [1]. The size for directories is typically 0x4000 (16384 bytes) in some Unix stat implementations due to internal representation, but OpenSandbox reports actual size from stat [3].
Citations:
- 1: https://www.mintlify.com/alibaba/OpenSandbox/api/execution/filesystem
- 2: https://github.com/alibaba/OpenSandbox/blob/9258f4f7/specs/execd-api.yaml
- 3: https://deepwiki.com/alibaba/OpenSandbox/4.2.3-file-operations
- 4: https://stackoverflow.com/questions/56098463/how-works-pythons-stat-module
- 5: https://github.com/torvalds/linux/blob/master/include/uapi/linux/stat.h
🏁 Script executed:
python3 << 'EOF'
# Verify the logic of the heuristic with example modes
import stat
import os
# Example directory mode (0o40755 = drwxr-xr-x)
dir_mode = 0o40755
print(f"Directory mode: {oct(dir_mode)}")
print(f" Current heuristic check (size==0 and str(oct(mode)).startswith('0o7')): {str(oct(dir_mode)).startswith('0o7')}")
print(f" Correct check stat.S_ISDIR(): {stat.S_ISDIR(dir_mode)}")
print(f" Correct check (mode & 0o170000) == 0o040000: {(dir_mode & 0o170000) == 0o040000}")
print()
# Example regular file mode (0o100644 = -rw-r--r--)
file_mode = 0o100644
print(f"Regular file mode: {oct(file_mode)}")
print(f" Current heuristic check (size==0 and str(oct(mode)).startswith('0o7')): {str(oct(file_mode)).startswith('0o7')}")
print(f" Correct check stat.S_ISDIR(): {stat.S_ISDIR(file_mode)}")
print(f" Correct check (mode & 0o170000) == 0o040000: {(file_mode & 0o170000) == 0o040000}")
print()
# Example executable file (0o100755 = -rwxr-xr-x)
exec_mode = 0o100755
print(f"Executable file mode: {oct(exec_mode)}")
print(f" Current heuristic check (size==0 and str(oct(mode)).startswith('0o7')): {str(oct(exec_mode)).startswith('0o7')}")
print(f" Correct check stat.S_ISDIR(): {stat.S_ISDIR(exec_mode)}")
print(f" Correct check (mode & 0o170000) == 0o040000: {(exec_mode & 0o170000) == 0o040000}")
EOF
Repository: cuga-project/cuga-agent
Length of output: 663
🏁 Script executed:
grep -n "^import\|^from" ./src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py | head -20
Repository: cuga-project/cuga-agent
Length of output: 448
Fix directory detection logic—current heuristic misclassifies all directories as files
The current check is_dir=(e.size == 0 and str(oct(e.mode)).startswith("0o7")) fails for OpenSandbox's FileInfo.mode, which includes the full Unix st_mode with directory bit (0o40000). For a typical directory with mode 0o40755:
oct(0o40755) → '0o40755' → '0o40755'.startswith("0o7") is False
This causes all directories to be incorrectly reported as files. Replace with:
Fix
+import stat
...
entries = [
FileEntry(
name=Path(e.path).name,
path=e.path,
- is_dir=(e.size == 0 and str(oct(e.mode)).startswith("0o7")),
+ is_dir=stat.S_ISDIR(e.mode),
size_bytes=e.size,
)
for e in entries_raw
]

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
 entries = [
     FileEntry(
         name=Path(e.path).name,
         path=e.path,
-        is_dir=(e.size == 0 and str(oct(e.mode)).startswith("0o7")),
+        is_dir=stat.S_ISDIR(e.mode),
         size_bytes=e.size,
     )
     for e in entries_raw
 ]
 result = ListFilesResult(sandbox_path=sandbox_path, entries=entries)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/opensandbox/opensandbox_executor.py`
around lines 353 - 362, The directory-detection heuristic in the list
comprehension that builds FileEntry from entries_raw is wrong: is_dir=(e.size ==
0 and str(oct(e.mode)).startswith("0o7")) misclassifies directories because
FileInfo.mode contains full st_mode (e.g., 0o40755). Replace that heuristic in
the FileEntry construction with a proper POSIX mode test (use the stat helpers
against e.mode, e.g., stat.S_ISDIR(e.mode) or compare e.mode & stat.S_IFMT to
stat.S_IFDIR) so FileEntry.is_dir correctly reflects directories when creating
ListFilesResult(sandbox_path=..., entries=...).
# Set session.id for OpenLit observability (if enabled)
set_session_attribute(thread_id)
self._set_loops_context(thread_id)
Seed loops context in stream() too, not only invoke().
The new tools rely on current_agent_name/current_thread_id/current_app_name, but those contextvars are only set in invoke(). A scheduling tool called while using agent.stream(...) will hit _identity() and fail even though the tools were auto-injected successfully.
Suggested fix
# In both stream() branches, right after set_session_attribute(thread_id)
set_session_attribute(thread_id)
+self._set_loops_context(thread_id)

Also applies to: 1969-1971
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/cuga/sdk.py` around lines 1834 - 1836, stream() currently doesn't seed
the loops/context vars so tools calling _identity() fail; mirror the invoke()
behavior by ensuring stream() computes the same thread_id and calls
set_session_attribute(thread_id) and self._set_loops_context(thread_id) before
yielding/dispatching tool calls (the same calls used in invoke()), so
current_agent_name/current_thread_id/current_app_name are set for any scheduling
tool; apply the same change to the other stream() occurrence noted near the
second _set_loops_context usage.
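The failure mode here comes from contextvars: tools resolve their caller's identity from context variables that only `invoke()` seeds. A minimal sketch (the names `current_thread_id`, `set_loops_context`, and `identity` mirror the review's description but are hypothetical simplifications of the SDK internals) shows why an unseeded `stream()` path blows up:

```python
from contextvars import ContextVar
from typing import Optional

# Hypothetical mirror of the SDK's per-call context
current_thread_id: ContextVar[Optional[str]] = ContextVar("current_thread_id", default=None)

def set_loops_context(thread_id: str) -> None:
    current_thread_id.set(thread_id)

def identity() -> str:
    """What a scheduling tool does to discover who is calling it."""
    tid = current_thread_id.get()
    if tid is None:
        raise RuntimeError(
            "loops context not seeded; invoke()/stream() must call set_loops_context()"
        )
    return tid

set_loops_context("thread-42")  # what invoke() does, and stream() should too
print(identity())  # → thread-42
```

If `stream()` skips the `set_loops_context` call, `identity()` raises even though the scheduling tools were injected successfully, which is exactly the bug described above.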
def _set_loops_context(self, thread_id: str) -> None:
    if not self._loops_initialized or not self._enable_loops:
        return
enable_loops=None currently disables supervisor loop tooling.
_ensure_loops_initialized() resolves the settings-backed default, but graph and _set_loops_context() still gate on raw self._enable_loops. In auto mode (None), the supervisor may register with the loops service while the graph is compiled with enable_loops=False and loop context never binds.
Suggested fix
+def _loops_enabled(self) -> bool:
+ if self._enable_loops is not None:
+ return self._enable_loops
+ from cuga.config import settings as _cuga_settings
+ return bool(getattr(getattr(_cuga_settings, "loops", None), "enabled", False))
+
def _set_loops_context(self, thread_id: str) -> None:
- if not self._loops_initialized or not self._enable_loops:
+ if not self._loops_initialized or not self._loops_enabled():
return
# ...
supervisor_subgraph = create_cuga_supervisor_graph(
supervisor_model=self._model,
agents=self._agents,
- enable_loops=bool(self._enable_loops),
+ enable_loops=self._loops_enabled(),
)

Also applies to: 2454-2458
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/cuga/sdk.py` around lines 2403 - 2405, The guard in _set_loops_context
currently uses the raw self._enable_loops which can be None; instead use the
resolved flag produced by _ensure_loops_initialized (call or property set by it)
so auto-mode is handled correctly — update _set_loops_context (and the similar
gating around graph compilation at the other site) to call
_ensure_loops_initialized() first (if not already initialized) and then test the
resolved boolean (the attribute set by _ensure_loops_initialized or its return
value) rather than self._enable_loops so loop context binding works in auto
(None) mode.
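The tri-state resolution the fix proposes reduces to one rule: an explicit True/False wins, and None means "auto", deferring to the settings default. A standalone sketch (function name is illustrative, not the SDK's):

```python
from typing import Optional

def resolve_loops_enabled(explicit: Optional[bool], settings_default: bool) -> bool:
    """Explicit True/False wins; None means 'auto' and defers to settings."""
    return settings_default if explicit is None else explicit

print(resolve_loops_enabled(None, True))   # → True (auto mode picks up settings)
print(resolve_loops_enabled(False, True))  # → False (explicit opt-out wins)
```

The bug above is precisely what happens when one call site uses this resolved value while another still tests the raw `Optional[bool]`: `bool(None)` is False, so auto mode silently disables the graph side while the service side registers loops.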
 decomposition_strategy = "flexible" # "exact" = one subtask per app, "flexible" = allows multiple subtasks per app
-e2b_sandbox = false # use e2b for sandbox:
+e2b_sandbox = false # use E2B cloud sandbox for code execution
+opensandbox_sandbox = true # use OpenSandbox (Docker) for code execution; requires Docker + opensandbox SDK
opensandbox_sandbox = true as the default will break existing installs that lack Docker/OpenSandbox.
Any user who upgrades without Docker installed — or without the opensandbox extra — will get a hard runtime failure the first time code execution is invoked. The previous default was false.
Consider keeping the default false and requiring explicit opt-in (as the e2b_sandbox flag does), or at least guard the executor import so a clear, actionable error is surfaced instead of a raw ModuleNotFoundError/ConnectionError.
🛠️ Suggested change
-opensandbox_sandbox = true # use OpenSandbox (Docker) for code execution; requires Docker + opensandbox SDK
+opensandbox_sandbox = false # use OpenSandbox (Docker) for code execution; requires Docker + opensandbox SDK

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-opensandbox_sandbox = true # use OpenSandbox (Docker) for code execution; requires Docker + opensandbox SDK
+opensandbox_sandbox = false # use OpenSandbox (Docker) for code execution; requires Docker + opensandbox SDK
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/cuga/settings.toml` at line 39, The config default enables OpenSandbox by
setting opensandbox_sandbox = true which will break installs without
Docker/opensandbox; change the default back to false (match previous behavior
like e2b_sandbox) and/or add a runtime guard around the OpenSandbox executor
import/initialization (where OpenSandbox executor or related class is
referenced) so that if the opensandbox_sandbox flag is true but the opensandbox
package or Docker connection is missing you raise a clear, actionable error (or
fall back) instead of letting ModuleNotFoundError/ConnectionError propagate.
[loops]
enabled = false # Master toggle for CUGA loops (agent self-scheduling).
                # When false, the schedule_recurring / schedule_wakeup tools
                # are not injected and the scheduler is not started, even if
                # an agent passes enable_loops=True (the explicit per-agent
                # flag overrides this default).
default_expires_days = 7 # Default expiry for scheduled loops if the agent does not specify.
Self-contradictory comment in the [loops] master toggle.
The comment says: "When false, the tools are not injected and the scheduler is not started, even if an agent passes enable_loops=True (the explicit per-agent flag overrides this default)." Those two clauses contradict each other — if enabled = false always suppresses injection, then no per-agent flag overrides it; if a per-agent flag can override it, then the "even if" clause is wrong. Please clarify which semantic is actually implemented.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/cuga/settings.toml` around lines 217 - 223, Investigate the implemented
behavior for the [loops] master toggle and then update the comment for the
enabled field to match it exactly: confirm whether enabled = false globally
prevents injection/starting of the scheduler or whether a per-agent
enable_loops=True can override it, and then rewrite the sentence to state that
behavior unambiguously (referencing enabled, enable_loops, and the tools
schedule_recurring / schedule_wakeup so readers know which features are
affected); ensure the comment no longer contains the contradictory "even if"
clause and clearly states the precedence rule (master toggle vs per-agent flag).
useEffect(() => {
  workspaceTreeDirPathsPrevRef.current = new Set();
  setWorkspaceExpandedDirs(new Set());
  setWorkspaceTree([]);
  setWorkspaceTreeLoading(true);
}, [effectiveChatThreadId]);

useEffect(() => {
  const valid = collectDirectoryPaths(workspaceTree);
  const prevValid = workspaceTreeDirPathsPrevRef.current;
  setWorkspaceExpandedDirs((expanded) => {
    const next = new Set<string>();
    for (const p of expanded) {
      if (valid.has(p)) next.add(p);
    }
    for (const p of valid) {
      if (!prevValid.has(p)) next.add(p);
    }
    return next;
  });
  workspaceTreeDirPathsPrevRef.current = valid;
}, [workspaceTree]);
There was a problem hiding this comment.
Resetting workspaceTreeDirPathsPrevRef to empty causes every directory to auto-expand on the next tree update.
The pruning effect at Lines 694-708 treats anything in valid but not in prevValid as a newly added directory and adds it to the expanded set. After a thread switch (or first mount) the prev ref is the empty set, so the very next tree fetch marks every directory as "new" and expands all of them, including deeply nested ones. That defeats the user's collapsed-by-default expectation and can be heavy on large workspaces.
Consider only auto-expanding directories that appear after the first successful tree load for a given thread, e.g. by tracking a hasInitialTree flag or seeding workspaceTreeDirPathsPrevRef.current = valid on the first non-empty payload without expanding any of them.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/frontend_workspaces/frontend/src/ChatLanding.tsx` around lines 687 - 708,
The current reset of workspaceTreeDirPathsPrevRef in the effect that runs on
effectiveChatThreadId causes every directory to be treated as "new" and
auto-expanded on the next workspaceTree update; change the logic so we don't
seed prev ref to an empty set on thread switch but instead detect the first
successful tree load and initialize workspaceTreeDirPathsPrevRef.current = valid
without adding those paths to setWorkspaceExpandedDirs (or use a hasInitialTree
flag to skip adding new valid paths on the first non-empty payload). Update the
effects that reference workspaceTreeDirPathsPrevRef, workspaceTree,
setWorkspaceExpandedDirs and effectiveChatThreadId accordingly so only
directories genuinely added after the initial load are auto-expanded.
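The fix suggested above can be sketched as a pure function, independent of React. The names `prunedExpandedDirs` and `hasInitialTree` are hypothetical (the real logic lives inside the effects in `ChatLanding.tsx`); the point is that new directories only auto-expand once an initial tree has been loaded, so a thread switch no longer expands everything.

```typescript
/**
 * Computes the next set of expanded directory paths.
 * - Keeps previously expanded paths that still exist in the tree.
 * - Auto-expands only directories that appear AFTER the initial load,
 *   so seeding prevValid with an empty set on thread switch does not
 *   expand the entire tree on the next fetch.
 */
function prunedExpandedDirs(
  expanded: Set<string>,
  valid: Set<string>,
  prevValid: Set<string>,
  hasInitialTree: boolean,
): Set<string> {
  const next = new Set<string>();
  for (const p of expanded) {
    if (valid.has(p)) next.add(p); // prune paths removed from the tree
  }
  if (hasInitialTree) {
    // Only after the first successful load do unseen paths count as "new".
    for (const p of valid) {
      if (!prevValid.has(p)) next.add(p);
    }
  }
  return next;
}
```

In the component, the caller would pass `hasInitialTree: false` for the first non-empty payload of a thread (while still seeding the prev ref with `valid`), and `true` thereafter.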
A new self-scheduling primitive that lets CUGA agents schedule themselves to wake up later.
Cuga-App that uses Cuga Loops -> https://github.com/cuga-project/cuga-apps/pull/1/changes