fix: resolve workspace materialization race on Windows (#671) #690

Open
toshalkumbhar8979-design wants to merge 3 commits into usestrix:main from toshalkumbhar8979-design:fix-windows-sandbox-race

@toshalkumbhar8979-design

Description

Fixes #671

The Bug:
When running multi-file local scans (strix --target <repo>) on Windows with Docker Desktop, the TUI hangs during workspace materialization. The cause is a race condition: docker exec latency on Windows lets workers execute RESOLVE_WORKSPACE_PATH_HELPER before it has been fully written, which results in ExecTransportError or WorkspaceArchiveWriteError.

The Solution:
This PR works around the race by setting SandboxConcurrencyLimits in strix/runtime/backends.py specifically for Windows environments.

  • OS Detection: Detects sys.platform == 'win32' and lowers the default concurrency limit to 1 (forcing serial execution) for manifest_entries and local_dir_files.
  • Fallback for Linux/WSL: Leaves the limit at the SDK default (None) for non-Windows environments to preserve speed.
  • Environment Override: Adds support for the STRIX_DOCKER_CONCURRENCY environment variable, allowing users to manually dial concurrency up or down regardless of their OS.
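
The three rules above can be sketched as a single pure function. This is a minimal illustration, not the code in the PR: the function name `resolve_docker_concurrency` is hypothetical, and the handling of invalid or non-positive override values (falling back to the platform default) is an assumption the description does not specify.

```python
import os
import sys
from typing import Mapping, Optional


def resolve_docker_concurrency(
    platform: str = sys.platform,
    env: Mapping[str, str] = os.environ,
) -> Optional[int]:
    """Return the concurrency limit to apply, or None for the SDK default."""
    raw = env.get("STRIX_DOCKER_CONCURRENCY")
    if raw is not None and raw.strip():
        try:
            value = int(raw)
        except ValueError:
            value = 0  # assumption: treat unparsable input as "no override"
        if value >= 1:
            return value  # an explicit override wins on every OS
    # No usable override: serialize on Windows, keep the default elsewhere.
    return 1 if platform == "win32" else None
```

Taking the platform and environment as parameters keeps the decision testable without patching `sys.platform` or `os.environ`.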

Testing:
Verified locally on Windows 11 / Docker Desktop. Multi-file targets now materialize cleanly without hanging the TUI or throwing archive errors.

@greptile-apps

greptile-apps Bot commented Jul 5, 2026

Contributor

Greptile Summary

This PR changes Docker workspace materialization and agent run configuration handling. The main changes are:

  • Adds Windows-specific Docker concurrency limiting.
  • Adds a STRIX_DOCKER_CONCURRENCY environment override.
  • Moves concurrency handling from backend creation to session startup.
  • Normalizes tool_choice before streamed agent runs.

Confidence Score: 4/5

These issues should be fixed before merging.

  • Docker backend startup can fail when concurrency limits are applied.
  • The Windows materialization race can still run without limits when the import path fails.
  • Configured tool-call behavior can still be changed for later cycles and child agents.

strix/runtime/backends.py and strix/core/execution.py

Important Files Changed

| Filename | Overview |
|---|---|
| strix/runtime/backends.py | Adds Docker concurrency limit handling, but the SDK import and session startup paths can still break or skip the limit. |
| strix/core/execution.py | Adds tool-choice normalization, but it can still overwrite configured string values on the shared run config. |
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
strix/runtime/backends.py:63-66
**Wrong import path** The concurrency limit type is still loaded from SDK surfaces that do not export it in the pinned API. On the default Windows path, both import attempts can fail before the Docker backend starts. If a Windows user sets `STRIX_DOCKER_CONCURRENCY` to another value, the same import failure is swallowed and the backend continues with `concurrency_limits=None`, so workspace materialization can still run concurrently.

### Issue 2 of 3
strix/runtime/backends.py:91
**Start rejects kwargs** Passing `concurrency_limits` to `session.start()` still targets the wrong SDK call. The Docker sandbox session `start()` method accepts no keyword arguments, so any path that successfully builds `start_kwargs` fails backend startup with an unexpected-keyword error before workspace materialization can run.

### Issue 3 of 3
strix/core/execution.py:357-358
**Tool choice overwritten** This branch still rewrites every string `run_config.tool_choice`, not only the Bedrock-incompatible `"auto"` value. When a caller configures `"required"` or `"none"`, the shared `RunConfig` is mutated to `{"type": "auto"}` before `Runner.run_streamed()`, so later cycles and child agents can run with automatic tool selection instead of the configured policy.

Reviews (3): Last reviewed commit: "fix: resolve workspace materialization r..."

Comment thread strix/runtime/backends.py
Comment on lines +63 to +66
try:
    from agents.sandbox.artifacts import SandboxConcurrencyLimits
except ImportError:
    from agents.sandbox import SandboxConcurrencyLimits


P1: Wrong import path. The concurrency limit type is still loaded from SDK surfaces that do not export it in the pinned API. On the default Windows path, both import attempts can fail before the Docker backend starts. If a Windows user sets STRIX_DOCKER_CONCURRENCY to another value, the same import failure is swallowed and the backend continues with concurrency_limits=None, so workspace materialization can still run concurrently.
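
One way to avoid a silently swallowed import failure is to try each candidate module in turn and raise a single descriptive error if none of them exports the name. The sketch below is only an illustration of that pattern: `load_first_exported` is a hypothetical helper, and the comment does not say which SDK module actually exports SandboxConcurrencyLimits, so no real module path is assumed here.

```python
import importlib
from typing import Any, Sequence


def load_first_exported(name: str, module_paths: Sequence[str]) -> Any:
    """Return attribute `name` from the first module that exports it.

    Raises ImportError listing every attempt, so a caller cannot continue
    with a missing type by accident.
    """
    tried = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError as exc:
            tried.append(f"{path}: {exc}")
            continue
        if hasattr(module, name):
            return getattr(module, name)
        tried.append(f"{path}: no attribute {name!r}")
    raise ImportError(f"{name} not found; tried " + "; ".join(tried))
```

A caller that needs the limit on Windows could let this error propagate instead of falling back to `concurrency_limits=None`.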


Comment thread strix/runtime/backends.py
if concurrency_limits is not None:
    start_kwargs["concurrency_limits"] = concurrency_limits

await session.start(**start_kwargs)


P1: Start rejects kwargs. Passing concurrency_limits to session.start() still targets the wrong SDK call. The Docker sandbox session start() method accepts no keyword arguments, so any path that successfully builds start_kwargs fails backend startup with an unexpected-keyword error before workspace materialization can run.
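
The failure mode described here can be reproduced and guarded against with a signature check. The sketch below is an illustration under stated assumptions: `FakeSession` is a hypothetical stand-in for a session whose `start()` takes no keyword arguments, and `accepts_keyword` and `start_session` are invented helper names. It does not show where the SDK actually accepts the limit, which the comment leaves open.

```python
import asyncio
import inspect
from typing import Any, Callable, Optional


def accepts_keyword(func: Callable[..., Any], name: str) -> bool:
    """True if `func` can be called with `name=` as a keyword argument."""
    params = inspect.signature(func).parameters
    if name in params:
        return params[name].kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        )
    # Otherwise only a **kwargs parameter would accept it.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


class FakeSession:
    """Hypothetical session whose start() accepts no keyword arguments."""

    def __init__(self) -> None:
        self.started = False

    async def start(self) -> None:
        self.started = True


async def start_session(session: Any, concurrency_limits: Optional[Any] = None) -> None:
    """Start the session, refusing early if the limit cannot be passed."""
    if concurrency_limits is not None and not accepts_keyword(
        session.start, "concurrency_limits"
    ):
        raise TypeError(
            "session.start() does not accept concurrency_limits; "
            "the limit must be applied elsewhere"
        )
    kwargs = {} if concurrency_limits is None else {"concurrency_limits": concurrency_limits}
    await session.start(**kwargs)
```

The check turns an opaque unexpected-keyword error at startup into an explicit message about where the limit was misapplied.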


Comment thread strix/core/execution.py
Comment on lines +357 to +358
elif hasattr(run_config, "tool_choice") and (run_config.tool_choice == "auto" or isinstance(run_config.tool_choice, str)):
    run_config.tool_choice = {"type": "auto"}


P1: Tool choice overwritten. This branch still rewrites every string run_config.tool_choice, not only the Bedrock-incompatible "auto" value. When a caller configures "required" or "none", the shared RunConfig is mutated to {"type": "auto"} before Runner.run_streamed(), so later cycles and child agents can run with automatic tool selection instead of the configured policy.
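
A narrower normalization would rewrite only the exact "auto" string and return a copy rather than mutating the shared object. The sketch below illustrates that idea with a hypothetical `FakeRunConfig` dataclass standing in for the real run configuration, whose actual type and fields are not shown in this thread. `normalized_for_run` is likewise an invented name.

```python
from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class FakeRunConfig:
    """Hypothetical stand-in for the shared run configuration."""

    tool_choice: Any = None


def normalized_for_run(run_config: FakeRunConfig) -> FakeRunConfig:
    """Return a config safe to pass to one run, leaving the original untouched."""
    if run_config.tool_choice == "auto":
        # Only the exact "auto" string is rewritten; the copy is per run.
        return replace(run_config, tool_choice={"type": "auto"})
    # "required", "none", dicts, and None pass through unchanged.
    return run_config
```

Because the shared config is never modified, later cycles and child agents would still see the policy the caller configured.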



Development

Successfully merging this pull request may close these issues.

Local-code scans fail on Windows Docker Desktop during sandbox workspace materialization (ExecTransportError / WorkspaceArchiveWriteError)