Summary
When the Copilot CLI headless server (spawned by PolyPilot) and a separate global Copilot CLI installation run concurrently, they share the same native module directory (~/.copilot/pkg/darwin-arm64/). The global CLI can clean up the version directory that the headless server loaded at startup, causing all subsequent shell spawns to fail with posix_spawn failed: No such file or directory.
This is a data race in the CLI's native module management that should be reported upstream to the Copilot CLI team.
Root Cause Analysis
The shared directory
Both CLI installations use ~/.copilot/pkg/darwin-arm64/{version}/prebuilds/darwin-arm64/ to store platform-specific native modules:
- pty.node: terminal multiplexing (Node.js native addon)
- spawn-helper: binary used by pty.node to spawn shell processes via posix_spawn()
- keytar.node: credential storage
The conflict
1. PolyPilot's bundled CLI (Mach-O binary, v1.0.6-0) starts the headless server. At startup, it extracts/loads native modules from ~/.copilot/pkg/darwin-arm64/1.0.2/. The pty.node file is loaded into memory via dlopen().
2. The global Homebrew CLI (/opt/homebrew/bin/copilot, Node.js-based) starts separately. It extracts its own version (0.0.420) to ~/.copilot/pkg/darwin-arm64/0.0.420/ and deletes the old 1.0.2/ directory as part of version cleanup.
3. The headless server's pty.node is still loaded in memory (on Unix, a deleted file remains usable by any process that already has it open or mapped), so the server process itself continues running fine.
4. However, when pty.node tries to spawn a new shell, it calls posix_spawn() looking for spawn-helper at the original path ~/.copilot/pkg/darwin-arm64/1.0.2/prebuilds/darwin-arm64/spawn-helper, which no longer exists on disk.
5. All shell spawns fail with:
   posix_spawn failed: No such file or directory
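The mechanism in the steps above can be reproduced in isolation. The following is a minimal sketch, not the CLI's actual code: it uses a temporary directory as a stand-in for the 1.0.2/ version directory and a small shell script as a stand-in for spawn-helper (both are assumptions for illustration). It shows that an already-open file survives deletion of its directory, while a spawn that resolves the remembered path fails.

```python
import os
import shutil
import subprocess
import tempfile

# Stand-in for ~/.copilot/pkg/darwin-arm64/1.0.2/ (hypothetical layout).
version_dir = tempfile.mkdtemp(prefix="pkg-1.0.2-")
helper_path = os.path.join(version_dir, "spawn-helper")

# Stand-in for the real spawn-helper binary.
with open(helper_path, "w") as f:
    f.write("#!/bin/sh\necho spawned\n")
os.chmod(helper_path, 0o755)

# The "server" opens the file at startup and remembers its path.
loaded = open(helper_path, "rb")

# A second process cleans up the whole version directory.
shutil.rmtree(version_dir)

# The open handle still works: the file's data lives until the handle closes.
still_readable = loaded.read().startswith(b"#!/bin/sh")

# Spawning by the remembered path fails, because the path is resolved now.
try:
    subprocess.run([helper_path], check=True)
    spawn_error = None
except FileNotFoundError as exc:
    spawn_error = exc

loaded.close()
print(still_readable, type(spawn_error).__name__)
```

This is why the server keeps answering API requests while every shell spawn fails: the two operations depend on different things (an open file versus a path lookup).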
Evidence
# The headless server (started 8:18 AM) still has the deleted files loaded:
$ lsof -p 10381 | grep pkg
copilot 10381 user txt REG ... /Users/user/.copilot/pkg/darwin-arm64/1.0.2/prebuilds/darwin-arm64/pty.node
copilot 10381 user txt REG ... /Users/user/.copilot/pkg/darwin-arm64/1.0.2/prebuilds/darwin-arm64/keytar.node
# But the directory is gone from disk:
$ ls ~/.copilot/pkg/darwin-arm64/
0.0.420/ # only version remaining
# The 1.0.2 directory was cleaned up, taking spawn-helper with it
$ ls ~/.copilot/pkg/darwin-arm64/1.0.2
ls: No such file or directory
# Workers hit this error within ~14 seconds of dispatch:
posix_spawn failed: No such file or directory
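The manual check above (compare what lsof reports against what is on disk) could be automated. Below is a rough sketch under the assumption that the lsof output has the file path as the last column, as in the sample above; the function name `stale_module_paths` and the default `marker` are hypothetical, not part of any existing tool.

```python
import os


def stale_module_paths(lsof_output: str, marker: str = "/.copilot/pkg/") -> list[str]:
    """Return module paths listed by lsof that no longer exist on disk."""
    stale = []
    for line in lsof_output.splitlines():
        if marker not in line:
            continue
        # Assumption: the path is the last column, so take everything
        # from the first "/" to the end of the line.
        path = line[line.index("/"):].strip()
        if not os.path.exists(path) and path not in stale:
            stale.append(path)
    return stale
```

Feeding it the output of `lsof -p <pid>` for the headless server would return the pty.node and keytar.node paths in the situation described here, and an empty list for a healthy server.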
Timeline
| Time | Event |
|---|---|
| 8:18 AM | PolyPilot starts headless server (PID 10381) → loads pty.node from darwin-arm64/1.0.2/ |
| 8:27 AM | Global CLI starts (copilot --yolo, PID 14029) → extracts darwin-arm64/0.0.420/, cleans up 1.0.2/ |
| Later | All new shell spawns from headless server fail — spawn-helper path is gone |
Impact
- Multi-agent orchestration completely broken — all workers fail immediately when they try to use shell tools
- Single sessions affected too — any session trying to run shell commands fails
- Silent failure — the server itself appears healthy (responds to API requests, creates sessions), but every tool execution that needs a shell dies
- No auto-recovery — the server must be killed and restarted to pick up the new native module path
Suggested Upstream Fix
The CLI's native module cleanup logic should:
- Check for open file handles before deleting version directories (e.g., check for inuse.*.lock files or use lsof)
- Use atomic replacement instead of delete-then-create (rename the directory rather than deleting it)
- Not clean up other versions' platform-specific directories — only manage its own version
- Or: use version-isolated paths that include the CLI binary's own version in the path, so different CLI versions never share native module directories
Alternatively, the spawn-helper path could be resolved relative to the loaded pty.node file descriptor rather than the original filesystem path, making it resilient to directory deletion.
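To illustrate the first suggestion, here is a minimal sketch of a cleanup routine that skips version directories still claimed by a live process. It is not the CLI's cleanup code. The lock file convention used here (`inuse.<pid>.lock`, one file per process using the directory) is an assumption made for this example, as are the function names.

```python
import os
import shutil
from pathlib import Path


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX)."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def version_in_use(version_dir: Path) -> bool:
    """Assumed convention: each user of the directory drops inuse.<pid>.lock."""
    for lock in version_dir.glob("inuse.*.lock"):
        parts = lock.name.split(".")
        if len(parts) == 3 and parts[1].isdigit() and pid_is_alive(int(parts[1])):
            return True
    return False


def cleanup_old_versions(pkg_root: Path, keep: str) -> list[str]:
    """Delete version directories other than `keep`, skipping any in use."""
    removed = []
    for entry in sorted(pkg_root.iterdir()):
        if not entry.is_dir() or entry.name == keep:
            continue
        if version_in_use(entry):
            continue  # a live process still depends on this directory
        shutil.rmtree(entry)
        removed.append(entry.name)
    return removed
```

A check like this still leaves a small window between the in-use test and the delete, so it reduces the problem rather than eliminating it; version-isolated paths avoid the shared state entirely.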
PolyPilot Workarounds (Planned)
- Detect posix_spawn errors in worker results → auto-restart headless server → retry
- Server health probe before orchestrator dispatch
- Possibly isolate native module directory to ~/.polypilot/pkg/ if the CLI supports an env var for this
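The detect, restart, and retry workaround could look roughly like the sketch below. It is written in Python for brevity and does not reflect PolyPilot's actual code; `run_task` and `restart_server` are hypothetical callbacks standing in for worker dispatch and headless server restart.

```python
from typing import Callable

# The error text reported by workers when spawn-helper is missing.
SPAWN_ERROR = "posix_spawn failed: No such file or directory"


def run_with_server_restart(
    run_task: Callable[[], str],
    restart_server: Callable[[], None],
    max_restarts: int = 1,
) -> str:
    """Run a worker task; if its result shows the spawn error, restart and retry."""
    result = run_task()
    restarts = 0
    while SPAWN_ERROR in result and restarts < max_restarts:
        restart_server()  # a fresh server loads native modules from a valid path
        restarts += 1
        result = run_task()
    return result
```

Capping the number of restarts keeps a persistent failure (for example, another cleanup happening right after the restart) from turning into an endless restart loop.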
Environment
- macOS (Darwin, arm64)
- PolyPilot bundled CLI: Mach-O binary v1.0.6-0 (from GitHub.Copilot.SDK NuGet)
- Global CLI: Homebrew install, Node.js-based, v1.0.6-0
- Native module versions: 1.0.2 (loaded by server) vs 0.0.420 (extracted by global CLI)
Related
- Workers report: All Workers Failed — Shell Environment Broken
- Error occurs within ~14 seconds of worker dispatch
- Resource limits are NOT the issue: 591/122880 FDs, 4 active sessions, 6 child processes