Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 10 additions & 4 deletions docs/plans/completed/codex-sandbox-state-meta-migration.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
## Status

- State: completed
- Last updated: 2026-04-18
- Last updated: 2026-04-19
- Current phase: completed

## Design Intent
Expand Down Expand Up @@ -69,8 +69,13 @@
- Ordered sandbox plans still validate earlier operations before later mode resets; later-wins resolution does not silently discard earlier invalid CLI/config ops.
- `--debug-repl --sandbox inherit` remains locally usable by bootstrapping one inherited snapshot from the current default sandbox state before the first worker spawn.
- `repl_reset` derives inherited sandbox state from the current tool call's `_meta["codex/sandbox-state-meta"]`.
- Non-empty `repl` calls derive inherited sandbox state from the current tool call's `_meta["codex/sandbox-state-meta"]` before executing fresh code.
- Empty-input `repl` polls ignore per-call sandbox metadata when they can be answered from existing state, but they still apply the current tool call's metadata before spawning a worker to answer an idle call on a fresh session.
- Non-empty `repl` calls resolve stale timeout markers before deciding whether they still belong to a prior timed-out request.
- Bare `Ctrl-C` is the one non-empty follow-up that remains a local recovery control and does not force a sandbox-driven restart.
- Every other non-empty `repl` call requires valid current `_meta["codex/sandbox-state-meta"]`.
- If current metadata changes the effective inherited sandbox, `mcp-repl` restarts the worker before handling that non-empty call and includes a reply notice naming the new sandbox policy.
- Control-prefixed tails such as `Ctrl-C<code>` and `Ctrl-D<code>` run in the restarted session when the sandbox changed; the control prefix itself is not replayed into the fresh worker.
- While the pager is active, pure pager navigation remains local UI state and ignores sandbox metadata until a later tool call actually interacts with the worker again.
- Empty-input `repl` polls ignore per-call sandbox metadata when they can be answered from existing state, but they still apply the current tool call's metadata before any spawn or respawn needed to answer the call, including after draining a session-ended request.
- When a prior timed-out request has already settled, `mcp-repl` resolves the stale timeout marker before deciding whether a new non-empty `repl` call is still just a busy follow-up.
- Missing or malformed metadata fails closed with the existing inherit error path.
- Explicit non-`inherit` sandbox modes ignore Codex metadata.
Expand All @@ -88,7 +93,7 @@

- Current Codex source and live traces both showed the old async update protocol was obsolete for the current release line.
- The migration stayed intentionally single-path: no compatibility layer for older Codex builds.
- Follow-up review fixes tightened the runtime sequencing so sandbox metadata is applied only for fresh execution or worker spawn, not for empty-input polls that are only draining prior output or using an already-running idle session.
- Follow-up review fixes and final contract clarification tightened the runtime sequencing so sandbox changes now define worker-session boundaries for non-empty calls: empty polls keep draining, bare interrupts stay local, and other non-empty interactions restart into the current inherited sandbox when it changed.

## Decision Log

Expand All @@ -97,3 +102,4 @@
- 2026-04-17: Chose per-tool-call `_meta["codex/sandbox-state-meta"]` as the source of truth after inspecting current Codex source and live traces.
- 2026-04-17: Completed the repo migration and verification against the real current Codex integration tests.
- 2026-04-18: Clarified the shipped contract for `repl`: empty-input polls ignore per-call sandbox metadata only when they can be answered from existing state, while fresh non-empty calls resolve stale timeout markers and then apply the current call's sandbox metadata before executing new code.
- 2026-04-19: Replaced the earlier local-follow-up exception set with a simpler restart-on-change contract: empty polls keep draining, bare interrupts remain local, and other non-empty interactions use current metadata and restart the worker when the inherited sandbox changed.
51 changes: 51 additions & 0 deletions docs/plans/completed/inherit-sandbox-restart-contract.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
# Inherit Sandbox Restart Contract

## Summary

- Change `--sandbox inherit` so new per-tool-call sandbox metadata takes effect by restarting the worker at the next non-poll, non-bare-interrupt interaction.
- Keep empty-input polls and bare `Ctrl-C` as the two cases that do not force a restart on sandbox change.
- Keep explicit non-`inherit` sandbox modes authoritative and unchanged.

## Status

- State: completed
- Last updated: 2026-04-19
- Current phase: completed

## Current Direction

- Treat sandbox changes as worker-session boundaries.
- If current tool-call metadata changes the effective inherited sandbox, restart the worker before handling any tool call that would otherwise send input, restart the worker, or otherwise interact with the worker statefully.
- Empty-input polls keep draining existing state without forcing a restart.
- A bare `Ctrl-C` remains a local recovery control and does not force a restart just because sandbox metadata changed.

## Long-Term Direction

- The inherit contract should stay simple enough to explain in one paragraph:
fresh worker interaction uses the current metadata, and sandbox changes reset the session before that interaction happens.
- Review-driven exceptions should be minimized; only the explicit poll and bare interrupt escape hatches should remain.

## Phase Status

- Phase 0: completed
- Identified that the current fail-closed follow-up split is not the desired product behavior.
- Phase 1: completed
- Reworked runtime sequencing around restart-on-change semantics.
- Phase 2: completed
- Refreshed tests, docs, and final verification.

## Locked Decisions

- Do not revert to the obsolete async sandbox update protocol.
- Do not let explicit non-`inherit` CLI sandbox modes depend on Codex metadata.
- If the inherited sandbox changes, the next non-poll, non-bare-interrupt interaction should reset the worker instead of trying to preserve the old request/session.

## Open Questions

- What exact restart notice text best communicates both the restart cause and the new effective sandbox policy without creating brittle snapshots?
- Should a bare `Ctrl-C` with no live worker remain a no-op control reply or continue to surface the existing idle/session behavior?

## Decision Log

- 2026-04-19: Replaced the earlier fail-closed control-tail contract with a restart-on-change contract for non-poll, non-bare-interrupt interactions.
- 2026-04-19: Landed the runtime, docs, and regression updates; restart notices are informational and initial inherit-mode spawns do not pretend they were restarts.
47 changes: 39 additions & 8 deletions docs/sandbox.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,14 +21,45 @@ metadata channel in that mode, `mcp-repl --debug-repl --sandbox inherit`
bootstraps one local inherited snapshot from the current default sandbox state
before the first worker spawn.

For `repl`, empty-input polls ignore per-call sandbox metadata when they can be
answered from existing state, such as draining a timed-out request or returning an
idle prompt from an already-running worker. If an empty-input call must spawn a
worker to answer the call, `mcp-repl` applies the current tool call's sandbox
metadata before that spawn. Non-empty `repl` calls resolve any stale timeout
marker first, then apply the current call's sandbox metadata before executing
fresh code. If a timed-out request is still genuinely in flight, follow-up calls
continue servicing that request instead of switching sandboxes mid-flight.
For `repl`, inherited sandbox metadata controls the worker session that handles
the call. When a non-empty tool call would use the worker and the effective
inherited sandbox changed, `mcp-repl` restarts the worker before serving that
call and includes a restart notice that names the new sandbox policy.

More specifically:

- Empty-input polls ignore per-call sandbox metadata while they are only
draining existing pending or settled output, or returning an idle prompt from
an already-running worker.
- If an empty-input poll needs to spawn or respawn a worker to finish answering
the call, `mcp-repl` applies the current tool call's metadata before that
spawn. If a poll can first answer by draining a session-ended request, it
returns that local drain without respawning; the next spawn-needed call must
provide valid current metadata.
- While the pager is active, pure pager navigation is local UI state, not a
worker interaction. Pager-local commands such as `:q` or empty-string page
advance ignore sandbox metadata until a later tool call actually interacts
with the worker again. Bare `Ctrl-D` is not pager navigation; it remains an
explicit restart even when the pager is active.
- Bare `Ctrl-C` is the one non-empty `repl` follow-up that stays local and does
not force a sandbox-driven restart.
- Every other non-empty `repl` call must have valid current
`_meta["codex/sandbox-state-meta"]`.
- A non-empty retry after the memory guardrail aborts a worker is an ordinary
non-empty call. It must have valid current metadata before `mcp-repl` resets
or retries under `--sandbox inherit`.
- Non-empty `repl` calls resolve stale timeout markers before deciding whether
they are still looking at a live worker request.
- If current metadata changes the effective inherited sandbox, `mcp-repl`
restarts the worker at that call before handling the input.
- Control-prefixed tails such as `Ctrl-C<code>` and `Ctrl-D<code>` run in the
restarted session when the sandbox changed; the control prefix itself is not
replayed into the fresh worker.
- Explicit restarts discard preserved detached output from aborted prior
requests instead of carrying it into later unrelated replies.
- Sandbox metadata is enforced again at the next tool call that actually
interacts with the worker after pager navigation ends.
- Missing or malformed metadata still fails closed on calls that need it.

The worker also gets a per-session temp directory, exported as:

Expand Down
2 changes: 1 addition & 1 deletion src/debug_repl.rs
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,7 @@ pub(crate) fn run(

while let Some(line) = read_line(&mut stdin)? {
if is_exact_command(&line, "INTERRUPT") {
let reply = worker.interrupt(DEFAULT_WRITE_STDIN_TIMEOUT);
let reply = worker.interrupt(DEFAULT_WRITE_STDIN_TIMEOUT, None, false);
render_visible_reply(
response.as_mut(),
reply,
Expand Down
7 changes: 7 additions & 0 deletions src/sandbox_cli.rs
Original file line number Diff line number Diff line change
Expand Up @@ -197,6 +197,13 @@ pub fn resolve_effective_sandbox_state(
resolve_effective_sandbox_state_with_defaults(plan, inherited, &defaults)
}

pub fn validate_sandbox_plan_with_defaults(
plan: &SandboxCliPlan,
defaults: &SandboxState,
) -> Result<(), String> {
validate_sandbox_plan_operations(plan, None, defaults)
}

pub fn resolve_effective_sandbox_state_with_defaults(
plan: &SandboxCliPlan,
inherited: Option<&SandboxState>,
Expand Down
Loading
Loading