Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 6 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -177,7 +177,7 @@ mcp-repl install --client codex --interpreter r
Bare `mcp-repl` defaults to `--oversized-output pager`.

`install --client codex` writes `--sandbox inherit --oversized-output files` by default. That
sentinel means `mcp-repl` should inherit sandbox policy updates from Codex for the session while
sentinel means `mcp-repl` should inherit sandbox policy metadata from Codex on each tool call while
keeping installed Codex configs on the file-backed oversized-output path.

Example `R` REPL Codex config (paths vary by OS/user):
Expand All @@ -187,8 +187,8 @@ Example `R` REPL Codex config (paths vary by OS/user):
command = "/Users/alice/.cargo/bin/mcp-repl"
# mcp-repl handles the primary timeout; this higher Codex timeout is only an outer guard.
tool_timeout_sec = 1800
# --sandbox inherit: use sandbox policy updates sent by Codex for this session.
# If no update is sent, mcp-repl exits with an error.
# --sandbox inherit: use sandbox policy metadata sent by Codex on each tool call.
# mcp-repl fails closed if the tool call omits or malforms that metadata.
args = [
"--sandbox", "inherit",
"--oversized-output", "files",
Expand All @@ -203,8 +203,8 @@ Example `Python` REPL Codex config:
command = "/Users/alice/.cargo/bin/mcp-repl"
# mcp-repl handles the primary timeout; this higher Codex timeout is only an outer guard.
tool_timeout_sec = 1800
# --sandbox inherit: use sandbox policy updates sent by Codex for this session.
# If no update is sent, mcp-repl exits with an error.
# --sandbox inherit: use sandbox policy metadata sent by Codex on each tool call.
# mcp-repl fails closed if the tool call omits or malforms that metadata.
args = [
"--sandbox", "inherit",
"--oversized-output", "files",
Expand All @@ -213,7 +213,7 @@ args = [
```

For Claude, `install --client claude` writes to `~/.claude.json` with explicit sandbox mode and
`--oversized-output files` because Claude does not propagate sandbox state updates to MCP servers:
`--oversized-output files` because Claude does not propagate Codex-style sandbox metadata to MCP servers:

```json
// ~/.claude.json
Expand Down
2 changes: 1 addition & 1 deletion docs/architecture.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ The repository is organized around a few concrete subsystems rather than deep pa

### Sandbox and process isolation

- `src/sandbox.rs`, `src/sandbox_cli.rs`, and `src/windows_sandbox.rs` implement OS-level sandboxing, writable-root policy, and client-driven sandbox updates.
- `src/sandbox.rs`, `src/sandbox_cli.rs`, and `src/windows_sandbox.rs` implement OS-level sandboxing, writable-root policy, and Codex per-tool-call sandbox metadata handling.
- The sideband and sandbox contracts are documented in `docs/sandbox.md` and `docs/worker_sideband_protocol.md`.

### Output, images, and debug surfaces
Expand Down
11 changes: 8 additions & 3 deletions docs/debugging.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,10 +17,10 @@ Enable per-startup JSONL logs with either:

Each startup creates a fresh session directory under that root. `mcp-repl` writes:

- `events.jsonl` with startup metadata, tool calls, and sandbox custom request events
- `events.jsonl` with startup metadata, tool calls, and parsed sandbox metadata events
- `startup.log` for server-side startup trace lines
- `worker-startup.log` for worker-side startup trace lines
- `sandbox-state.jsonl` for the initial effective sandbox policy plus later sandbox policy/update payloads
- `sandbox-state.jsonl` for the initial effective sandbox policy plus later tool-call sandbox metadata and effective policy updates

Example:

Expand All @@ -43,7 +43,7 @@ MCP_REPL_DEBUG_DIR=/tmp/mcp-repl-debug mcp-repl --interpreter python

## MCP and sandbox tracing

These switches are useful when the client is sending custom sandbox updates or when the sandbox policy is the thing you are debugging.
These switches are useful when the client is sending Codex sandbox metadata or when the sandbox policy is the thing you are debugging.

- `MCP_REPL_DEBUG_DIR=/path/to/debug-root` writes `sandbox-state.jsonl` inside the session directory
- `MCP_REPL_KEEP_SESSION_TMPDIR=1` keeps the worker session temp directory after exit so you can inspect it
Expand All @@ -59,6 +59,11 @@ MCP_REPL_DEBUG_DIR=/tmp/mcp-repl-debug mcp-repl --sandbox inherit

`--debug-repl` runs `mcp-repl` as a local interactive driver for the worker instead of as an MCP server. This is the fastest way to reproduce REPL behavior without involving a client.

If you start it with `--sandbox inherit`, the debug REPL bootstraps one local
inherited sandbox snapshot from the current default sandbox state before the
first worker spawn. That keeps the inherit code path debuggable even though
there is no per-tool-call MCP metadata in local debug mode.

Start it with:

```sh
Expand Down
3 changes: 2 additions & 1 deletion docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,9 +10,10 @@ checked-in execution plans without relying on stale notes.
- `docs/output_timeline.md`: server-side model for merging text pipes and sideband events into visible reply order.
- `docs/testing.md`: public validation surface and snapshot workflow.
- `docs/debugging.md`: debug logs, `--debug-repl`, and wire tracing.
- `docs/sandbox.md`: sandbox modes, writable roots, and client-driven sandbox updates.
- `docs/sandbox.md`: sandbox modes, writable roots, and Codex per-tool-call sandbox metadata.
- `docs/worker_sideband_protocol.md`: server/worker IPC contract.
- `docs/plans/AGENTS.md`: when to write a checked-in execution plan and where it lives.
- `docs/plans/completed/codex-sandbox-state-meta-migration.md`: completed plan for migrating Codex `--sandbox inherit` from async updates to per-tool-call sandbox metadata.

## Normative Docs

Expand Down
99 changes: 99 additions & 0 deletions docs/plans/completed/codex-sandbox-state-meta-migration.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
# Codex Sandbox State Meta Migration

## Summary

- Migrate `mcp-repl`'s Codex `--sandbox inherit` integration from the obsolete async sandbox update protocol to Codex's current per-tool-call `_meta["codex/sandbox-state-meta"]` contract.
- Keep `--sandbox inherit` fail-closed: if current Codex does not provide usable sandbox metadata on a tool call, `mcp-repl` must reject the call instead of falling back to a local default.
- Keep explicit sandbox modes such as `--sandbox read-only` and `--sandbox workspace-write` authoritative; Codex metadata must not override them.
- Do not carry backward compatibility for older Codex releases that still depended on the old update channel.

## Status

- State: completed
- Last updated: 2026-04-18
- Current phase: completed

## Design Intent

- When `mcp-repl` is configured with `--sandbox inherit`, Codex is the source of truth for the sandbox state.
- `mcp-repl` must not guess, substitute, or silently fall back to its own default sandbox when it is expecting Codex to provide one.
- Security is the main constraint: if Codex intended `read-only`, `mcp-repl` must not run with broader permissions just because sandbox information was late, missing, or malformed.
- MCP startup should stay fast. `mcp-repl` should not block initialization waiting for sandbox information that belongs to a later tool call.
- Waiting, if any, belongs only at the point where a tool call needs sandbox information in order to run safely.
- The repo should track the current public Codex contract and exercise the real Codex binary in integration coverage so protocol drift is surfaced quickly.

## Motivation

- The old design assumed Codex would push sandbox state out of band, asynchronously, at session startup.
- That assumption is brittle because it separates sandbox selection from the tool call that actually needs the sandbox.
- When that assumption fails, the failure mode is not just a functional bug. It is a security issue, because the effective sandbox can become broader than what Codex intended.
- The correct shape is request-scoped: if `mcp-repl` is inheriting sandbox policy from Codex, it should only run once it has the sandbox information that applies to that call.
- The simplest safe path is to target current Codex directly instead of layering compatibility logic around an obsolete protocol.

## Current Direction

- Treat the current release of Codex as the only target contract for this slice.
- Replace the server-side sandbox update listener with per-tool-call parsing of `_meta["codex/sandbox-state-meta"]`.
- Advertise only the current Codex experimental capability needed for that metadata path.
- Rebuild the public tests around the new contract before changing runtime code.

## Long-Term Direction

- The long-term contract should be simple: `mcp-repl` determines the inherited sandbox directly from the Codex tool call that is about to execute.
- Startup should stay fast. `mcp-repl` should not block MCP initialization waiting for sandbox state that belongs to a later tool call.
- Any server state retained between calls should be minimal bookkeeping, not a second sandbox synchronization protocol.

## Phase Status

- Phase 0: completed
- Audited the current `inherit` path, documented the protocol shift, and locked the bounded design.
- Phase 1: completed
- Added failing public regressions for metadata-driven sandbox inheritance and fail-closed behavior.
- Phase 2: completed
- Implemented the runtime migration and removed obsolete update-handling code.
- Phase 3: completed
- Refreshed real-Codex integration coverage, docs, and final verification.

## Locked Decisions

- Do not implement compatibility shims for older Codex sandbox update behavior.
- Do not infer `read-only` versus `workspace-write` from coarse metadata such as `x-codex-turn-metadata.sandbox`; that signal is not precise enough.
- Do not fall back from missing Codex sandbox metadata to `mcp-repl`'s local default policy.
- Do not let Codex metadata override explicit non-`inherit` CLI sandbox modes.
- Prefer a single happy path: current Codex should supply `codex/sandbox-state-meta` on each tool call, and `mcp-repl` should consume that directly.

## Outcome

- `mcp-repl` now advertises `codex/sandbox-state-meta` only when `--sandbox inherit` is configured.
- Later explicit sandbox mode overrides such as `--sandbox inherit --sandbox workspace-write` or `sandbox_mode=workspace-write` cancel inherit-mode metadata requirements, matching the documented later-wins CLI semantics.
- Ordered sandbox plans still validate earlier operations before later mode resets; later-wins resolution does not silently discard earlier invalid CLI/config ops.
- `--debug-repl --sandbox inherit` remains locally usable by bootstrapping one inherited snapshot from the current default sandbox state before the first worker spawn.
- `repl_reset` derives inherited sandbox state from the current tool call's `_meta["codex/sandbox-state-meta"]`.
- Non-empty `repl` calls derive inherited sandbox state from the current tool call's `_meta["codex/sandbox-state-meta"]` before executing fresh code.
- Empty-input `repl` polls ignore per-call sandbox metadata when they can be answered from existing state, but they still apply the current tool call's metadata before spawning a worker to answer an idle call on a fresh session.
- When a prior timed-out request has already settled, `mcp-repl` resolves the stale timeout marker before deciding whether a new non-empty `repl` call is still just a busy follow-up.
- Missing or malformed metadata fails closed with the existing inherit error path.
- Explicit non-`inherit` sandbox modes ignore Codex metadata.
- The old async sandbox update listener and startup settle logic were removed from the active runtime contract.

## Verification

- `cargo check`
- `cargo build`
- `cargo clippy --all-targets --all-features -- -D warnings`
- `cargo test`
- `cargo +nightly fmt --all`

## Notes

- Current Codex source and live traces both showed the old async update protocol was obsolete for the current release line.
- The migration stayed intentionally single-path: no compatibility layer for older Codex builds.
- Follow-up review fixes tightened the runtime sequencing so sandbox metadata is applied only for fresh execution or worker spawn, not for empty-input polls that are only draining prior output or using an already-running idle session.

## Decision Log

- 2026-04-17: Scoped the work to current-release Codex only. Older async sandbox update behavior is out of scope.
- 2026-04-17: Locked `--sandbox inherit` to remain fail-closed. Missing or malformed Codex sandbox metadata must reject the tool call.
- 2026-04-17: Chose per-tool-call `_meta["codex/sandbox-state-meta"]` as the source of truth after inspecting current Codex source and live traces.
- 2026-04-17: Completed the repo migration and verification against the real current Codex integration tests.
- 2026-04-18: Clarified the shipped contract for `repl`: empty-input polls ignore per-call sandbox metadata only when they can be answered from existing state, while fresh non-empty calls resolve stale timeout markers and then apply the current call's sandbox metadata before executing new code.
30 changes: 23 additions & 7 deletions docs/sandbox.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,9 +10,25 @@ When no CLI sandbox mode is provided, the default is:
- `workspace-write`
- `network_access: false`

When `--sandbox inherit` is used, the client must send a sandbox update
(`codex/sandbox-state/update`) before the first worker start/tool call.
If no update is provided, the first tool call fails fast.
When `--sandbox inherit` is used for MCP server operation, the client must
attach per-tool-call sandbox metadata in `_meta["codex/sandbox-state-meta"]`.
That metadata is the source of truth for the tool call that is about to run. If
it is missing or malformed, `mcp-repl` fails closed with `--sandbox inherit
requested but no client sandbox state was provided`.

`--debug-repl` is the one local-only exception. Because there is no client
metadata channel in that mode, `mcp-repl --debug-repl --sandbox inherit`
bootstraps one local inherited snapshot from the current default sandbox state
before the first worker spawn.

For `repl`, empty-input polls ignore per-call sandbox metadata when they can be
answered from existing state, such as draining a timed-out request or returning an
idle prompt from an already-running worker. If an empty-input call must spawn a
worker to answer the call, `mcp-repl` applies the current tool call's sandbox
metadata before that spawn. Non-empty `repl` calls resolve any stale timeout
marker first, then apply the current call's sandbox metadata before executing
fresh code. If a timed-out request is still genuinely in flight, follow-up calls
continue servicing that request instead of switching sandboxes mid-flight.

The worker also gets a per-session temp directory, exported as:

Expand All @@ -28,8 +44,8 @@ The worker also gets a per-session temp directory, exported as:
`mcp-repl --add-allowed-domain <pattern>`
- Advanced overrides:
`mcp-repl --config key=value` with Codex-shaped keys
- MCP sandbox update method:
`codex/sandbox-state/update` (capability `codex/sandbox-state`)
- MCP sandbox metadata capability:
`codex/sandbox-state-meta` (advertised only when the effective CLI sandbox mode still resolves to `inherit` after later overrides)

Operations are applied strictly in CLI argument order. Later operations win.
`--sandbox ...` resets the base policy at the point where it appears.
Expand Down Expand Up @@ -72,8 +88,8 @@ Sandboxing is enforced by a Linux sandbox helper that applies seccomp + Landlock
- default Linux worker setup disables network unless explicitly enabled.
- `mcp-repl` always uses its own internal Linux sandbox launcher; client-provided
helper executable paths are ignored.
- inherited `useLegacyLandlock` updates are translated onto `mcp-repl`'s
internal `bwrap` on/off choice.
- Codex sandbox metadata does not control `mcp-repl`'s optional internal
`bwrap` stage. That remains a local best-effort setting.

Optional `bwrap` stage:

Expand Down
2 changes: 1 addition & 1 deletion docs/testing.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ This file is the entrypoint for deciding how to verify a change.
- `tests/repl_surface.rs` and `tests/python_backend.rs`: IPC ownership coverage. Only the main worker may own sideband fds; user-spawned children must not. `tests/python_backend.rs` also covers detached-idle oversized-output behavior through the public `repl` API.
- `tests/server_smoke.rs`: end-to-end MCP session smoke coverage.
- `tests/write_stdin_behavior.rs`: timeout polling, oversized text replies, and transcript-file behavior through the public `repl` API.
- `tests/sandbox.rs` and `tests/sandbox_state_updates.rs`: sandbox policy behavior and client-driven updates.
- `tests/sandbox.rs` and `tests/sandbox_state_updates.rs`: sandbox policy behavior and Codex per-tool-call sandbox metadata.
- `tests/plot_images.rs` and `tests/python_plot_images.rs`: plot/image behavior through the public tool surface.
- `tests/codex_approvals_tui.rs` and `tests/claude_integration.rs`: client integration coverage.
- `tests/docs_contracts.rs`: docs map and snapshot-facing documentation contracts.
Expand Down
15 changes: 6 additions & 9 deletions src/debug_repl.rs
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ use crate::sandbox_cli::SandboxCliPlan;
use crate::server::response::{
ResponseState, TimeoutBundleReuse, text_stream_from_content, timeout_bundle_reuse_for_input,
};
use crate::worker_process::{WorkerError, WorkerManager};
use crate::worker_process::{WorkerError, WorkerManager, WriteStdinOptions};
use crate::worker_protocol::{TextStream, WorkerContent, WorkerErrorCode, WorkerReply};

const DEFAULT_WRITE_STDIN_TIMEOUT: Duration = Duration::from_secs(60);
Expand Down Expand Up @@ -49,6 +49,7 @@ pub(crate) fn run(
} else {
None
};
worker.bootstrap_local_inherited_sandbox_state()?;
worker.warm_start()?;
let reply = wait_for_initial_prompt(&mut worker, server_timeout)?;
render_visible_reply(
Expand Down Expand Up @@ -105,8 +106,7 @@ pub(crate) fn run(
String::new(),
DEFAULT_WRITE_STDIN_TIMEOUT,
server_timeout,
None,
false,
WriteStdinOptions::default(),
);
render_visible_reply(
response.as_mut(),
Expand Down Expand Up @@ -143,8 +143,7 @@ pub(crate) fn run(
input,
DEFAULT_WRITE_STDIN_TIMEOUT,
server_timeout,
None,
false,
WriteStdinOptions::default(),
);
render_visible_reply(
response.as_mut(),
Expand Down Expand Up @@ -176,17 +175,15 @@ fn wait_for_initial_prompt(
String::new(),
DEFAULT_WRITE_STDIN_TIMEOUT,
server_timeout,
None,
false,
WriteStdinOptions::default(),
)?;
while !reply_has_prompt(&last_reply) && Instant::now() < deadline {
thread::sleep(INITIAL_PROMPT_POLL_INTERVAL);
last_reply = worker.write_stdin(
String::new(),
DEFAULT_WRITE_STDIN_TIMEOUT,
server_timeout,
None,
false,
WriteStdinOptions::default(),
)?;
}
Ok(last_reply)
Expand Down
10 changes: 7 additions & 3 deletions src/install.rs
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ use toml_edit::{Array, DocumentMut, Item, Table, value};
const CODEX_TOOL_TIMEOUT_SECS: i64 = 1_800;
const CODEX_TOOL_TIMEOUT_COMMENT: &str =
"\n# mcp-repl handles the primary timeout; this higher Codex timeout is only an outer guard.\n";
const CODEX_SANDBOX_INHERIT_COMMENT: &str = "\n# --sandbox inherit: use sandbox policy updates sent by Codex for this session.\n# If no update is sent, mcp-repl exits with an error.\n";
const CODEX_SANDBOX_INHERIT_COMMENT: &str = "\n# --sandbox inherit: use sandbox policy metadata sent by Codex on each tool call.\n# mcp-repl fails closed if the tool call omits or malforms that metadata.\n";
pub const DEFAULT_R_SERVER_NAME: &str = "r";
pub const DEFAULT_PYTHON_SERVER_NAME: &str = "python";

Expand Down Expand Up @@ -852,7 +852,9 @@ name="demo"

let text = fs::read_to_string(config).expect("read config");
assert!(
text.contains("--sandbox inherit: use sandbox policy updates sent by Codex"),
text.contains(
"--sandbox inherit: use sandbox policy metadata sent by Codex on each tool call"
),
"expected inherit comment in codex config"
);
}
Expand All @@ -878,7 +880,9 @@ name="demo"

let text = fs::read_to_string(config).expect("read config");
assert!(
!text.contains("--sandbox inherit: use sandbox policy updates sent by Codex"),
!text.contains(
"--sandbox inherit: use sandbox policy metadata sent by Codex on each tool call"
),
"inherit-only comment should be removed when inherit is no longer configured"
);
}
Expand Down
Loading
Loading