[pull] main from openai:main by pull[bot] · Pull Request #58 · kontext-security/codex

pull · 2026-03-12T00:25:27Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)


## Summary

- send `X-OpenAI-Internal-Codex-Responses-Lite: true` on HTTP Responses requests and WebSocket upgrade requests when model metadata enables Responses Lite
- use client metadata when sending it over the websocket

This PR is stacked on #26490.

## Why

The Responses Lite marker is request-scoped for HTTP but connection-scoped for Responses-over-WebSocket because it is carried on the upgrade request. Reusing a cached socket opened for the opposite mode would therefore send the wrong transport contract.

## Validation

- `just test -p codex-core responses_lite`
- `just test -p codex-core responses_websocket_reconnects_when_responses_lite_mode_changes`
- `just fix -p codex-core`
- `just fmt`

## Why

Permission profile allowlists are an enterprise security boundary, but they also need to compose across the managed requirements layers added in #24620. A map representation lets each requirements layer add, allow, or revoke individual profiles without replacing an entire array.

## Managed Contract

Administrators configure the mergeable allow map with `allowed_permission_profiles`. A recommended enterprise configuration explicitly lists every built-in and custom profile users should be able to select:

```toml
default_permissions = "review_only"

[allowed_permission_profiles]
":read-only" = true
":workspace" = true
review_only = true
# ":danger-full-access" is intentionally omitted, so it is denied.

[permissions.review_only]
extends = ":read-only"
```

- Profiles whose effective merged value is `true` are allowed.
- Missing profiles and profiles set to `false` are denied.
- This is a closed allowlist: built-in profiles and profiles introduced in future versions are denied unless explicitly allowed.
- Explicitly list each built-in profile the enterprise wants to make available. Omit built-ins such as `:danger-full-access` when they should remain unavailable.
- Set `default_permissions` explicitly to the allowed profile users should receive when they have no local selection.
- Higher-precedence layers override only the profile keys they define.
- `false` is only needed when a higher-precedence layer must revoke a `true` inherited from a lower layer.
- Explicit keys must refer to known built-in or managed profiles.

A custom or narrowed allowlist requires an allowed `default_permissions`. For compatibility, if both `:workspace` and `:read-only` are explicitly allowed, an omitted default resolves to `:workspace`; customer configurations should still set the intended default explicitly.

When `allowed_permission_profiles` is absent, existing implicit permission and legacy `sandbox_mode` behavior is unchanged.

## What Changed

- Add `allowed_permission_profiles` as a `BTreeMap<String, bool>` that merges per profile across requirements layers.
- Enforce managed defaults, strict denial of omitted profiles, and the explicitly allowed standard-pair fallback.
- Expose `allowedPermissionProfiles` through `configRequirements/read` and regenerate its schemas.
- Add regression coverage for map composition and revocation, managed defaults, strict denial of omitted built-ins, and API output.

## Verification

- Focused `codex-config` coverage for layered map composition and revocation
- Focused `codex-core` coverage for managed defaults, invalid defaults, strict denial of omitted built-ins, and the standard built-in pair
- Focused `codex-app-server` coverage for requirements API output
- Scoped Clippy for `codex-config`, `codex-core`, `codex-app-server-protocol`, and `codex-app-server`

## Documentation

The managed `requirements.toml` documentation should introduce `allowed_permission_profiles` as a closed permission-profile allowlist before this setting is published on developers.openai.com.

---------

Co-authored-by: Codex <noreply@openai.com>
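
The per-profile merge described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the `AllowMap` alias and the `merge_allow_maps` and `is_profile_allowed` helpers are hypothetical names, and the sketch assumes layers are supplied in order from lowest to highest precedence.

```rust
use std::collections::BTreeMap;

/// One requirements layer's allow map: profile name -> allowed?
/// (Hypothetical alias; the PR only states the type is `BTreeMap<String, bool>`.)
type AllowMap = BTreeMap<String, bool>;

/// Merge layers ordered from lowest to highest precedence (assumed ordering).
/// A higher-precedence layer overrides only the profile keys it defines.
fn merge_allow_maps(layers: &[AllowMap]) -> AllowMap {
    let mut merged = AllowMap::new();
    for layer in layers {
        for (profile, allowed) in layer {
            merged.insert(profile.clone(), *allowed);
        }
    }
    merged
}

/// Closed allowlist: a profile is allowed only if its merged value is `true`.
/// Missing profiles and profiles set to `false` are both denied.
fn is_profile_allowed(merged: &AllowMap, profile: &str) -> bool {
    merged.get(profile).copied().unwrap_or(false)
}
```

With a base layer that allows `:read-only` and `:workspace` and a higher layer that sets `":workspace" = false`, the merged map still allows `:read-only`, revokes `:workspace`, and denies `:danger-full-access` because it was never listed.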

Skill reloads can get noisy when the watcher keeps triggering `skills/list` and the same invalid `SKILL.md` error comes back each time. This keeps the first warning visible, then suppresses repeats while the same `(path, message)` is still active. If the error clears and later comes back, or if the message changes, it will show again.

Validation:

- `just fmt`
- `just test -p codex-tui skill_load_warning_state`
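
One way to picture the suppression rule is a set of currently active `(path, message)` pairs that is replaced on every reload. The sketch below is an assumption about the shape of such state, not the TUI's actual implementation; `SkillLoadWarningState` and `update` are hypothetical names chosen for illustration.

```rust
use std::collections::HashSet;

/// Tracks which (path, message) warnings were active after the last reload.
/// (Hypothetical type; only the dedup rule comes from the PR description.)
#[derive(Default)]
struct SkillLoadWarningState {
    active: HashSet<(String, String)>,
}

impl SkillLoadWarningState {
    /// Takes the errors from the latest reload and returns only the ones
    /// that should be shown: those not active after the previous reload.
    fn update(&mut self, current: &[(String, String)]) -> Vec<(String, String)> {
        let mut to_show = Vec::new();
        for item in current {
            if !self.active.contains(item) && !to_show.contains(item) {
                to_show.push(item.clone());
            }
        }
        // Errors missing from `current` are forgotten, so they show again
        // if they come back later.
        self.active = current.iter().cloned().collect();
        to_show
    }
}
```

Because the active set is rebuilt from each reload, an error that clears and returns, or whose message changes, is treated as new and shown again.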

## Why

`just test` should run the test suite without also compiling and executing benchmark smoke tests. Keeping benchmark validation explicit avoids adding unrelated work to every project-specific test invocation.

## What changed

- Remove the `just bench-smoke` step from the Unix and Windows `test` recipes.
- Document `just bench` and `just bench-smoke` as the explicit benchmark commands in `AGENTS.md`.

## Validation

- `just test -p codex-arg0`
- `just --dry-run test`
- `just --dry-run bench-smoke`

…26741)

## Why

A remote-control WebSocket handshake can receive a generic HTTP 404 when an intermediary routes the request without preserving the WebSocket upgrade. Treating every 404 as proof that the remote app server is gone clears valid enrollment and causes repeated re-enrollment, new environment and server IDs, Habitat churn, and noisy `/server/enroll` traffic.

## What Changed

- Clear enrollment only when a 404 JSON response explicitly contains `{"detail":"Remote app server not found"}`.
- Preserve enrollment for empty, plain-text, malformed, or otherwise unrecognized 404 responses, return the transport error, and retry with the existing reconnect backoff.
- Log the status, correlation headers (`request-id` or `x-oai-request-id`, plus `cf-ray`), and bounded/redacted response body for unrecognized 404s.
- Cover both explicit missing-server re-enrollment and generic 404 enrollment preservation/reconnect behavior.

## Verification

`just test -p codex-app-server-transport` passes all 114 tests on the rebased branch, including the targeted explicit and generic WebSocket 404 scenarios.

Related issue: N/A

## Why

`BUILDBUDDY_API_KEY` now lives in the `bazel` GitHub Actions environment as an environment secret. Jobs that need BuildBuddy credentials must opt into that environment so `${{ secrets.BUILDBUDDY_API_KEY }}` resolves from the protected environment secret instead of relying on an unscoped repository/organization secret.

This follows the same environment-secret migration pattern as #26466.

## What Changed

- Attach each workflow job that reads `BUILDBUDDY_API_KEY` to the `bazel` environment.
- Set `deployment: false` on those job-level environment blocks.

`deployment: false` lets the job enter the `bazel` environment to access its environment secrets without creating GitHub deployment records for these CI jobs. That keeps the environment as a secret/access-control boundary without making ordinary Bazel CI runs look like deploys.

## Validation

- Parsed the modified workflow YAML files with Ruby's YAML parser.
- Checked the modified workflow files for trailing whitespace.

## Why

This PR fixes approval sandbox semantics in the unified-exec path. The zsh-fork runtime exposed the bug because the shell can do meaningful work before any intercepted child `execv(2)` exists: redirections, builtins, globbing, and pipeline setup all happen in the launch process. If the model requested `sandbox_permissions=require_escalated`, or an exec-policy `allow` rule explicitly bypassed the sandbox, that approved sandbox decision needs to be preserved for the launch path and for intercepted execs that use the same approval machinery.

The behavior is not only about zsh fork. The production changes are in shared approval/escalation code, so they also affect non-zsh-fork intercepted exec paths that go through the same sandbox decision logic. The narrow intent is to preserve the approval decision while still keeping denied-read profiles and bounded additional-permission requests sandboxed.

## Production Changes

- `codex-rs/core/src/tools/runtimes/unified_exec.rs`: derives a `launch_sandbox_permissions` value from the requested sandbox permissions and the runtime filesystem policy, then uses that value for managed-network/env setup and launch sandbox selection. This keeps full approval or policy-bypass decisions visible to the first unified-exec attempt, while still preventing a full sandbox override from discarding denied-read restrictions. Direct unified exec keeps the same decision surface; the important difference is that zsh-fork launch setup no longer accidentally loses the approved parent sandbox decision.
- `codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs`: makes intercepted-exec escalation selection explicit for the three sandbox permission modes. `UseDefault` only escalates when an exec-policy decision allows sandbox bypass, `RequireEscalated` escalates when unsandboxed execution is allowed, and `WithAdditionalPermissions` escalates through the bounded additional-permissions path instead of being treated as a full unsandboxed override. Unsandboxed intercepted execs now also rebuild the environment as `RequireEscalated`, which strips managed-network proxy variables consistently with other unsandboxed execution.

## Test Coverage

Most of the PR is tests. The new coverage verifies:

- unified exec preserves parent approval and exec-policy sandbox decisions for zsh-fork launch selection;
- bounded `with_additional_permissions` remains sandboxed and permission-profile based;
- denied-read profiles are not weakened by parent approval;
- explicit prompt rules still prompt for intercepted execs after the parent command is approved;
- unsandboxed intercepted execs strip managed-network env vars.

No documentation update is needed; this is an internal approval/sandbox correctness fix.

---

[//]: # (BEGIN SAPLING FOOTER)

Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/24981).

* #24982
* __->__ #24981
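
The three-mode escalation selection described under Production Changes can be read as a single `match`. The sketch below is illustrative only: the `Escalation` enum, the `select_escalation` function, and both boolean inputs are hypothetical, and the `NoEscalation` outcome for a `RequireEscalated` request that is not permitted is an assumption, since the description does not say what happens in that case.

```rust
/// The three sandbox permission modes named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SandboxPermissions {
    UseDefault,
    RequireEscalated,
    WithAdditionalPermissions,
}

/// Hypothetical outcome type for an intercepted exec.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Escalation {
    /// Stay inside the existing sandbox (assumed fallback).
    NoEscalation,
    /// Run unsandboxed.
    Unsandboxed,
    /// Escalate through the bounded additional-permissions path.
    BoundedAdditionalPermissions,
}

/// Sketch of the selection rule; parameter names are assumptions.
fn select_escalation(
    mode: SandboxPermissions,
    exec_policy_allows_bypass: bool,
    unsandboxed_execution_allowed: bool,
) -> Escalation {
    match mode {
        // Escalates only when an exec-policy decision allows sandbox bypass.
        SandboxPermissions::UseDefault if exec_policy_allows_bypass => Escalation::Unsandboxed,
        SandboxPermissions::UseDefault => Escalation::NoEscalation,
        // Escalates when unsandboxed execution is allowed.
        SandboxPermissions::RequireEscalated if unsandboxed_execution_allowed => {
            Escalation::Unsandboxed
        }
        SandboxPermissions::RequireEscalated => Escalation::NoEscalation,
        // Never treated as a full unsandboxed override.
        SandboxPermissions::WithAdditionalPermissions => Escalation::BoundedAdditionalPermissions,
    }
}
```

Keeping `WithAdditionalPermissions` as its own outcome makes it hard to accidentally widen a bounded request into a full sandbox bypass.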

## Why

Interactive `codex resume` and `codex fork` expose both a session ID positional and an initial prompt positional. With `--last`, Clap still assigns the first positional to the session ID, so a command such as `codex fork --last "/compact focus on auth"` either fails parsing or attempts to look up the prompt as a session ID instead of sending it to the latest session.

This makes it impossible to select the latest session and immediately provide a follow-up prompt, even though `codex exec resume --last` already supports that workflow.

<img width="1746" height="1024" alt="CleanShot 2026-06-06 at 17 00 47@2x" src="https://github.com/user-attachments/assets/86885c07-a23c-48ee-b0ee-47f2484f6eb7" />

## What Changed

- Reinterpret the first positional as the initial prompt when interactive `resume --last` or `fork --last` is used and no explicit second prompt was parsed.
- Preserve the existing `resume SESSION_ID PROMPT` and `fork SESSION_ID PROMPT` behavior.
- Add parser-level regression coverage for latest-session and explicit-session prompt forms.

## How to Test

1. Start an interactive session, exit it, then run `codex resume --last "continue from the latest session"`.
2. Confirm Codex resumes the latest session and submits the supplied prompt instead of treating it as a session ID.
3. Run `codex fork --last "take a different approach"`.
4. Confirm Codex forks the latest session and submits the supplied prompt.
5. Also verify `codex resume SESSION_ID "continue here"` and `codex fork SESSION_ID "branch here"` still target the explicit session and submit the prompt.

Targeted tests:

- `just test -p codex-cli` (267 passed)
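
The reinterpretation rule can be expressed as a small post-parse normalization step. The following is a sketch under assumptions: `ResumeArgs` and `reinterpret_last_positional` are hypothetical names, and the real change operates on the Clap-parsed CLI structs rather than on bare `Option<String>` values.

```rust
/// Hypothetical normalized form of the two positionals.
#[derive(Debug, PartialEq, Eq)]
struct ResumeArgs {
    session_id: Option<String>,
    prompt: Option<String>,
}

/// With `--last` and only one positional, treat that positional as the
/// initial prompt rather than a session ID. All other forms are unchanged.
fn reinterpret_last_positional(
    last: bool,
    first_positional: Option<String>,
    second_positional: Option<String>,
) -> ResumeArgs {
    match (last, first_positional, second_positional) {
        (true, Some(text), None) => ResumeArgs {
            session_id: None,
            prompt: Some(text),
        },
        (_, session_id, prompt) => ResumeArgs { session_id, prompt },
    }
}
```

Under this rule, `resume --last "continue"` yields a prompt with no session ID, while `resume SESSION_ID "continue here"` keeps both values exactly as parsed.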

## Why

MCP startup failures from spawned subagents were rendered as global notifications, so a child thread's failure could pollute the visible parent transcript. Routing the notification to the child exposed two related replay problems: session refresh could discard the buffered event, and a newly created child `ChatWidget` did not know the expected MCP server set, which could leave its startup spinner running after every server had settled.

MCP startup diagnostics should remain visible in the thread that owns the startup without affecting other transcripts. The protocol also needs to support a future app-scoped MCP lifecycle where startup is not owned by any thread.

## Reported Behavior

The [originating Slack report](https://openai.slack.com/archives/C08JZTV654K/p1780604538859939) called out that using subagents could turn MCP startup failures into a wall of yellow CLI warnings because repeated failures were not deduplicated. The intended behavior is for those diagnostics to remain visible once in the thread that owns the startup, without polluting the parent transcript.

## What Changed

- add nullable `threadId` ownership to `mcpServer/startupStatus/updated`
- populate it from the app-server conversation ID for the current thread-scoped lifecycle and regenerate the protocol schema and TypeScript artifacts
- treat a missing or null `threadId` as app-scoped without injecting it into the active chat transcript
- route and buffer thread-owned MCP startup notifications by thread in the TUI
- preserve buffered MCP startup events across child session refresh
- seed expected MCP servers before replaying a thread snapshot so startup reaches its terminal state
- suppress an identical repeated failure warning for the same server within one startup round

The owning thread still renders the detailed failure and final `MCP startup incomplete (...)` summary.

## How to Test

1. Configure an optional MCP server named `smoke` that exits during initialization.
2. Launch the TUI with multi-agent support enabled.
3. Confirm the main thread's own startup failure renders one detailed `smoke` warning and one incomplete-startup summary.
4. Spawn exactly one subagent.
5. Confirm the parent transcript does not receive the subagent's MCP startup failure.
6. Switch to the subagent thread and confirm it contains exactly one detailed `smoke` failure and one incomplete-startup summary.
7. Confirm the subagent's MCP startup spinner disappears and the thread remains usable.
8. Switch between the parent and subagent and confirm the warnings neither move nor duplicate.

Targeted tests:

- `just test -p codex-app-server-protocol`
- `just test -p codex-app-server thread_start_emits_mcp_server_status_updated_notifications`
- `just test -p codex-tui mcp_startup`

The parent/child behavior and spinner completion were also exercised manually in tmux. `just argument-comment-lint` was attempted but blocked by an unrelated local Bazel LLVM empty-glob failure; touched Rust callsites were inspected manually.

## What

- Consume plaintext `output` from standalone search while retaining optional `encrypted_output` parsing.
- Expose `web.run` to code mode and return search output to nested JavaScript calls.
- Cover direct and code-mode standalone search paths with integration tests.

## Why

`/v1/alpha/search` now returns plaintext output, which code mode needs to consume standalone search results.

## Test plan

- `just test -p codex-api`
- `just test -p codex-web-search-extension`
- `just test -p codex-core code_mode_can_call_standalone_web_search`
- `just test -p codex-app-server standalone_web_search_round_trips_output`

## Why

Multi-agent v2 treats agents as durable logical agents, not just live entries in `ThreadManager`. After the reload-on-delivery change, a v2 agent can be addressed even if its thread is not currently loaded.

This PR adds the next layer: loaded v2 subagents can be paged out of `ThreadManager` when the session has too many resident agents. That keeps residency separate from logical identity and prepares the stack for making v2 concurrency count active execution instead of existing agents.

## What Changed

- Add an `AgentControl`-scoped LRU for resident v2 subagents.
- Reserve residency before spawning or reloading a v2 subagent.
- If resident capacity is full, unload the least-recently-used idle v2 subagent from `ThreadManager`.
- Keep `ThreadManager` as a primitive loaded-thread store; it does not own the LRU policy.
- Keep unloaded agents registered and durable so they can be reloaded by the delivery path.
- Preserve the existing v2 cap semantics by using the derived non-root v2 cap for residency.

Eviction is intentionally conservative. A thread is unloadable only when it is a v2 subagent, has completed or errored, has no active turn, and has no pending mailbox work. Before removal, the rollout is materialized and flushed.

## Assumptions And Non-Goals

- PR #26623 provides the reload-on-delivery path for unloaded v2 agents.
- `ThreadManager` membership means loaded/resident, not logical agent existence.
- `AgentRegistry` remains the logical identity/metadata source for v2 agents that may be unloaded.
- `list_agents` remains a recent/resident view for now.
- This does not change active execution concurrency; that is the next PR.
- This does not change `close_agent` semantics.
- This does not change or remove `resume_agent`.
- This does not add a new residency config knob.

## Stack

1. V2 durable lookup and reload on delivery (#26623) - reload unloaded v2 agents before delivering follow-up/input.
2. V2 residency LRU (this PR) - unload idle resident v2 agents from `ThreadManager` when resident capacity is full.
3. V2 active-execution concurrency - count running non-root v2 turns instead of logical agents.
4. V2 close/interrupt semantics - make v2 close interrupt the current turn without deleting durable identity.
5. V2 resume cleanup - remove the manual resume surface for v2 while keeping internal reload support.

## Validation

- Added focused coverage for the residency LRU eviction path.
- Local clippy/check/tests were not run; CI will cover them.
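
The conservative eviction rule in this PR (v2 subagent, completed or errored, no active turn, no pending mailbox work) can be sketched as a predicate plus a scan over residents ordered least-recently-used first. Everything named here is hypothetical: `Resident`, `AgentStatus`, `is_unloadable`, `reserve_slot`, and the error string are illustrative, and the sketch omits the rollout materialization and flush that the PR performs before removal.

```rust
use std::collections::VecDeque;

/// Hypothetical status values; only completed/errored are evictable here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AgentStatus {
    Running,
    Completed,
    Errored,
}

/// Hypothetical view of one loaded thread.
#[derive(Debug, Clone)]
struct Resident {
    name: String,
    is_v2_subagent: bool,
    status: AgentStatus,
    has_active_turn: bool,
    has_pending_mailbox: bool,
}

/// Conservative eligibility check, following the four conditions in the PR.
fn is_unloadable(r: &Resident) -> bool {
    r.is_v2_subagent
        && matches!(r.status, AgentStatus::Completed | AgentStatus::Errored)
        && !r.has_active_turn
        && !r.has_pending_mailbox
}

/// `residents` is ordered least-recently-used first (assumed representation).
/// Returns the evicted agent's name, `None` if a slot was already free,
/// or an error if every resident is still busy.
fn reserve_slot(
    residents: &mut VecDeque<Resident>,
    capacity: usize,
) -> Result<Option<String>, &'static str> {
    if residents.len() < capacity {
        return Ok(None);
    }
    match residents.iter().position(|r| is_unloadable(r)) {
        Some(idx) => Ok(residents.remove(idx).map(|r| r.name)),
        None => Err("resident capacity full"),
    }
}
```

The evicted agent is only removed from the resident set in this sketch; per the description, its logical registration stays in place so the delivery path can reload it later.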

## Summary

- ignore RUSTSEC-2026-0173 in cargo-deny and cargo-audit config
- document that proc-macro-error2 is pulled in transitively via i18n-embed-fl/age/codex-secrets
- leave the ignore temporary until codex-secrets moves off age or age drops i18n-embed-fl

## Validation

- `just fmt`
- `cargo deny check --hide-inclusion-graph`

## Why

Multi-Agent V2 concurrency should count active non-root turns, not resident or durable agent threads. The limit is intentionally best effort: admission checks are synchronous, but concurrent successful checks may overshoot slightly.

## What changed

- Keep one root-derived execution limit on the shared `AgentControl`.
- Count active V2 subagent turns with an RAII guard owned by `RunningTask`.
- Check capacity before spawning or starting an idle agent, including direct app-server `turn/start` submissions.
- Preserve queued delivery for agents that are already running.
- Exempt automatic idle continuations so `/goal` work is not dropped when capacity is temporarily full.
- Keep root and V1 turns outside this limiter.

## Test coverage

- `execution_guards_count_active_v2_subagent_turns`
- `execution_guards_ignore_root_and_v1_turns`
- `v2_nested_spawn_checks_shared_active_execution_capacity`
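
An RAII guard over a shared counter is one straightforward way to count active turns. The sketch below assumes a simple atomic counter; `ExecutionLimiter`, `ExecutionGuard`, and their methods are hypothetical names, not the types in `AgentControl` or `RunningTask`. The capacity check and the increment are deliberately separate steps, which mirrors the best-effort behavior the PR describes: two concurrent callers can both pass the check and overshoot the limit slightly.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical shared limiter holding the active-turn counter.
struct ExecutionLimiter {
    active: Arc<AtomicUsize>,
    limit: usize,
}

/// Hypothetical guard: holding it counts as one active turn.
struct ExecutionGuard {
    active: Arc<AtomicUsize>,
}

impl ExecutionLimiter {
    fn new(limit: usize) -> Self {
        Self {
            active: Arc::new(AtomicUsize::new(0)),
            limit,
        }
    }

    /// Best-effort admission check; not atomic with `start_turn`.
    fn has_capacity(&self) -> bool {
        self.active.load(Ordering::SeqCst) < self.limit
    }

    /// Registers an active turn and returns the guard that releases it.
    fn start_turn(&self) -> ExecutionGuard {
        self.active.fetch_add(1, Ordering::SeqCst);
        ExecutionGuard {
            active: Arc::clone(&self.active),
        }
    }

    fn active_turns(&self) -> usize {
        self.active.load(Ordering::SeqCst)
    }
}

impl Drop for ExecutionGuard {
    /// Dropping the guard (turn finished, aborted, or panicked) frees the slot.
    fn drop(&mut self) {
        self.active.fetch_sub(1, Ordering::SeqCst);
    }
}
```

Tying the decrement to `Drop` means the count stays accurate on every exit path of a turn, without the owner having to remember to release the slot.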

## Why

`close_agent` is the wrong model-facing name for the v2 operation after the residency changes. V2 agents remain reusable by task name, and residency/unloading owns capacity management; the exposed tool should describe the action it actually performs: interrupt the target agent's current turn without making the agent unavailable for future messages or follow-up tasks.

## What changed

- Rename the multi-agent v2 tool from `close_agent` to `interrupt_agent`.
- Keep the v1 `close_agent` surface unchanged.
- Update the v2 handler to send `Op::Interrupt`, keep interrupted agents registered, and reject root/self targets with interrupt-specific errors.
- Route interrupt delivery through the existing dead-thread cleanup path so stale resident entries do not keep consuming capacity.
- Update tool planning and handler tests for the new v2 surface and semantics.

## Verification

Added focused coverage in:

- `core/src/tools/spec_plan_tests.rs`
- `core/src/tools/handlers/multi_agents_tests.rs`

## Why

Multi-agent v2 residency is intended to keep only the threads that need to be live. The existing rollout resume path still walked persisted open descendants and reopened the entire descendant tree when resuming a v2 root, which turns resume into an eager reload of work that should stay unloaded until it is explicitly needed.

The interrupted-agent path has a related residency issue. Interrupted agents remain open by design, so an idle interrupted resident should be eligible for eviction just like an idle completed or errored resident. Otherwise a resident set full of interrupted agents can consume every v2 slot and block later spawns or reloads with `AgentLimitReached`.

## What Changed

- Return early from `resume_agent_from_rollout` after resuming a v2 thread so persisted v2 descendants are not reopened eagerly.
- Treat idle `Interrupted` v2 residents as unloadable in the LRU residency path.
- Add focused coverage for v2 root resume leaving descendants unloaded and for eviction of an idle interrupted v2 resident when a new slot is needed.

## Verification

Added targeted `codex-core` tests covering:

- v2 root resume with persisted descendants, verifying only the root is loaded after resume.
- residency eviction of an idle interrupted v2 agent when the resident set is full.

## Summary

- Add `contains_external_context()` to tool output so other tools can be opted out of influencing memory when `disable_on_external_context=true`.
- Classify standalone web-search output as external context, matching the behavior of hosted web search.
- Verify with an integration test.

## Summary

- Restore separate release symbol archives for macOS, Linux, and Windows binaries.
- Build release binaries with `line-tables-only` debuginfo instead of full debuginfo.
- Strip Unix distribution binaries after extracting symbols, preserve Windows PDBs, and keep symbol archives available to the release job.
- Strip the packaged Linux `bwrap` binary before hashing it so the embedded digest matches the distributed bytes.

## Root cause

The first symbol-artifact implementation enabled `CARGO_PROFILE_RELEASE_DEBUG=full`. In the June 2 release runs, macOS ARM primary builds reached the 90-minute timeout while still inside `Cargo build`. After the symbol changes were reverted, the same primary build completed in about 22 minutes. The archive step itself completed in tens of seconds when reached.

Rust's `line-tables-only` debuginfo level preserves function names and source locations for symbolication without emitting the heavier variable and type information from full debuginfo.

## Validation

- Ran `just fmt` from `codex-rs`.
- Ran `just test-github-scripts` from the repository root: 23 tests passed.
- Ran `bash -n` and `shellcheck` on `.github/scripts/archive-release-symbols-and-strip-binaries.sh`.
- Parsed both modified workflows as YAML and ran `git diff --check`.
- Built a macOS release smoke binary with `line-tables-only`, archived its dSYM through the restored script, stripped the production binary, and verified that `atos` resolves `symbol_smoke_function` to `main.rs:2`.
- Ran Linux archive-script control-flow coverage with stubbed `objcopy` and `strip` commands.
- Ran Windows PDB archive staging coverage and verified underscore-emitted Rust PDB names are staged under shipped hyphenated binary names.

## Follow-up

The release workflow only runs for tags or manual dispatches, so CI cannot dry-run the full release matrix on this PR. The next release run will verify runner time and memory behavior under `line-tables-only`.

## Why

Remote-control app-server sessions can reconnect every 5-7 seconds when the shared transport-event queue fills. The queue's consumer handled `ConnectionClosed` by awaiting all in-flight RPCs for the disconnected connection. A stuck RPC therefore blocked processing of replacement connection and initialize events until remote-control forwarding hit its five-second timeout and reconnected again.

Related issue: N/A (internal remote-control incident investigation).

## What Changed

- Split fast RPC admission closure from draining: `ConnectionRpcGate::close()` rejects queued and future RPCs, while `shutdown()` continues waiting for RPCs that already started.
- Close a disconnected connection's RPC gate before spawning the existing RPC drain and resource cleanup in a tracked background task, so the transport-event consumer remains available without waiting for active RPCs.
- Reap completed cleanup tasks during normal operation, drain them during graceful shutdown, and abort them during forced shutdown.
- Add regression coverage for closing with an active RPC, rejecting post-close requests without polling them, and preserving the existing shutdown wait behavior.

## Verification

`just test -p codex-app-server --lib connection_rpc_gate` passes all 6 tests, including the new close-versus-drain regression coverage.

## Summary

- Keep the existing `x-codex-window-id` HTTP header unchanged.
- Also send the same window ID in Responses `client_metadata`, allowing supported backend paths to surface it as `x-client-meta-x-codex-window-id`.
- Cover normal HTTP Responses and remote compaction v2 requests without changing window generation or compaction behavior.

## Why

In the `2026-06-06T23` production hour, all 28,729 HTTP compaction requests had `window_id` in `x-codex-turn-metadata`, but only 73 retained the direct `x-codex-window-id` header. The request-body `client_metadata` path is already used for installation ID and is preserved through supported Responses API paths.

This is additive metadata only. It does not change the direct header, request count, model input, compaction routing, window generation, or user response behavior.

Legacy `/v1/responses/compact` is intentionally unchanged. Its current server-side `CompressBody` schema does not accept `client_metadata` and rejects unknown fields, so supporting that path requires a backend schema change before the Codex client can safely send this field.

## Validation

- Current head: `219baef3c`, rebased onto `origin/main` at `26d932983`.
- The post-rebase diff remains limited to the original five files (`22` insertions, `6` deletions); the legacy experiment remains fully reverted.
- `just test -p codex-core responses_stream_includes_subagent_header_on_review`: passed; validates normal HTTP Responses metadata.
- `just test -p codex-core remote_compact_v2_reuses_compaction_trigger_for_followups`: passed; validates remote compaction v2.
- `just test -p codex-core remote_manual_compact_chatgpt_auth_reuses_service_tier_and_prompt_cache_key`: passed; validates that legacy compact keeps its accepted payload shape.
- `just test -p codex-core remote_manual_compact_api_auth_omits_service_tier_and_reuses_prompt_cache_key`: passed; validates the legacy API-key payload as well.
- `just fmt`: passed; an unrelated root `justfile` rewrite produced by the formatter was discarded.
- `git diff --check origin/main...HEAD`: passed.

The focused server pytest could not start in the local monorepo environment because test setup is missing the `dotenv` module. Server source and tests explicitly show that `CompressBody` omits `client_metadata` and `/v1/responses/compact` returns HTTP 400 for unknown body fields.

## Why

Compaction analytics adds retained image count and compaction summary output tokens for v1.5 specifically.

## What changed

- Add nullable `retained_image_count` and `compaction_summary_tokens` fields to `codex_compaction_event`.
- Populate them only for `responses_compaction_v2`: retained images come from the retained v2 compacted history, and summary tokens come from `response.completed.token_usage.output_tokens`.
- Leave local and legacy remote compaction events as `null` for these detail fields.

## Verification

- `just fmt`
- `just fix -p codex-core`
- `just test -p codex-core build_v2_compacted_history_counts_retained_input_images`
- `git diff --check`

## Why

Importing large external-agent session histories currently starts a full live Codex thread for every imported session. This initializes unrelated runtime systems and repeats expensive transcript, metadata, hashing, and ledger work.

On a 50-session, 238 MiB fixture, the existing path took roughly 70 seconds to complete the import and 77 seconds end to end.

## What changed

- Persist imported sessions directly through `ThreadStore` instead of starting full live threads.
- Process imports through a bounded five-session pipeline.
- Parse, extract, and hash each source file in one pass.
- Move blocking source preparation onto the blocking thread pool.
- Reuse prepared content hashes and update the import ledger once per batch.
- Avoid metadata readback for newly written rollouts.
- Preserve imported conversation history and visible thread metadata.
- Keep the implementation out of `codex-core` and avoid changes to the public `ThreadStore` trait.

## Performance

For the same 50-session, 238 MiB fixture:

| Path | Import completion | End to end |
| --- | ---: | ---: |
| Existing import | 69.61s | 76.62s |
| This change | 5.95s | 6.58s |

All 50 sessions imported successfully with no warnings or contention signals.

## Validation

- `just test -p codex-external-agent-sessions`
- `just test -p codex-app-server external_agent_config_import`
- Verified imports do not initialize unrelated required MCP servers.
- Verified previously imported source versions are skipped and changed sources can be imported again.
- Verified imported rollouts remain readable through thread listing and history APIs.

## Summary

- Follow-up to #26417 and #26631
- Add `marketplaceSource` to `codex plugin marketplace list --json` entries for configured marketplaces
- Reuse the existing `marketplaceSource` shape from `codex plugin list --json`
- Keep human-readable marketplace list output unchanged
- Add CLI coverage for configured local and git marketplace sources

Example:

```json
{
  "marketplaces": [
    {
      "name": "debug",
      "root": "/path/to/.codex/.tmp/marketplaces/debug",
      "marketplaceSource": {
        "sourceType": "git",
        "source": "https://example.com/acme/agent-skills.git"
      }
    }
  ]
}
```

## Validation

- `just fmt`
- `just fix -p codex-cli`
- `just test -p codex-cli marketplace_list`
- `just test -p codex-cli`

## Why

These workflows currently hard-code the `codex` runner group and custom runner labels. That makes the same workflow definitions less portable across repository copies or renamed repos, even though the runner fleet follows the repository name scheme.

Template the runner identities from the repository name so `openai/codex` still resolves to the existing `codex-*` runners while other repos can use their own `<repo>-*` runner names.

## What Changed

- Replaced custom runner `group` values such as `codex-runners` with `${{ github.event.repository.name }}-runners`.
- Replaced custom runner labels such as `codex-linux-x64` and `codex-windows-arm64` with `${{ github.event.repository.name }}-...`.
- Covered direct `runs-on` objects, matrix `runs_on` entries, reusable workflow runner inputs, and release runner labels.

## Verification

- Parsed all `.github/workflows/*.yml` files as YAML with Ruby.
- Searched `.github/workflows` to confirm no hardcoded runner-field `codex-runners` or `codex-*` labels remain.

## Why

Auto Review should remain the effective approval reviewer when settings cross runtime boundaries. A config or app-server round trip must not change the reviewer identity, and delegated work must not silently fall back to user review.

This requires both a stable canonical serialized value and propagation of the effective setting. `auto_review` is the canonical value across protocol and app-server output, while `guardian_subagent` remains accepted as backward-compatible input.

## What changed

- serialize `ApprovalsReviewer::AutoReview` consistently as `auto_review` across core protocol and app-server v2
- continue accepting `guardian_subagent` when reading existing config or client requests
- carry the active turn's approval reviewer into spawned agents
- update config/debug expectations and add delegated-task regression coverage

## Scope

This does not change Guardian policy or remove compatibility with existing `guardian_subagent` inputs. It preserves the selected reviewer across serialization, config reloads, app-server settings, and delegated task setup.

Related Guardian changes are split independently:

- #26231 adds denials and soft denials
- #26334 retries transient reviewer failures
- #26333 reuses narrowly scoped low-risk approvals
- #26232 adds TUI denial recovery

## Validation

- `just test -p codex-app-server-protocol` (224 passed)
- regression coverage for delegated task reviewer propagation
- serialization coverage for canonical `auto_review` output and legacy `guardian_subagent` input

---------

Co-authored-by: saud-oai <saud@openai.com>
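
The canonical-output, legacy-input contract can be illustrated with hand-written conversion functions. This sketch is an assumption about the shape only: the real code most likely uses derive-based serialization, the `User` variant and its `"user"` string are hypothetical placeholders for the user-review option, and `as_canonical_str` and `parse` are illustrative names.

```rust
/// `AutoReview` is named in the PR; `User` is a hypothetical stand-in
/// for the user-review option.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ApprovalsReviewer {
    User,
    AutoReview,
}

impl ApprovalsReviewer {
    /// Output side: always emit the canonical value.
    fn as_canonical_str(self) -> &'static str {
        match self {
            ApprovalsReviewer::User => "user",
            ApprovalsReviewer::AutoReview => "auto_review",
        }
    }

    /// Input side: accept the canonical value and the legacy alias.
    fn parse(value: &str) -> Option<Self> {
        match value {
            "user" => Some(ApprovalsReviewer::User),
            "auto_review" | "guardian_subagent" => Some(ApprovalsReviewer::AutoReview),
            _ => None,
        }
    }
}
```

A legacy `guardian_subagent` value read from an existing config parses to `AutoReview` and is written back as `auto_review`, so a round trip normalizes the spelling without changing the reviewer.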

## Stack

1. [1 of 3] Support long raw TUI goal objectives - #27508
2. **[2 of 3] Support long pasted text in TUI goals** - this PR
3. [3 of 3] Support images in TUI goals - #27510

## Why

Large text pasted into the TUI composer is represented as a paste placeholder plus pending paste metadata. For `/goal`, preserving only the visible placeholder is not enough: the agent would see a short placeholder string instead of the actual pasted text, and the long-text support from the first PR would never see the payload.

The TUI also needs to avoid writing stale sidecar files when a user pastes a large block and then deletes its placeholder before submitting the goal.

## What Changed

- Introduces a TUI `GoalDraft` for goal submissions so `/goal`, `/goal edit`, and queued goal commands can carry objective text plus text elements and pending paste payloads.
- Materializes active pasted-text placeholders to `pasted-text-N.txt` files through the app-server filesystem path introduced in #27508.
- Rewrites active paste placeholders in the persisted objective to file references, while leaving literal placeholder-looking text alone.
- Filters out deleted paste placeholders so otherwise-small goals do not require `$CODEX_HOME` or remote filesystem writes.
- Preserves pending paste metadata when a `/goal` command is queued before a thread exists.

## Verification

- Added goal materialization tests for active paste placeholders, deleted paste placeholders, and whitespace-only paste payloads.
- Added/updated TUI slash-command tests for large pasted text, queued `/goal` commands before thread start, and queued oversized goal behavior.

## Manual Testing

- Used real terminal bracketed-paste sequences through a remote TUI session. A 1,228-byte multiline paste became `pasted-text-1.txt`; its first/last lines and byte count matched exactly, and the persisted objective referenced the server-host path.
- Pasted a large block, deleted its placeholder, and submitted a small replacement objective. No new directory or sidecar file was created.
- Added two same-length large pastes to one goal. The composer disambiguated their visible placeholders, and materialization preserved order and contents in `pasted-text-1.txt` and `pasted-text-2.txt`.
- Submitted a whitespace-only large paste and verified the goal was rejected as empty without writing a file.
- Submitted a pasted-text replacement while another goal was active, verified no file was written before confirmation, then canceled and confirmed the original goal remained unchanged.
- Combined a large paste with enough raw text to exceed 4,000 characters after placeholder rewriting. The paste sidecar and `goal-objective.md` were created in the same remote attachment directory, and `/goal edit` restored the rewritten objective with its sidecar reference.
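The placeholder handling described above can be approximated as follows. This is a simplified sketch, not the TUI's implementation: the placeholder strings, the function name `materialize_pastes`, and numbering only the surviving pastes are assumptions, and it matches placeholders by plain substring, whereas the PR tracks text elements so that literal placeholder-looking text is left alone.

```python
def materialize_pastes(objective: str, pending: list[tuple[str, str]]):
    """Rewrite active paste placeholders to sidecar file references.

    `pending` holds (placeholder, payload) pairs in composer order.
    Returns the rewritten objective and a {filename: payload} map of
    files that would need to be written.
    """
    files: dict[str, str] = {}
    for placeholder, payload in pending:
        if placeholder not in objective:
            # The user deleted this placeholder before submitting,
            # so no sidecar file should be written for it.
            continue
        name = f"pasted-text-{len(files) + 1}.txt"
        files[name] = payload
        objective = objective.replace(placeholder, name, 1)
    return objective, files


objective, files = materialize_pastes(
    "Fix [Paste #1] then apply [Paste #2]",
    [("[Paste #1]", "first block"), ("[Paste #2]", "second block"),
     ("[Paste #3]", "deleted before submit")],
)
print(objective)
print(sorted(files))
```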

## Why

We need request-level evidence for Guardian cases where `codex-auto-review` is missing from the client-side model catalog and the review falls back to the parent model.

## What changed

- Add `guardian_catalog_contains_auto_review` to Guardian Responses API client metadata.
- Add `guardian_model_provider_id` to Guardian Responses API client metadata.
- Keep review-session metadata optional so callers without metadata preserve the existing `None` path.
- Add tests for override, normal preferred-model, and missing-auto-review-catalog behavior.

## Validation

- `just test -p codex-core guardian_review_records_missing_auto_review_model_in_request_metadata`
- `just test -p codex-core guardian_review_uses_model_catalog_override_when_preferred_review_model_exists`
- `just test -p codex-core guardian_review_uses_preferred_review_model_without_model_catalog_override`
- `git diff --check origin/main`

Codex sometimes does surprising things with `cfg` attributes, so it seems worth giving it guidance about how broadly our Rust needs to work. This adds a very brief section to AGENTS.md explaining that we target the major desktop OSes and that we want the vast majority of our logic to be portable across them.

Each Windows packaging job creates three compressed forms of five binaries in sequence. This takes roughly two minutes and is on the release critical path.

Use two xargs workers to compress independent binaries concurrently. The workers only read the raw executables and write per-binary archive names. The Codex zip can safely read the helper executables while their own archives are generated.

On a 16-vCPU AMD EPYC 9V74 Windows x64 release runner, alternating trials against artifacts from release run 27391514823 measured:

- serial: 121 s, 123 s, 121 s
- parallel: 73 s, 73 s, 74 s

This saves 47 to 50 seconds in the x64 packaging lane, reducing the observed release critical path by about 48 seconds when x64 remains the limiting lane.

https://github.com/openai/codex/actions/runs/27401905938

…27499)

## Summary

This PR promotes Mentions 2.0 (unified TUI mention popup) to stable and enables it by default.

- Keep `mentions_v2` as a temporary rollback path to the legacy split popups (`--disable mentions_v2`).
- Add feature-default and snapshot coverage for the default experience.

## Prior work

- [#19068: Unified mentions in TUI](#19068)
- [#22375: Use plugin/list to get plugins for mentions](#22375)
- [#23363: Unified mentions tweaks and rendering polish](#23363)

## Test plan

- Launch Codex without any feature overrides.
- Type `@` in the TUI composer.
- Confirm the unified mentions menu opens and displays filesystem, plugin, and skill results.

## Why

The paired thread-environment migration changed several generic test turn helpers from supplying a fallback cwd to explicitly selecting the local environment. That changes their meaning under `build_with_remote_env()`: remote-only fixtures cannot resolve the forced local selection, so the tests fail before exercising apply-patch, RMCP, unified-exec, or view-image behavior.

Generic helpers should inherit the environment selected by their fixture. Tests that intentionally exercise local routing continue to select the local environment explicitly.

## What changed

- Remove forced `local_selections(...)` overrides from the generic apply-patch, RMCP, unified-exec, and view-image turn helpers.
- Remove the imports made unused by those deletions.

## Testing

- Not run locally; this is a test-fixture-only change and the `full-ci` branch will exercise the affected remote shards.

In the x64 packaging job from https://github.com/openai/codex/actions/runs/27391514823 archiving and uploading PDBs took 65 seconds after signing. Release packaging could not start until that work completed. Windows code signing changes executables but not their PDBs. Package the PDBs in a sibling Ubuntu job as soon as all binary artifacts are available. Signing and release packaging can then proceed without waiting for the symbols archive, reducing the critical path by about one minute.

## Why

`PathUri::from_abs_path` can fail for absolute paths that do not have a normal `file:` URI representation, forcing filesystem call sites to handle a conversion error even though the original path can be preserved losslessly.

## What

Make `from_abs_path` infallible and migrate its callers. Unrepresentable paths use `file:///%00/bad/path/<base64>`, encoding Unix bytes or Windows UTF-16LE; `to_abs_path` validates and decodes that fallback. The leading encoded null reserves a namespace that cannot collide with a real Unix or Windows path, and fallback URIs remain opaque to lexical path operations.

## Validation

Added path-URI coverage for Unix null and non-UTF-8 paths, Windows device/verbatim and non-Unicode paths, serialization, malformed fallbacks, opaque lexical operations, invalid native payloads, and literal `/bad/path` collision resistance.
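The fallback form lends itself to a short round-trip illustration. The sketch below is not the `PathUri` implementation: it handles only the Unix-bytes case, the function names are hypothetical, and the choice of URL-safe base64 with padding is an assumption, since the description says only `<base64>`.

```python
import base64

# The encoded null (%00) reserves a prefix no real Unix or Windows path can produce.
FALLBACK_PREFIX = "file:///%00/bad/path/"


def encode_fallback(raw: bytes) -> str:
    """Wrap path bytes that have no normal file: URI form (hypothetical helper)."""
    # Assumption: URL-safe base64 alphabet, padding kept.
    return FALLBACK_PREFIX + base64.urlsafe_b64encode(raw).decode("ascii")


def decode_fallback(uri: str) -> bytes:
    """Validate the reserved prefix and decode the payload back to raw bytes."""
    if not uri.startswith(FALLBACK_PREFIX):
        raise ValueError("not a fallback URI")
    payload = uri[len(FALLBACK_PREFIX):]
    # validate=True rejects malformed payloads instead of silently skipping bytes.
    return base64.b64decode(payload, altchars=b"-_", validate=True)


raw = b"/tmp/\x00embedded-null/\xff-not-utf8"
uri = encode_fallback(raw)
print(uri.startswith(FALLBACK_PREFIX), decode_fallback(uri) == raw)
```

Because the original bytes are recovered exactly, the conversion never needs to fail, which is the property the change relies on.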

In the Windows x64 packaging job from https://github.com/openai/codex/actions/runs/27391514823 building the primary and app-server package archives serially took 116 seconds. Both archives read the same signed-binary directory but write separate package trees and output files. Run them concurrently with xargs -P2. The package helper rewrites DotSlash executables under the process temp directory. A naive concurrent run failed when one process tried to replace an executable used by the other. Give each bundle separate TMP and TEMP roots to keep those caches independent. On Windows x64 in https://github.com/openai/codex/actions/runs/27397197944 three serial trials took 127, 128, and 126 seconds. Concurrent trials took 76, 74, and 74 seconds, saving 52 to 54 seconds. This removes about 50 seconds from the release critical path without changing the packaging commands or output set.

In the release job from https://github.com/openai/codex/actions/runs/27391514823 staging the nine npm release tarballs serially took 104 seconds. Each package build writes to a separate staging directory, output path, and npm cache. Run them through the script's existing thread pool, bounded by the available CPU count. Delete each staging tree as its build finishes so concurrency does not retain all copies until the end. On ubuntu-24.04 in https://github.com/openai/codex/actions/runs/27397232050 two serial trials took 103 and 101 seconds, while concurrent trials both took 41 seconds. Comparing every extracted file from the first serial and concurrent sets found no differences. This removes about one minute from every release.
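The pattern described here (independent builds on a pool bounded by CPU count, with each staging tree deleted as soon as its build finishes) looks roughly like the sketch below. It is not the release script: the function names are hypothetical, and writing a tiny `package.json` and a tarball stands in for the real npm staging work.

```python
import os
import shutil
import tarfile
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def stage_package(name: str, out_dir: Path) -> Path:
    """Build one package in its own staging directory, then clean it up."""
    staging = Path(tempfile.mkdtemp(prefix=f"{name}-staging-"))
    try:
        # Stand-in for the real package build (assumption for this sketch).
        (staging / "package.json").write_text('{"name": "%s"}' % name)
        archive = out_dir / f"{name}.tgz"
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(staging, arcname=name)
        return archive
    finally:
        # Delete the staging tree as soon as this build finishes so
        # concurrency does not keep every copy on disk until the end.
        shutil.rmtree(staging)


def stage_all(names: list[str], out_dir: Path) -> list[Path]:
    """Run the independent builds on a pool bounded by the CPU count."""
    workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda n: stage_package(n, out_dir), names))


with tempfile.TemporaryDirectory() as out:
    built = stage_all(["pkg-a", "pkg-b", "pkg-c"], Path(out))
    print([p.name for p in built])
```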

## Why

We have some large gaps in our thread start, resume, and pre-sampling traces that make it hard to tell where latency is coming from.

## What Changed

- Added coarse spans around thread start/resume, turn context construction, rollout reconstruction, skill/plugin loading, and tool preparation.
- Added a breakdown of discoverable-tool preparation across connector loading, plugin discovery, and local plugin details.

## Testing

- `cargo check -p codex-app-server -p codex-core -p codex-core-skills -p codex-core-plugins`
- Built the app-server locally and exercised thread start, first turn, follow-up turn, server restart, thread resume, and a resumed turn.

## Stack

1. [1 of 3] Support long raw TUI goal objectives - #27508
2. [2 of 3] Support long pasted text in TUI goals - #27509
3. **[3 of 3] Support images in TUI goals** - this PR

## Why

The first two PRs make goal definitions resilient to long text, but `/goal` still dropped image inputs from the composer. That meant a user could attach images while defining a goal and the resulting goal continuation would not have any useful reference to those images.

Goal state still persists only objective text, so image inputs need to become paths or URLs that the agent can read later.

## What Changed

- Extends TUI `GoalDraft` with local image attachments and remote image URLs.
- Copies local goal images through the app-server filesystem layer into the managed goal attachment directory, then rewrites active image placeholders to file references.
- Appends unplaced local images and remote image URLs to the objective as referenced image files or URLs.
- Preserves goal image metadata through live `/goal` submission and queued `/goal` dispatch.

## Verification

- Added goal materialization coverage for local image files and remote image URLs.
- Added/updated TUI slash-command coverage showing `/goal` drafts include attached images instead of dropping them.

## Manual Testing

- Attached an image by bracketed-pasting its local path into a live `/goal` composer. The `[Image #1]` placeholder became a server-host `image-1.png` reference, copied bytes matched exactly, and no attachment was written under the TUI's local home.
- Deleted an image placeholder before submitting a small goal and verified no image was copied.
- Attached PNG and JPEG files to the same goal. Placeholder order was preserved as `image-1.png` and `image-2.jpg`, and both remote copies matched their source bytes.
- Tried extensionless, malformed-extension, and extension/content-mismatched paths; the composer rejected them as image attachments before goal dispatch rather than creating misleading managed image files.
- Combined a local image, a large pasted block, and enough raw text to exceed 4,000 characters. The remote attachment directory contained the image, paste sidecar, and `goal-objective.md`; all embedded references used server-host paths and both payloads matched their sources.
- Submitted an image replacement while a goal was active, verified no image was copied before confirmation, then canceled and confirmed the attachment count was unchanged.

## Why

[#25345](#25345) was approved, green, and squash-merged into its stacked base branch, `fcoury/tokenmaxxing-api`. Four minutes later, that base branch was force-pushed back to an API-only rebased head while preparing [#25344](#25344) for `main`. As a result, the squash commit from #25345 was orphaned and the TUI command never reached `main` or a release.

This PR relands the orphaned TUI change from [`411410b8`](411410b) on current `main`.

## What changed

- Add `/usage`, `/usage daily`, `/usage weekly`, and `/usage cumulative` for account token activity.
- Fetch account usage asynchronously through the existing `account/usage/read` app-server RPC.
- Render daily, weekly, and cumulative activity with theme-aware terminal palettes and bounded transient cards.
- Preserve transcript ordering while assistant streams, history consolidations, active cells, and hooks complete.
- Hide `/usage` from completion when backend auth is unavailable while keeping typed-command guidance.
- Carry current-main behavior forward for cwd-aware Markdown parsing, Windows Terminal color detection, and personal access token auth.
- Clear pending usage cards on thread rollback and delay completed cards until live hook output is committed.
- Add focused regression and snapshot coverage for loading, auth errors, invalid views, rollback, hook ordering, layout, and charts.

## Prior review

The original implementation was approved by Eric Traut in #25345 after testing multiple themes and light/dark terminals. This PR preserves that reviewed implementation while adapting it to current `main` and adding regression coverage for newer rollback and hook lifecycle behavior.

## Validation

- `just test -p codex-tui token_activity palette renderable usage_command`: 37 passed.
- Focused rollback, hook-ordering, and error snapshot tests: 4 passed.
- `just fix -p codex-tui`: passed.
- `UV_CACHE_DIR=/private/tmp/codex-uv-cache just fmt`: passed.
- `cargo insta pending-snapshots`: no pending snapshots.
- `just test -p codex-tui`: 2,870 passed; two unrelated guardian feature-flag tests failed because their expected `OverrideTurnContext` event was absent:
  - `update_feature_flags_disabling_guardian_clears_manual_review_policy_without_history`
  - `update_feature_flags_disabling_guardian_clears_review_policy_and_restores_default`
- `just argument-comment-lint` could not complete because the local Bazel LLVM `compiler-rt` repository is missing `include/sanitizer/*.h`. The touched Rust diff was manually inspected and no missing opaque-literal argument comments were found.

## Summary

- Keep local plugin suggestions bounded to fallback and explicitly configured plugins.
- Preserve app-overlap recommendations for remote plugins using cached catalog metadata.
- Remove the WSL-specific local discovery exception and move manager-owned discovery tests into `codex-core-plugins`.

## Why

Local curated marketplaces were allowlisted before plugin detail loading, so every uninstalled candidate could be deep-read before its app IDs were checked. That caused per-turn reads of candidate plugin manifests, skills, app configs, hooks, and MCP configs, which is especially expensive on slow disks.

Remote discovery does not need those local candidate reads because app IDs are already available in the cached remote catalog. Installed local plugins are still loaded when needed to determine the user's installed app IDs.

## Validation

- `just fmt`
- `just test -p codex-core-plugins discoverable::tests` (13 passed)
- `just test -p codex-core plugins::discoverable::tests` (4 passed)
- `just bazel-lock-update`
- `just bazel-lock-check`
- `git diff --check`

## Context

Plugins can expose more than one way for Codex to use them: App connectors for ChatGPT/SIWC-backed sessions and MCP servers for API key login sessions. The broader goal is to make `PluginsManager` the place that understands which plugin surfaces should be visible for the current auth route, so callers do not each have to make that decision themselves.

This PR is the small setup step for that work. It lets the plugin manager be created with the current `AuthMode`, which gives the follow-up auth routing PRs the information they need without relying on setter injection.

## Stack

- PR1: #27652 seed plugin manager auth at construction.
- PR2: #27459 route plugin surfaces by auth mode.
- PR3: #27607 dedupe plugin MCP servers by App declaration name.
- PR4: #27602 preserve plugin Apps in connector listings.
- PR5: #27461 skip install-time plugin MCP OAuth for matching App routes.

## Summary

- Let `PluginsManager::new_with_restriction_product` accept an initial `AuthMode`.
- Keep `PluginsManager::new` behavior unchanged for ordinary callers.

## Validation

```bash
cargo test -p codex-core-plugins plugins_manager_tracks_auth_mode
cargo test -p codex-core list_tool_suggest_discoverable_plugins
git diff --check
```

---------

Co-authored-by: Xin Lin <xl@openai.com>

## Why

We want to make it possible for an app-server orchestrator on one OS to control an exec-server on another host running a different OS. In practice this kinda already works if you get lucky and the two hosts have the same path format, but we mangle quite a lot of operations if either end is Windows.

We should be able to test the cross-platform interactions for exec-server, but we want to do this fairly soon and need a lightweight option for testing. Using Wine to run the Windows side is far from perfect, but it should give us a decent measure of how well we're handling the basics of paths, process spawning, shell interaction, etc.

Future changes will add actual exec-server tests and possibly extensions to the Wine testing environment.

## What

To make the cross-target-triple build easy, these tests are added only to the Bazel build. This change adds an x86_64 Wine prebuilt managed by Bazel and some build rules that can set up the needed toolchain transition.

The support library for running Wine in a test environment created by the Bazel rules comes with its own basic unit and integration tests. Their primary priority is to make sure we don't leak child processes on developer machines and that we can build and launch a basic hello world binary.

## Validation

Confirmed these new tests are running on the [x86_64 bazel ubuntu jobs](https://github.com/openai/codex/actions/runs/27446432302/job/81132356855?pr=27937):

```
//bazel/rules/testing/wine:wine-smoke-test (cached) PASSED in 3.7s
//bazel/rules/testing/wine:wine-test-support-unit-tests (cached) PASSED in 15.8s
```

## Context

Some plugins expose both Apps and MCP servers. This PR moves auth-aware surface projection into `core-plugins::PluginsManager`, so callers get a consistent effective plugin view. Later PRs narrow the conflict rule and update listing/install paths.

The high-level goal of this PR is to set up the plumbing to conditionally filter App/MCP in the plugin manager layer. We start by removing MCP servers when using SIWC/Codex-backend auth, and removing Apps when using API-key-style auth.

This PR is now stacked on #27652, which contains only the constructor plumbing for seeding `PluginsManager` with the current auth mode.

## Stack

- PR1: #27652 seed plugin manager auth at construction.
- PR2: #27459 route plugin surfaces by auth mode.
- PR3: #27607 dedupe plugin MCP servers by App declaration name.
- PR4: #27602 preserve plugin Apps in connector listings.
- PR5: #27461 skip install-time plugin MCP OAuth for matching App routes.

## Summary

- API-key/non-ChatGPT routes hide plugin Apps and keep plugin MCPs.
- ChatGPT/SIWC with Apps enabled keeps plugin Apps and suppresses MCPs for dual-surface plugins.
- MCP-only plugins stay available for ChatGPT/SIWC sessions.
- Cached plugin load outcomes are re-projected when auth mode changes.

## Validation

```bash
cargo test -p codex-core-plugins plugin_auth_projection
cargo test -p codex-core list_tool_suggest_discoverable_plugins
git diff --check
```

## Why

Managed deployments need a reliable deny gate for remote control. Persisted enablement and explicit startup requests currently remain able to start the transport, while the removed `features.remote_control` key is intentionally only a compatibility no-op.

This adds a dedicated requirement that administrators can use to force remote control off without deleting the user's persisted preference. Removing the requirement and restarting restores the prior choice.

## What Changed

- Added top-level `allow_remote_control` requirements parsing, sourced layer precedence, debug output, and `configRequirements/read` exposure as `allowRemoteControl`.
- Added a typed transport policy captured from the startup requirements snapshot. Managed disable forces the initial state to disabled and prevents enrollment, refresh, connection, and persisted-preference mutation.
- Rejected every `remoteControl/*` RPC before parameter deserialization with JSON-RPC `-32600` and `remote control is disabled by managed requirements`.
- Preserved the existing disabled status notification and the previous behavior when the requirement is `true` or omitted.
- Regenerated app-server protocol schemas and documented the new requirement.

## Verification

- Confirmed all remote-control RPCs, including a malformed request, return the managed-policy error while the initial status notification remains `disabled`.
- Confirmed explicit ephemeral startup and persisted enablement make no backend connection and leave the SQLite preference unchanged.
- Confirmed `allow_remote_control = true` does not enable or block remote control and `configRequirements/read` returns `allowRemoteControl: false` for the deny policy.

Related issue: N/A (managed-policy hardening).
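Rejecting on the method name before parameters are parsed is what lets a malformed request still receive the managed-policy error. A minimal sketch of that ordering follows; the dispatcher shape, `parse_params`, and the method name `remoteControl/example` are hypothetical, while the error code and message are the ones quoted above.

```python
import json

MANAGED_DISABLED_MESSAGE = "remote control is disabled by managed requirements"


def parse_params(params):
    """Stand-in for typed parameter deserialization (hypothetical)."""
    if not isinstance(params, dict):
        raise ValueError("params must be an object")
    return params


def handle_request(raw: str, allow_remote_control: bool = True) -> dict:
    request = json.loads(raw)
    method = request.get("method", "")
    if not allow_remote_control and method.startswith("remoteControl/"):
        # Policy check runs on the method name alone; params are never inspected.
        return {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "error": {"code": -32600, "message": MANAGED_DISABLED_MESSAGE},
        }
    params = parse_params(request.get("params"))
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": params}


# Malformed params still get the policy error when the requirement denies.
malformed = json.dumps(
    {"jsonrpc": "2.0", "id": 1, "method": "remoteControl/example", "params": "oops"}
)
print(handle_request(malformed, allow_remote_control=False)["error"])
```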

## Why

We want to make it possible for an app-server orchestrator on one OS to control an exec-server on another host running a different OS. In practice this kinda already works if you get lucky and the two hosts have the same path format, but we mangle quite a lot of operations if either end is Windows.

This test starts exercising that interaction, although right now the initial bootstrap fails. Future changes will expand the test's assertions to match improved support.

## What

Stacked on #27964.

This adds a small Windows exec-server fixture and a Linux protocol smoke test using the reusable Wine harness, covering Windows environment discovery, non-TTY `cmd.exe` execution, output, exit status, and working directory.

Once we've got the full codex binary cross-building under Bazel we could consider moving to the real binary instead of the stripped-down exec-server-only binary used here.

## Context

Turn state is scoped to one logical turn, but the WebSocket path currently exchanges it through upgrade headers, which are scoped to the physical connection. A connection may be reused across turns, so its handshake cannot represent the turn lifecycle reliably.

## Change

Exchange turn state on each WebSocket response request instead:

- send an established value in `response.create.client_metadata`
- read the returned value from the existing `response.metadata` event
- retain the first value in the turn-scoped `ModelClientSession` `OnceLock`
- start the next logical turn without state, even when it reuses the same WebSocket connection

This gives WebSocket requests the same first-value-wins contract as the existing HTTP path.

## Test plan

Integration coverage verifies that:

- WebSocket replays returned state on same-turn follow-ups
- later response metadata does not replace the first value
- state resets at the logical turn boundary without requiring a reconnect

CI validates the full change.

## Stack

This is 1/2. #28002 builds on this request-scoped transport to carry established state through compact requests.
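The first-value-wins contract can be modelled with a set-once cell that is recreated for each logical turn. This is an illustrative Python analogue of the `OnceLock` behavior, not the client code: `TurnState` and `run_turn` are hypothetical names, and each list entry stands for the value returned in one response's metadata.

```python
class TurnState:
    """Set-once holder: the first recorded value sticks for the whole turn."""

    def __init__(self):
        self._value = None

    def record(self, value):
        # Later values are ignored once a value has been established.
        if self._value is None and value is not None:
            self._value = value

    def get(self):
        return self._value


def run_turn(returned_values):
    """Return what each request in one logical turn would send as turn state."""
    state = TurnState()  # fresh per turn, even on a reused connection
    sent = []
    for returned in returned_values:
        sent.append(state.get())  # value placed in the request's client metadata
        state.record(returned)    # value read back from the response metadata
    return sent


print(run_turn(["state-a", "state-b", "state-c"]))
print(run_turn(["state-x"]))
```

The first request of each turn carries no state, and follow-ups replay the first returned value regardless of what later responses report.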

## Context

Inline compaction is part of the active logical turn. Compact requests and the sampling requests around them should use the same turn state, including when compaction is the first request to establish it.

## Change

Pass the turn-scoped `OnceLock` directly to inline v1 compaction so `/responses/compact` includes an established value in the existing HTTP header. Capture `x-codex-turn-state` from the compact response into that same lock, allowing pre-turn compact to establish the value that subsequent sampling reuses.

V2 compact already uses the normal Responses HTTP/WebSocket path and continues to share the same `OnceLock` without separate plumbing. The first returned value wins for the logical turn.

## Test plan

Integration coverage verifies that:

- pre-turn v1 compact can establish state for the first sampling request
- inline v1 compact receives established state over HTTP
- inline v2 compact reuses established state over HTTP
- inline v2 compact reuses established state over WebSocket

CI validates the full change.

The first release after parallelizing Windows packaging moved the critical path to the ARM64 packaging job: https://github.com/openai/codex/actions/runs/27451157324 The x64 job started immediately and finished in 5m29s. The ARM64 job waited 76s for its runner and then took 5m56s, holding the release for 1m43s after x64 had finished. Packaging only downloads, signs, archives, and compresses already built binaries. It does not execute target code. Run both packaging jobs on x64 runners, keeping ARM64 hardware for compilation.

## Why

This is the second-to-last place in the exec-server protocol that needs to migrate to URIs to support cross-OS operation.

## What

- Change `ExecParams.cwd` to `PathUri`.
- Keep the cwd URI-shaped through core and rmcp producers, converting it to `AbsolutePathBuf` only in `LocalProcess::start_process`.
- Reject non-native cwd URIs before launch and update the affected protocol documentation and call sites.

## Context

This is the next step in the plugin auth-routing stack. The earlier PRs make `PluginsManager` auth-aware and move the broad App/MCP surface decision into that layer. This PR narrows the ChatGPT/SIWC behavior so we only hide a plugin MCP server when it conflicts with an App declaration of the same name.

In product terms: if a plugin exposes both an App route and MCP route for `foo`, ChatGPT/SIWC sessions should use the App route for `foo`. If the same plugin also exposes a separate MCP server like `foo2`, that MCP server should remain available.

```json
// .app.json
{
  "apps": {
    "foo": { "id": "connector_abc" }
  }
}
```

```json
// .mcp.json
{
  "mcpServers": {
    "foo": { "url": "https://mcp.foo.com/mcp" },
    "foo2": { "url": "https://mcp.foo2.com/mcp" }
  }
}
```

## Stack

- PR1: #27652 seed plugin manager auth at construction.
- PR2: #27459 route plugin surfaces by auth mode.
- PR3: #27607 dedupe plugin MCP servers by App declaration name.
- PR4: #27602 preserve plugin Apps in connector listings.
- PR5: #27461 skip install-time plugin MCP OAuth for matching App routes.

## Summary

- Preserve App declaration names in loaded plugin metadata.
- Keep public effective App outputs as deduped connector IDs for existing callers.
- For ChatGPT/SIWC, suppress only plugin MCP servers whose names match declared App names.

## Validation

```bash
cargo fmt --all
cargo test -p codex-core-plugins plugin_auth_projection
cargo test -p codex-core-plugins effective_apps
cargo test -p codex-core-plugins read_plugin_for_config_installed_git_source_reads_from_cache_without_cloning
cargo test -p codex-core explicit_plugin_mentions_use_apps_for_chatgpt_dual_surface_plugins
cargo test -p codex-core explicit_plugin_mentions_keep_non_conflicting_mcp_for_chatgpt_auth
cargo test -p codex-app-server --test all plugin_install_filters_disallowed_apps_needing_auth
git diff --check
```

---------

Co-authored-by: Xin Lin <xl@openai.com>
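The name-based conflict rule can be checked against the `foo`/`foo2` example from the description. The sketch below is not the `PluginsManager` logic: `effective_mcp_servers` and the boolean `chatgpt_auth` flag are hypothetical simplifications of the auth-mode projection.

```python
def effective_mcp_servers(app_json: dict, mcp_json: dict, chatgpt_auth: bool) -> dict:
    """Return the MCP servers that stay visible for the given auth route."""
    servers = dict(mcp_json.get("mcpServers", {}))
    if not chatgpt_auth:
        # API-key-style routes keep every plugin MCP server.
        return servers
    # ChatGPT/SIWC: hide only servers whose name matches a declared App name.
    app_names = set(app_json.get("apps", {}))
    return {name: cfg for name, cfg in servers.items() if name not in app_names}


app_json = {"apps": {"foo": {"id": "connector_abc"}}}
mcp_json = {
    "mcpServers": {
        "foo": {"url": "https://mcp.foo.com/mcp"},
        "foo2": {"url": "https://mcp.foo2.com/mcp"},
    }
}

print(sorted(effective_mcp_servers(app_json, mcp_json, chatgpt_auth=True)))
print(sorted(effective_mcp_servers(app_json, mcp_json, chatgpt_auth=False)))
```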

## Summary

Prevent dependency refreshes from silently downgrading Codex's bundled SQLite to a release affected by the WAL-reset corruption bug.

SQLx 0.9 accepts a broad `libsqlite3-sys` range. An unrelated lock refresh therefore moved Codex from `libsqlite3-sys 0.37.0` back to `0.35.0`, changing the bundled SQLite runtime from 3.51.3 to 3.50.2. SQLite documents the affected versions and fix in [The WAL Reset Bug](https://www.sqlite.org/wal.html#the_wal_reset_bug) and the [SQLite 3.51.3 changelog](https://www.sqlite.org/changes.html#version_3_51_3).

## Intent

Keep Bazel and Starlark files consistently formatted without requiring contributors to install or version buildifier themselves.

## Implementation

- Add a SHA-256-pinned, cross-platform DotSlash manifest for buildifier v8.5.1.
- Run buildifier from the shared `just fmt` and `just fmt-check` driver, with Windows-safe explicit DotSlash invocation.
- Provision DotSlash in formatting CI and contributor devcontainers, and document the source-build prerequisite.
- Apply the initial mechanical buildifier formatting baseline.

## Why

Cross-OS tests in the Wine environment will be much more faithful if we can also test PowerShell integration.

## What

Add an x86_64 PowerShell binary to the Bazel Wine environment and include smoke tests.

## Why

We're moving to `PathUri` in more places to support cross-OS app-server/exec-server, but we don't want to expose the URI encoding to users of app-server's public APIs yet. We'll need to translate at the app-server API boundary between client-visible "regular" paths that are appropriate for the OS of the environment for which the paths make sense, which means using the environment's path personality to do the conversion. `PathUri` doesn't yet attempt to encode environment ID, so for now we'll sniff the most likely path convention for a given path.

## What

- Add `PathConvention` and `NativePathString` with host-independent POSIX, Windows drive, and UNC rendering.
- Cover cross-host rendering, encoding, Unicode, and invalid components.
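Sniffing the likely convention from the shape of a path string might look like the sketch below. It is a guess at the heuristic, not the `PathConvention` implementation: the variant names, the `sniff_convention` helper, and the exact matching rules are assumptions.

```python
import re
from enum import Enum
from typing import Optional


class PathConvention(Enum):
    # Variant names are assumptions made for this sketch.
    POSIX = "posix"
    WINDOWS_DRIVE = "windows_drive"
    UNC = "unc"


def sniff_convention(path: str) -> Optional[PathConvention]:
    """Guess the most likely path convention from the string alone."""
    if path.startswith("\\\\"):
        return PathConvention.UNC            # \\server\share\...
    if re.match(r"^[A-Za-z]:[\\/]", path):
        return PathConvention.WINDOWS_DRIVE  # C:\... or C:/...
    if path.startswith("/"):
        return PathConvention.POSIX          # /home/...
    return None                              # relative or unrecognized


for sample in ["/home/user/repo", "C:\\windows", "\\\\server\\share\\dir", "relative/path"]:
    print(sample, sniff_convention(sample))
```

The check works on strings only, so it gives the same answer on any host, which is the point of keeping the rendering host-independent.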

## Why

The next slice needed to make progress on the `remote_env_windows` test is to support passing a Windows cwd for the remote environment and using that environment's native shell. This lets the test run a real Windows process instead of only recording an early path or shell mismatch.

## What

- change `TurnEnvironmentSelection.cwd` from `AbsolutePathBuf` to `PathUri`
- convert local cwd values to URIs when constructing selections
- preserve a remote primary cwd instead of replacing it with the local legacy fallback
- prefer the selected environment's discovered shell for unified exec, falling back to the session shell when unavailable
- convert back to a host-native absolute path at current native-only consumer boundaries
- reject or deny unsupported foreign cwd values at the existing request-permissions boundary, with TODOs for its future migration
- extend the hermetic Wine test to execute Windows PowerShell in `C:\windows` and verify successful process completion
- record the current app-server rejection against the same Wine-backed remote Windows fixture when its cwd is supplied as a native Windows path

## Why

Clients that display or coordinate spawned subagents need an authoritative snapshot of a thread's immediate spawned children when they connect to app-server or recover after missing live events. `thread/list` cannot query by parent, so clients must otherwise scan unrelated threads or reconstruct relationships from rollout history and transient events.

The direct spawn relationship already exists in persisted `thread_spawn_edges` state. Review and Guardian threads do not participate in that lifecycle and are intentionally outside this filter's scope.

## What changed

This adds an experimental `parentThreadId` filter to `thread/list`. Parent-filtered requests return direct spawned children from persisted state while preserving the existing response shape, explicit filters, sorting, and timestamp-only cursor behavior. The lookup does not read rollout transcripts or recursively return descendants.

Supersedes #25112 with the narrower `thread/list` filter approach.

## How it works

1. An experimental client passes a valid thread ID as `parentThreadId`.
2. App-server routes the list through the existing thread-store and state-database boundaries.
3. SQLite selects threads whose IDs have a direct persisted spawn edge from that parent.
4. Omitted provider and source filters include all values; explicit filters keep ordinary `thread/list` semantics.
5. Grandchildren, Review threads, and Guardian threads are excluded.

## Verification

State (144 tests), rollout (69 tests), and focused app-server thread-list (31 tests) suites passed. Scoped Clippy checks and repository formatting also passed.

Coverage includes direct spawned children, omitted grandchildren, pagination, malformed IDs, mixed source kinds, explicit filters, and operation without rollout files.
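The direct-children lookup in step 3 amounts to a single join against the spawn-edge table. The sketch below uses an in-memory SQLite database with a hypothetical schema: only the table name `thread_spawn_edges` comes from the description, while the `threads` table, all column names, and the ordering are assumptions.

```python
import sqlite3


def list_direct_children(conn: sqlite3.Connection, parent_id: str) -> list[str]:
    """Return IDs of threads with a direct spawn edge from `parent_id`."""
    rows = conn.execute(
        "SELECT t.id FROM threads t "
        "JOIN thread_spawn_edges e ON e.child_id = t.id "
        "WHERE e.parent_id = ? "
        "ORDER BY t.updated_at DESC, t.id",
        (parent_id,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE threads (id TEXT PRIMARY KEY, updated_at INTEGER);
    CREATE TABLE thread_spawn_edges (parent_id TEXT, child_id TEXT);
    INSERT INTO threads VALUES
        ('root', 1), ('child-a', 2), ('child-b', 3), ('grandchild', 4);
    INSERT INTO thread_spawn_edges VALUES
        ('root', 'child-a'), ('root', 'child-b'), ('child-a', 'grandchild');
    """
)

# Only immediate children come back; the grandchild is not a direct edge of root.
print(list_direct_children(conn, "root"))
```

Because the join follows one edge only, descendants further down are excluded without any recursion or transcript reads.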

pull Bot locked and limited conversation to collaborators Mar 12, 2026

pull Bot added the `⤵️ pull` and `merge-conflict` (Sync PR has merge conflicts) labels Mar 12, 2026

rka-oai and others added 27 commits June 6, 2026 01:01

fix(core-plugins): send Codex product SKU to plugin-service (#26804)

cdc8ec0

build(v8): update rusty_v8 to 149.2.0 (#26464)

b89ce9a

deps: update starlark to 0.14.2 (#24820)

e648ec7

etraut-openai and others added 30 commits June 12, 2026 15:34

bazel: add PowerShell to Wine test harness (#28120)

42dec90


pull[bot] wants to merge 3048 commits into kontext-security:main from openai:main