codexapi agent is a long-term fire-and-forget orchestration layer built on top
of the existing agent, task, science, and lead primitives.
The V1 goal is not to invent a new kind of coding agent. The goal is to make a durable agent that can:
- keep working for days
- survive sleep, reboot, and missed scheduler runs
- be inspected and controlled from the CLI
- accept messages while it is running
- delegate coding work to
codexapi taskorcodexapi science - escalate to the user when needed
The design is intentionally simple. It uses durable filesystem state plus one
periodic scheduler entry per CODEXAPI_HOME.
V1 does not try to solve everything.
- No daemon is required.
- No SSH is required.
- No cross-host migration or "teleportation" of running agents.
- No separate task-agent and watcher-agent runtimes.
- No catch-up replay of missed heartbeat ticks.
- No dependence on real cron in automated tests.
- No shared append-only logs written by multiple hosts.
These are deliberate omissions. They keep the system small, portable, and easy to reason about.
An agent is a durable record plus a periodic wake mechanism.
- There is one agent type.
- Each agent has a
stop_policy. - Each agent belongs to exactly one
CODEXAPI_HOME. - Each agent is owned by exactly one hostname.
- Only the owning hostname may wake and run the agent.
- Any host that can see the shared filesystem may inspect the agent and queue commands for it.
The agent's durable truth is the state stored under CODEXAPI_HOME, not a live
backend process. Each wake starts a fresh backend process and resumes from the
saved thread id when available.
CODEXAPI_HOME is the root of a complete agent control plane.
Default:
~/.codexapi
Override:
CODEXAPI_HOME=/path/to/home
Why this exists:
- It isolates live state from tests.
- It allows multiple independent codexapi installations on one machine.
- It allows a shared filesystem setup without forcing all state into one global namespace.
Two different CODEXAPI_HOME values are two different systems. They do not see
each other's agents, locks, scheduler wrappers, or cron entries.
CODEXAPI_HOSTNAME overrides the host identity used for agent ownership and
host-specific locks.
Default:
- use the process hostname reported by the OS
Override:
CODEXAPI_HOSTNAME=stable-hostname
Why this exists:
- Some shells, cron environments, test harnesses, or sandboxes report different hostnames for the same machine.
- Agent ownership depends on an exact hostname match.
- Tests and sandboxed runs need a stable explicit value.
Each agent stores at least:
id: stable identifiername: human-readable unique name within the homecreated_at: UTC timestampcreated_by: user name or parent agent nameparent_id: parent agent id, if anyhostname: owning host for executioncwd: working directoryprompt: original instruction textstop_policy:until_doneoruntil_stoppedstatus: current lifecycle statethread_id: backend resume id, or emptyheartbeat_minutes: heartbeat intervallast_wake_at: last attempted wake time in UTClast_success_at: last completed wake time in UTCnext_wake_at: next heartbeat due time in UTCwake_requested_at: durable "run soon" flag for queued commands/messagesunread_message_count: messages not yet folded into a wakeinput_tokensoutput_tokenstotal_tokensavg_tokens_per_hourchild_idslast_error: most recent failure summary, if anyactivity: short status text foragent list
V1 uses one agent type with one explicit lifecycle hint:
stop_policy=until_done: agent is expected to decide when it is finishedstop_policy=until_stopped: agent is expected to keep running until stopped
This keeps the runtime unified while preserving a small but important semantic difference for scheduling and UI.
V1 keeps the state model small:
ready: can be woken when duerunning: a wake is currently in progresspaused: do not wake until resumeddone: completed by the agent's own judgmentcanceled: stopped by an explicit commanderror: last wake failed and the agent needs attention or another wake
Why these states:
readyandrunningare enough for normal operationpaused,done, andcanceledare user-visible terminal or semi-terminal control stateserrormakes failures explicit without inventing a richer failure taxonomy
All paths below are relative to CODEXAPI_HOME.
agents/
<agent_id>/
meta.json
state.json
AGENTBOOK.md
commands/
new/
claimed/
hosts/
<hostname>/
session.json
run.lock
runs/
locks/
.tick.<hostname>.lock
bin/
agent-tick
cron/
agent.cron
Purpose:
- Stable identity and configuration.
Writer:
- Owner host only after agent creation, except for explicit configuration changes.
Readers:
- Any host.
Format:
- JSON object.
Why it exists:
- Separates mostly-static configuration from rapidly changing state.
Suggested contents:
id,name,created_at,created_by,parent_id,hostname,cwd,prompt,stop_policy,heartbeat_minutes
Purpose:
- Current snapshot for CLI inspection.
Writer:
- Owner host only.
Readers:
- Any host.
Format:
- JSON object rewritten atomically with temp file + rename.
Why it exists:
agent listandagent showshould not need to reconstruct state from many files or logs.
Suggested contents:
status,thread_id,last_wake_at,last_success_at,next_wake_at,wake_requested_at,unread_message_count, token totals,activity,last_error,child_ids
Purpose:
- Human-readable working memory for the agent, similar to the leadbook.
Writer:
- Owner host only.
Readers:
- Any host.
Format:
- Markdown.
Why it exists:
- Thread ids are not sufficient durable memory. The book is the portable, inspectable memory surface.
Suggested shape:
- A stable header with the agent's purpose, values, original goal, and standing guidance.
- A dated working-notes section where the agent updates its current plan, active tasks, unexpected developments, wider frame, curiosities, risks, and next move.
Wake behavior:
- The wake prompt should preserve the stable header and the latest notes, rather than repeatedly truncating from the top of the file and hiding recent state.
Purpose:
- Durable cross-host command spool.
Writer:
- Any host may create new files here.
Readers:
- Owner host only for processing, any host for debugging.
Format:
- One JSON file per command.
Why it exists:
- It avoids shared append logs and avoids requiring SSH or direct host reachability.
Filename rule:
<utc>.<origin-host>.<pid>.<random>.json
Writers must:
- write to a temp file in the same directory tree
fsyncif practical- rename atomically into
commands/new/
Supported V1 commands:
sendwakepauseresumecancel
Purpose:
- Temporary processing area for commands taken by the owner host.
Writer:
- Owner host only.
Readers:
- Mainly owner host; other hosts may inspect for debugging.
Format:
- Same JSON command files, moved from
new/.
Why it exists:
- Claim-by-rename is simple, durable, and avoids double processing.
After a claimed command is applied, the owner host should record the outcome in
state.json or a run record and then remove the command file. The command file
is transport, not long-term audit storage.
Purpose:
- Host-local runtime data for the owner host.
Writer:
- Owner host only.
Readers:
- Mostly owner host.
Format:
- JSON object.
Why it exists:
- Keeps the liveliest mutable runtime fields under a host-specific path.
Suggested contents:
thread_id- environment snapshot used for execution
- last run metadata that does not need to be duplicated in
state.json
Purpose:
- Non-blocking per-agent run lock.
Writer:
- Owner host only.
Readers:
- Owner host only in normal operation.
Format:
- Permanent lock file used with
flockorfcntl.
Why it exists:
- Prevents two entry points from resuming the same backend thread at the same time.
Purpose:
- Per-wake run records for debugging and recovery.
Writer:
- Owner host only.
Readers:
- Any host.
Format:
- One JSON file per wake.
Why it exists:
- Per-run files are easier to inspect and safer than multi-host append logs.
Suggested contents:
- start and end times
- reason for wake
- commands consumed
- agent reply text or status payload intended for the CLI
- token deltas
- result summary
- error details if any
Purpose:
- Stable wrapper script for cron.
Writer:
codexapi agent install-cron
Readers:
- Cron and the user.
Format:
- Executable shell script.
Why it exists:
- Cron has a sparse environment. The wrapper pins the interpreter and exports a safe environment.
The wrapper should:
- export the resolved
CODEXAPI_HOME - export the resolved
CODEXAPI_HOSTNAME - set a safe
PATH - invoke the exact Python interpreter or installed
codexapipath discovered at install time
Purpose:
- Record of the cron line managed for this
CODEXAPI_HOME.
Writer:
codexapi agent install-cron
Readers:
- User and installer commands.
Format:
- Plain text.
Why it exists:
- Makes scheduler installation inspectable and testable without reading the user's entire crontab.
The design is intentionally asymmetric.
- Any host may read any agent in the same
CODEXAPI_HOME. - Only the owner host may run the agent.
- Any host may enqueue command files in
commands/new/. - Only the owner host may mutate
state.json,AGENTBOOK.md, host runtime files, and run records.
Why this matters:
- It keeps cross-host writes minimal.
- It avoids shared append logs.
- It allows one shared registry across machines without letting an agent wake on the wrong host.
V1 uses exactly one cron entry per CODEXAPI_HOME and per host.
Cron cadence:
- every minute
Cron target:
CODEXAPI_HOME/bin/agent-tick
Why one scheduler entry:
- one place to reason about wake behavior
- no per-agent cron management
- easy recovery after reboot or sleep
Why cron:
- available on macOS and Linux
- no root requirement
- simple installation story
Each host uses a host-specific scheduler lock:
locks/.tick.<hostname>.lock
Locking rules:
- lock acquisition is non-blocking
- if the lock is held,
codexapi agent tickexits0immediately - missed scheduler invocations are dropped, not queued
The lock file itself may contain debug text such as pid and start time, but the authority is the kernel file lock, not file existence.
Why this matters:
- a long tick must not cause future ticks to pile up
- crash recovery is automatic because kernel locks are released when the process dies
Each agent has its own non-blocking run lock under its owner host directory.
Rules:
tick,send, and any future explicit wake path must all respect this lock- if the lock is held, the caller must not wait
- if new commands arrive while the agent is running, they stay queued for the next wake
Why this matters:
- one backend process per agent
- no concurrent
resumeon the same thread id
codexapi agent tick should:
- resolve
CODEXAPI_HOME - resolve the current hostname
- take the host-specific tick lock or exit
0 - scan all agents in this home
- ignore agents whose owner hostname does not match
- select agents that are due
- try each due agent with its non-blocking run lock
An agent is due when all of the following are true:
statusisreadyorerror- owner hostname matches the current hostname
- one of:
wake_requested_atis set- unread commands/messages exist
next_wake_atis present and in the past
Heartbeat behavior:
- missed heartbeat opportunities are dropped
- there is no replay of missed intervals after sleep or reboot
- the next heartbeat is scheduled from the time the current wake finishes, not from the last planned heartbeat slot
Why this matters:
- heartbeats are a chance to check in, not a durable queue
- durable user intent must live in command files, not in hypothetical missed ticks
Command files are the durable cross-host control plane.
Suggested command shape:
{
"id": "20260306T211500Z.host.pid.abcd",
"created_at": "2026-03-06T21:15:00Z",
"origin_hostname": "workstation-a",
"kind": "send",
"body": "Status?",
"author": "mark"
}Processing rules:
- owner host claims commands by rename from
new/toclaimed/ - commands are applied in timestamp order
pauseandcancelare applied before starting a new backend wakesendcontributes to the next prompt and increments unread counts until consumed;doneandcanceledagents still process queued messages as a one-off wakewakemeans run soon even if no heartbeat is dueresumereopens apausedordoneagent- after successful application, the owner host records the result in state or a run record and deletes the claimed file
Why command files instead of SSH:
- durable when the owner host is asleep or unreachable
- portable
- fewer assumptions about local network setup
Each wake is a fresh backend process.
Rules:
- do not keep a
codexprocess alive between heartbeats - when a wake starts, resume from
thread_idif present - when the wake ends, persist the updated
thread_id - if no
thread_idexists, start a fresh thread
Why this matters:
- robust to reboot and crash
- simpler process management
- clearer token accounting per wake
The backend thread id is useful memory, but not the source of truth. Durable
memory lives in the agent home, especially state.json, command files, and
AGENTBOOK.md.
The scheduler environment and the agent execution environment are not assumed to be the same.
Each agent should persist enough environment to resume sanely:
cwdPATHVIRTUAL_ENV, if set- interpreter path used to launch codexapi-related subprocesses when relevant
Why this matters:
- the cron-driven scheduler may run from a different venv than the one the user had active when the agent was created
- repo commands like
python,pytest, and tool wrappers often depend onPATHandVIRTUAL_ENV
V1 should store only the minimum needed to recreate the expected environment.
Managed wakes should also expose stable agent identity to the backend process:
CODEXAPI_AGENT_IDCODEXAPI_AGENT_NAMECODEXAPI_AGENT_PARENT_ID, when relevant
Why this matters:
- a managed agent should be able to start another agent without manually re-stating its own identity
- child agents should be able to record parentage automatically
V1 should not pretend to know dollar cost.
Track:
input_tokensoutput_tokenstotal_tokensavg_tokens_per_hour
Token totals belong in state.json so agent list can show them cheaply.
Why this matters:
- heartbeat-heavy agents can become unexpectedly expensive in quota terms
- users need a simple proxy for long-running agent cost
avg_tokens_per_hour is a lifetime running average in V1. More detailed recent
windows can be added later if needed.
V1 CLI surface:
codexapi agent startcodexapi agent listcodexapi agent whoamicodexapi agent readcodexapi agent bookcodexapi agent showcodexapi agent statuscodexapi agent sendcodexapi agent wakecodexapi agent pausecodexapi agent resumecodexapi agent cancelcodexapi agent deletecodexapi agent tickcodexapi agent install-cron
Expected behavior:
startcreates the agent directory, meta/state files, and host runtime fileslistreads only thisCODEXAPI_HOMEwhoamiprints the effective host identity andCODEXAPI_HOMEreadshows recent user-visible communication derived from state and run recordsbookprints the currentAGENTBOOK.mdtext for one agentshowreads one agent's current snapshot and recent run historysend,wake,pause,resume, andcancelcreate durable command filesdeleteremoves one agent directory when it is safe to do sotickprocesses due agents for the current hostname onlyinstall-croninstalls exactly one scheduler entry for this home on this host
Why command-oriented CLI actions:
- one path for local and cross-host control
- durable intent
- simpler concurrency model
V1 should explicitly recover from common failure modes.
- missed cron minutes are ignored
- the next cron minute runs
agent tick - due agents are selected from current state, not from queued heartbeat ticks
- kernel lock is released when the tick process dies
- next cron minute may run normally
- per-agent run lock is released when the process dies
- next tick sees the agent is not actually locked
- if
state.jsonstill saysrunning, reconcile it toerrororreadybefore proceeding
- other hosts may still inspect the agent and enqueue commands
- commands remain durable until the owner host comes back
The main automated testing tool is a temporary CODEXAPI_HOME.
Why this is the right testing seam:
- it isolates tests from live agents
- it allows end-to-end command and tick tests without real cron
- it matches the real control-plane boundary
- every integration test sets
CODEXAPI_HOMEto a temp directory - tests call CLI commands or internal functions directly
- tests run
codexapi agent tickdirectly instead of invoking cron - tests must never depend on the default
~/.codexapi
Unit tests:
- due-agent selection
- heartbeat scheduling
- token accounting
- path resolution
- command parsing
- state transition logic
Filesystem integration tests:
- create an agent
- enqueue command files
- run
tick - verify command consumption, state updates, and next wake times
- verify that different
CODEXAPI_HOMEroots are fully isolated
Backend-stub integration tests:
- replace real backend execution with a fake runner
- return canned outputs and thread ids
- verify prompt construction, session resume, and token accounting
Scheduler tests:
- verify wrapper script generation
- verify cron line rendering
- verify that two different
CODEXAPI_HOMEroots on one host produce separate scheduler artifacts - do not touch a real user crontab in normal automated tests
Cross-host tests:
- fake different hostnames
- verify that only the owner hostname wakes an agent
- verify that non-owner hosts can still enqueue commands
Locking tests:
- simulate tick lock contention and assert fast
0exit - simulate per-agent run lock contention and assert no second wake starts
These are the rules the implementation should preserve.
CODEXAPI_HOMEis a complete isolated control plane.- One cron installation belongs to one home on one host.
- Only the owner hostname may wake an agent.
- Heartbeat opportunities are lossy.
- Commands are durable.
- No caller waits on the tick lock or per-agent run lock.
- There is never more than one live backend process per agent.
- The backend thread id is useful state, not the source of truth.
- Cross-host writes use one-file command spooling, not shared append logs.
- Agent state shown in the CLI comes from
state.json, not from expensive live reconstruction.
This design deliberately avoids many attractive additions.
- It does not require a daemon.
- It does not require host-to-host RPC.
- It does not require richer distributed locking than the filesystem already provides.
- It does not require a database.
What remains is the minimum necessary structure for a durable, inspectable, multi-day agent system:
- one state root
- one scheduler entry
- one agent directory per agent
- one command spool per agent
- one host owner per agent
- one run at a time
That is a good V1 shape.