codexapi agent V1

Purpose

codexapi agent is a long-term fire-and-forget orchestration layer built on top of the existing agent, task, science, and lead primitives.

The V1 goal is not to invent a new kind of coding agent. The goal is to make a durable agent that can:

keep working for days
survive sleep, reboot, and missed scheduler runs
be inspected and controlled from the CLI
accept messages while it is running
delegate coding work to codexapi task or codexapi science
escalate to the user when needed

The design is intentionally simple. It uses durable filesystem state plus one periodic scheduler entry per CODEXAPI_HOME on each host.

Non-Goals

V1 does not try to solve everything.

No daemon is required.
No SSH is required.
No cross-host migration or "teleportation" of running agents.
No separate task-agent and watcher-agent runtimes.
No catch-up replay of missed heartbeat ticks.
No dependence on real cron in automated tests.
No shared append-only logs written by multiple hosts.

These are deliberate omissions. They keep the system small, portable, and easy to reason about.

Top-Level Model

An agent is a durable record plus a periodic wake mechanism.

There is one agent type.
Each agent has a stop_policy.
Each agent belongs to exactly one CODEXAPI_HOME.
Each agent is owned by exactly one hostname.
Only the owning hostname may wake and run the agent.
Any host that can see the shared filesystem may inspect the agent and queue commands for it.

The agent's durable truth is the state stored under CODEXAPI_HOME, not a live backend process. Each wake starts a fresh backend process and resumes from the saved thread id when available.

`CODEXAPI_HOME`

CODEXAPI_HOME is the root of a complete agent control plane.

Default:

~/.codexapi

Override:

CODEXAPI_HOME=/path/to/home

Why this exists:

It isolates live state from tests.
It allows multiple independent codexapi installations on one machine.
It allows a shared filesystem setup without forcing all state into one global namespace.

Two different CODEXAPI_HOME values are two different systems. They do not see each other's agents, locks, scheduler wrappers, or cron entries.
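
Below is a minimal sketch of home resolution, written in Python because the design names a Python interpreter as the launch target. The function name `resolve_home` is hypothetical. It treats an empty CODEXAPI_HOME as unset and resolves the path to an absolute form, so two spellings of the same directory map to one control plane.

```python
import os
from pathlib import Path


def resolve_home(environ=None):
    """Return the effective CODEXAPI_HOME as an absolute path (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # An explicit, non-empty override wins; otherwise use the default root.
    raw = env.get("CODEXAPI_HOME") or "~/.codexapi"
    # expanduser() handles "~"; resolve() makes the path absolute and canonical.
    return Path(raw).expanduser().resolve()
```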

`CODEXAPI_HOSTNAME`

CODEXAPI_HOSTNAME overrides the host identity used for agent ownership and host-specific locks.

Default:

use the process hostname reported by the OS

Override:

CODEXAPI_HOSTNAME=stable-hostname

Why this exists:

Some shells, cron environments, test harnesses, or sandboxes report different hostnames for the same machine.
Agent ownership depends on an exact hostname match.
Tests and sandboxed runs need a stable explicit value.
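
A matching sketch for host identity follows. `resolve_hostname` is a hypothetical name, and `socket.gethostname()` stands in for "the process hostname reported by the OS". The override is returned without any normalization, because ownership depends on an exact string match.

```python
import os
import socket


def resolve_hostname(environ=None):
    """Return the host identity for ownership and lock names (hypothetical helper)."""
    env = os.environ if environ is None else environ
    override = env.get("CODEXAPI_HOSTNAME")
    # A non-empty override is used verbatim; otherwise ask the OS.
    return override if override else socket.gethostname()
```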

Agent Model

Each agent stores at least:

id: stable identifier
name: human-readable unique name within the home
created_at: UTC timestamp
created_by: user name or parent agent name
parent_id: parent agent id, if any
hostname: owning host for execution
cwd: working directory
prompt: original instruction text
stop_policy: until_done or until_stopped
status: current lifecycle state
thread_id: backend resume id, or empty
heartbeat_minutes: heartbeat interval
last_wake_at: last attempted wake time in UTC
last_success_at: last completed wake time in UTC
next_wake_at: next heartbeat due time in UTC
wake_requested_at: durable "run soon" flag for queued commands/messages
unread_message_count: messages not yet folded into a wake
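input_tokens: cumulative input tokens across all wakes
output_tokens: cumulative output tokens across all wakes
total_tokens: cumulative total tokens across all wakes
avg_tokens_per_hour: lifetime average token rate
child_ids: ids of this agent's child agents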
last_error: most recent failure summary, if any
activity: short status text for agent list

V1 uses one agent type with one explicit lifecycle hint:

stop_policy=until_done: agent is expected to decide when it is finished
stop_policy=until_stopped: agent is expected to keep running until stopped

This keeps the runtime unified while preserving a small but important semantic difference for scheduling and UI.

Lifecycle States

V1 keeps the state model small:

ready: can be woken when due
running: a wake is currently in progress
paused: do not wake until resumed
done: completed by the agent's own judgment
canceled: stopped by an explicit command
error: last wake failed and the agent needs attention or another wake

Why these states:

ready and running are enough for normal operation
paused, done, and canceled are user-visible terminal or semi-terminal control states
error makes failures explicit without inventing a richer failure taxonomy

Filesystem Layout

All paths below are relative to CODEXAPI_HOME.

agents/
  <agent_id>/
    meta.json
    state.json
    AGENTBOOK.md
    commands/
      new/
      claimed/
    hosts/
      <hostname>/
        session.json
        run.lock
        runs/
locks/
  .tick.<hostname>.lock
bin/
  agent-tick
cron/
  agent.cron
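
One way to keep this layout in a single place is a small path helper. The class and method names below are hypothetical; the paths themselves follow the tree above. `host_dir` defaults to the current host, which is the owner whenever the agent is being run, and accepts an explicit hostname so other hosts can inspect the owner's run records.

```python
from pathlib import Path


class AgentPaths:
    """Hypothetical helper mapping the filesystem layout to concrete paths."""

    def __init__(self, home, hostname):
        self.home = Path(home)
        self.hostname = hostname  # the current host (the owner when running)

    def agent_dir(self, agent_id):
        return self.home / "agents" / agent_id

    def meta(self, agent_id):
        return self.agent_dir(agent_id) / "meta.json"

    def state(self, agent_id):
        return self.agent_dir(agent_id) / "state.json"

    def book(self, agent_id):
        return self.agent_dir(agent_id) / "AGENTBOOK.md"

    def commands_new(self, agent_id):
        return self.agent_dir(agent_id) / "commands" / "new"

    def commands_claimed(self, agent_id):
        return self.agent_dir(agent_id) / "commands" / "claimed"

    def host_dir(self, agent_id, hostname=None):
        # Non-owner hosts pass the owner's hostname to read its files.
        return self.agent_dir(agent_id) / "hosts" / (hostname or self.hostname)

    def session(self, agent_id):
        return self.host_dir(agent_id) / "session.json"

    def run_lock(self, agent_id):
        return self.host_dir(agent_id) / "run.lock"

    def runs(self, agent_id, hostname=None):
        return self.host_dir(agent_id, hostname) / "runs"

    def tick_lock(self):
        return self.home / "locks" / f".tick.{self.hostname}.lock"

    def tick_wrapper(self):
        return self.home / "bin" / "agent-tick"

    def cron_record(self):
        return self.home / "cron" / "agent.cron"
```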

`agents/<agent_id>/meta.json`

Purpose:

Stable identity and configuration.

Writer:

After agent creation, only the owner host writes this file, except for explicit configuration changes.

Readers:

Any host.

Format:

JSON object.

Why it exists:

Separates mostly-static configuration from rapidly changing state.

Suggested contents:

id, name, created_at, created_by, parent_id, hostname, cwd, prompt, stop_policy, heartbeat_minutes

`agents/<agent_id>/state.json`

Purpose:

Current snapshot for CLI inspection.

Writer:

Owner host only.

Readers:

Any host.

Format:

JSON object rewritten atomically with temp file + rename.

Why it exists:

agent list and agent show should not need to reconstruct state from many files or logs.

Suggested contents:

status, thread_id, last_wake_at, last_success_at, next_wake_at, wake_requested_at, unread_message_count, token totals, activity, last_error, child_ids
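
A sketch of the temp-file-plus-rename write follows, relying on Python's `os.replace`, which is atomic on POSIX when source and target are on the same filesystem. The name `atomic_write_json` is hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path, data):
    """Rewrite a JSON file so readers see either the old or the new content."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory, so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2, sort_keys=True)
            f.write("\n")
            f.flush()
            os.fsync(f.fileno())  # make the bytes durable before publishing them
        os.replace(tmp, path)  # atomic swap over the old snapshot
    except BaseException:
        # Do not leave half-written temp files behind on failure.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

Readers such as agent list never observe a half-written snapshot. Surviving power loss would additionally require fsyncing the containing directory after the rename, which this sketch omits.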

`agents/<agent_id>/AGENTBOOK.md`

Purpose:

Human-readable working memory for the agent, similar to the leadbook.

Writer:

Owner host only.

Readers:

Any host.

Format:

Markdown.

Why it exists:

Thread ids are not sufficient durable memory. The book is the portable, inspectable memory surface.

Suggested shape:

A stable header with the agent's purpose, values, original goal, and standing guidance.
A dated working-notes section where the agent updates its current plan, active tasks, unexpected developments, wider frame, curiosities, risks, and next move.

Wake behavior:

The wake prompt should preserve the stable header and the latest notes, rather than repeatedly truncating from the top of the file and hiding recent state.
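
A sketch of header-preserving truncation follows. It assumes the book marks the start of its working notes with a single heading line, here a hypothetical `## Working Notes`, and it measures the budget in characters; both are illustrative choices, not part of the design. Older notes are dropped from the top of the notes section, never from the header.

```python
NOTES_MARKER = "## Working Notes"  # hypothetical heading that starts the notes


def book_excerpt(text, notes_budget=4000):
    """Return the stable header plus the newest notes that fit in the budget."""
    lines = text.splitlines(keepends=True)
    # Index of the marker line, or -1 if the book has no marker (all notes).
    split = next(
        (i for i, line in enumerate(lines) if line.strip() == NOTES_MARKER), -1
    )
    header = "".join(lines[: split + 1])  # stable header, kept whole
    notes = lines[split + 1 :]
    kept, used = [], 0
    for line in reversed(notes):  # walk backwards from the newest note
        if used + len(line) > notes_budget:
            break
        kept.append(line)
        used += len(line)
    kept.reverse()
    omitted = len(notes) - len(kept)
    gap = f"[... {omitted} older note line(s) omitted ...]\n" if omitted else ""
    return header + gap + "".join(kept)
```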

`agents/<agent_id>/commands/new/`

Purpose:

Durable cross-host command spool.

Writer:

Any host may create new files here.

Readers:

Owner host only for processing, any host for debugging.

Format:

One JSON file per command.

Why it exists:

It avoids shared append logs and avoids requiring SSH or direct host reachability.

Filename rule:

<utc>.<origin-host>.<pid>.<random>.json

Writers must:

write to a temp file in the same directory tree
fsync if practical
rename atomically into commands/new/

Supported V1 commands:

send
wake
pause
resume
cancel
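
A sketch of a spool writer that follows the filename rule and the three writer steps is shown next. `enqueue_command` and its parameters are hypothetical; the timestamp format mirrors the example command id under Command Processing. The temp file is created in `commands/`, one level above `new/`, so anything visible in `new/` is already complete. In real use, `hostname` should be the resolved CODEXAPI_HOSTNAME; `socket.gethostname()` is only a fallback here.

```python
import json
import os
import secrets
import socket
import tempfile
from datetime import datetime, timezone
from pathlib import Path

COMMAND_KINDS = {"send", "wake", "pause", "resume", "cancel"}


def enqueue_command(agent_dir, kind, body="", author="", hostname=None, now=None):
    """Durably drop one command file into <agent_dir>/commands/new/ (hypothetical)."""
    if kind not in COMMAND_KINDS:
        raise ValueError(f"unsupported command kind: {kind}")
    hostname = hostname or socket.gethostname()
    now = now or datetime.now(timezone.utc)
    # <utc>.<origin-host>.<pid>.<random>, the filename without ".json".
    cmd_id = "{}.{}.{}.{}".format(
        now.strftime("%Y%m%dT%H%M%SZ"), hostname, os.getpid(), secrets.token_hex(4)
    )
    command = {
        "id": cmd_id,
        "created_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "origin_hostname": hostname,
        "kind": kind,
        "body": body,
        "author": author,
    }
    commands = Path(agent_dir) / "commands"
    new_dir = commands / "new"
    new_dir.mkdir(parents=True, exist_ok=True)
    # 1. write to a temp file in the same directory tree (outside new/)
    fd, tmp = tempfile.mkstemp(dir=commands, prefix=".tmp.", suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(command, f)
        f.flush()
        os.fsync(f.fileno())  # 2. fsync
    os.rename(tmp, new_dir / f"{cmd_id}.json")  # 3. atomic rename into new/
    return command
```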

`agents/<agent_id>/commands/claimed/`

Purpose:

Temporary processing area for commands taken by the owner host.

Writer:

Owner host only.

Readers:

Mainly owner host; other hosts may inspect for debugging.

Format:

Same JSON command files, moved from new/.

Why it exists:

Claim-by-rename is simple, durable, and avoids double processing.

After a claimed command is applied, the owner host should record the outcome in state.json or a run record and then remove the command file. The command file is transport, not long-term audit storage.
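
A sketch of claim-by-rename follows, assuming `commands/new/` and `commands/claimed/` are on the same filesystem so that `os.rename` is atomic. Function names are hypothetical. Re-reading files already sitting in `claimed/` is an assumption about crash recovery: a wake that dies after claiming but before deleting leaves them there, and picking them up again trades possible double application for never losing a command.

```python
import json
import os
from pathlib import Path


def claim_commands(agent_dir):
    """Move pending commands from new/ to claimed/ and return them in order."""
    commands = Path(agent_dir) / "commands"
    new_dir, claimed_dir = commands / "new", commands / "claimed"
    claimed_dir.mkdir(parents=True, exist_ok=True)
    if new_dir.is_dir():
        for entry in sorted(new_dir.glob("*.json")):
            try:
                os.rename(entry, claimed_dir / entry.name)
            except FileNotFoundError:
                continue  # another process renamed it first
    claimed = []
    for entry in sorted(claimed_dir.glob("*.json")):
        with open(entry, encoding="utf-8") as f:
            claimed.append((entry, json.load(f)))
    # Apply in timestamp order; the filename breaks ties deterministically.
    claimed.sort(key=lambda item: (item[1].get("created_at", ""), item[0].name))
    return claimed


def finish_command(path):
    """Delete a claimed command once its outcome is recorded elsewhere."""
    Path(path).unlink(missing_ok=True)
```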

`agents/<agent_id>/hosts/<hostname>/session.json`

Purpose:

Host-local runtime data for the owner host.

Writer:

Owner host only.

Readers:

Mostly owner host.

Format:

JSON object.

Why it exists:

Keeps the most frequently changing runtime fields under a host-specific path.

Suggested contents:

thread_id
environment snapshot used for execution
last run metadata that does not need to be duplicated in state.json

`agents/<agent_id>/hosts/<hostname>/run.lock`

Purpose:

Non-blocking per-agent run lock.

Writer:

Owner host only.

Readers:

Owner host only in normal operation.

Format:

Permanent lock file used with flock or fcntl.

Why it exists:

Prevents two entry points from resuming the same backend thread at the same time.
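
A minimal non-blocking run lock is sketched below, assuming a POSIX host where `fcntl.flock` is available. `try_run_lock` is a hypothetical name. Because flock locks belong to the open file description, a second open of the same file conflicts even inside one process, which also makes contention easy to test.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def try_run_lock(lock_path):
    """Yield True if this process now holds the run lock, False otherwise.

    Never waits: LOCK_NB makes flock fail immediately when another holder
    exists. The lock file itself is permanent; closing the descriptor, or the
    process dying, releases the kernel lock.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            acquired = False
        yield acquired
    finally:
        os.close(fd)  # releases the flock if we held it
```

A caller that receives `False` returns immediately and leaves any new commands in `commands/new/` for the next wake.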

`agents/<agent_id>/hosts/<hostname>/runs/`

Purpose:

Per-wake run records for debugging and recovery.

Writer:

Owner host only.

Readers:

Any host.

Format:

One JSON file per wake.

Why it exists:

Per-run files are easier to inspect and safer than multi-host append logs.

Suggested contents:

start and end times
reason for wake
commands consumed
agent reply text or status payload intended for the CLI
token deltas
result summary
error details if any

`bin/agent-tick`

Purpose:

Stable wrapper script for cron.

Writer:

codexapi agent install-cron

Readers:

Cron and the user.

Format:

Executable shell script.

Why it exists:

Cron has a sparse environment. The wrapper pins the interpreter and exports a safe environment.

The wrapper should:

export the resolved CODEXAPI_HOME
export the resolved CODEXAPI_HOSTNAME
set a safe PATH
invoke the exact Python interpreter or installed codexapi path discovered at install time
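
A sketch of rendering the wrapper follows. `render_tick_wrapper` is hypothetical, and so is its default PATH value. `codexapi_cmd` is whatever launch command the installer discovered, for example the absolute path of an installed `codexapi` executable; the exact form is an assumption. `shlex.quote` protects paths that contain spaces.

```python
import shlex


def render_tick_wrapper(home, hostname, codexapi_cmd, path="/usr/bin:/bin:/usr/local/bin"):
    """Render bin/agent-tick as shell text (hypothetical renderer)."""
    argv = " ".join(shlex.quote(part) for part in codexapi_cmd)
    return "\n".join([
        "#!/bin/sh",
        "# Generated by codexapi agent install-cron; cron has a sparse environment.",
        f"export CODEXAPI_HOME={shlex.quote(str(home))}",
        f"export CODEXAPI_HOSTNAME={shlex.quote(hostname)}",
        f"export PATH={shlex.quote(path)}",
        # exec replaces the shell so cron sees the tick's own exit status.
        f"exec {argv} agent tick",
        "",
    ])
```

The installer would then write this text to `bin/agent-tick` and mark it executable, for example with `os.chmod(path, 0o755)`.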

`cron/agent.cron`

Purpose:

Record of the cron line managed for this CODEXAPI_HOME.

Writer:

codexapi agent install-cron

Readers:

User and installer commands.

Format:

Plain text.

Why it exists:

Makes scheduler installation inspectable and testable without reading the user's entire crontab.

Ownership Rules

The design is intentionally asymmetric.

Any host may read any agent in the same CODEXAPI_HOME.
Only the owner host may run the agent.
Any host may enqueue command files in commands/new/.
Only the owner host may mutate state.json, AGENTBOOK.md, host runtime files, and run records.

Why this matters:

It keeps cross-host writes minimal.
It avoids shared append logs.
It allows one shared registry across machines without letting an agent wake on the wrong host.

Scheduler

V1 uses exactly one cron entry for each CODEXAPI_HOME on each host.

Cron cadence:

every minute

Cron target:

CODEXAPI_HOME/bin/agent-tick

Why one scheduler entry:

one place to reason about wake behavior
no per-agent cron management
easy recovery after reboot or sleep

Why cron:

available on macOS and Linux
no root requirement
simple installation story
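
A sketch of cron line rendering and idempotent merging that never touches a real crontab is shown next. Both functions are hypothetical. `previous` is the line recorded in `cron/agent.cron` by an earlier install, so reinstalling replaces rather than duplicates, and lines belonging to other homes are left alone.

```python
import shlex
from pathlib import Path


def render_cron_line(home):
    """Every-minute cron line targeting CODEXAPI_HOME/bin/agent-tick."""
    wrapper = Path(home) / "bin" / "agent-tick"
    return f"* * * * * {shlex.quote(str(wrapper))}\n"


def merge_crontab(existing, line, previous=None):
    """Return crontab text that contains `line` exactly once."""
    stale = {line.strip()}
    if previous:
        stale.add(previous.strip())
    # Drop this home's old or current line; keep everything else untouched.
    kept = [l for l in existing.splitlines() if l.strip() not in stale]
    kept.append(line.rstrip("\n"))
    return "\n".join(kept) + "\n"
```

Applying the merged text to the real user crontab (for example via `crontab -l` and `crontab -`) would be a separate, thin step that automated tests skip.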

Tick Lock

Each host uses a host-specific scheduler lock:

locks/.tick.<hostname>.lock

Locking rules:

lock acquisition is non-blocking
if the lock is held, codexapi agent tick exits 0 immediately
missed scheduler invocations are dropped, not queued

The lock file itself may contain debug text such as pid and start time, but the authority is the kernel file lock, not file existence.

Why this matters:

a long tick must not cause future ticks to pile up
crash recovery is automatic because kernel locks are released when the process dies
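
The tick lock with debug text can be sketched the same way, again assuming POSIX `fcntl.flock`; names are hypothetical. The file is opened without truncation and the debug text is rewritten only after the lock is held, so a contending tick cannot wipe the running tick's pid and start time.

```python
import fcntl
import os
import time


def acquire_tick_lock(lock_path):
    """Return an open fd holding the tick lock, or None if a tick is running."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)  # no O_TRUNC
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    # Only the lock holder rewrites the debug text; the flock is the authority.
    os.ftruncate(fd, 0)
    os.write(fd, f"pid={os.getpid()} started={time.time():.0f}\n".encode())
    return fd


def tick_main(lock_path, run_tick):
    """Run one tick, or return 0 at once if another tick holds the lock."""
    fd = acquire_tick_lock(lock_path)
    if fd is None:
        return 0  # drop this invocation instead of queuing it
    try:
        run_tick()
        return 0
    finally:
        os.close(fd)  # releases the lock
```

A cron-invoked entry point would end with `sys.exit(tick_main(...))`, so contention exits with status 0.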

Per-Agent Run Lock

Each agent has its own non-blocking run lock under its owner host directory.

Rules:

tick, send, and any future explicit wake path must all respect this lock
if the lock is held, the caller must not wait
if new commands arrive while the agent is running, they stay queued for the next wake

Why this matters:

one backend process per agent
no concurrent resume on the same thread id

Tick Semantics

codexapi agent tick should:

resolve CODEXAPI_HOME
resolve the current hostname
take the host-specific tick lock or exit 0
scan all agents in this home
ignore agents whose owner hostname does not match
select agents that are due
try each due agent with its non-blocking run lock

An agent is due when all of the following are true:

status is ready or error
owner hostname matches the current hostname
one of:
- wake_requested_at is set
- unread commands/messages exist
- next_wake_at is present and in the past
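
The due rule can be sketched as a pure function, which suits the unit-test layer described under Testing Strategy. Names are hypothetical and a `YYYY-MM-DDTHH:MM:SSZ` timestamp format is assumed. The sketch encodes only the rule above; how a tick handles queued commands such as `resume` for an agent that is not yet wakeable is left out.

```python
from datetime import datetime, timezone

WAKEABLE = {"ready", "error"}


def parse_utc(value):
    """Parse a stored UTC timestamp; empty values mean 'not set'."""
    if not value:
        return None
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_due(meta, state, hostname, pending_commands, now):
    """pending_commands: number of files currently in commands/new/."""
    if state.get("status") not in WAKEABLE:
        return False
    if meta.get("hostname") != hostname:
        return False  # only the owner host may wake the agent
    if state.get("wake_requested_at"):
        return True
    if pending_commands > 0 or state.get("unread_message_count", 0) > 0:
        return True
    next_wake = parse_utc(state.get("next_wake_at"))
    return next_wake is not None and next_wake <= now
```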

Heartbeat behavior:

missed heartbeat opportunities are dropped
there is no replay of missed intervals after sleep or reboot
the next heartbeat is scheduled from the time the current wake finishes, not from the last planned heartbeat slot

Why this matters:

heartbeats are a chance to check in, not a durable queue
durable user intent must live in command files, not in hypothetical missed ticks
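
Scheduling from the finish time is a one-line computation. `schedule_next_wake` is a hypothetical name and the timestamp format is assumed. For example, an agent whose wake finishes at 08:02:30 with a 60-minute heartbeat is next due at 09:02:30, regardless of how many planned slots a sleeping host missed.

```python
from datetime import datetime, timedelta, timezone


def schedule_next_wake(finished_at, heartbeat_minutes):
    """Return next_wake_at as one interval after this wake finished.

    Missed slots are never replayed: the previous planned slot is ignored.
    """
    nxt = finished_at + timedelta(minutes=heartbeat_minutes)
    return nxt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```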

Command Processing

Command files are the durable cross-host control plane.

Suggested command shape:

{
  "id": "20260306T211500Z.host.pid.abcd",
  "created_at": "2026-03-06T21:15:00Z",
  "origin_hostname": "workstation-a",
  "kind": "send",
  "body": "Status?",
  "author": "mark"
}

Processing rules:

owner host claims commands by rename from new/ to claimed/
commands are applied in timestamp order
pause and cancel are applied before starting a new backend wake
send contributes to the next prompt and increments unread counts until consumed; done and canceled agents still process queued messages as a one-off wake
wake means run soon even if no heartbeat is due
resume reopens a paused or done agent
after successful application, the owner host records the result in state or a run record and deletes the claimed file
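
Folding claimed commands into state can also be sketched as a pure function. `apply_commands` is hypothetical, and the caller is assumed to pass commands already sorted by `created_at`. Two behaviors are assumptions rather than rules from this document: `pause` does not reopen a `done` or `canceled` agent, and unknown command kinds are silently skipped. The one-off wake for `done` and `canceled` agents with queued messages is not modeled.

```python
def apply_commands(state, commands, now):
    """Return (new_state, messages) without modifying the input state.

    messages are the send bodies to include in the next wake prompt.
    """
    state = dict(state)
    messages = []
    for cmd in commands:
        kind = cmd.get("kind")
        status = state.get("status")
        if kind == "pause" and status not in ("done", "canceled"):
            state["status"] = "paused"
        elif kind == "cancel":
            state["status"] = "canceled"
        elif kind == "resume" and status in ("paused", "done"):
            state["status"] = "ready"  # resume reopens paused or done agents
        elif kind == "wake":
            state["wake_requested_at"] = now  # run soon even if no heartbeat is due
        elif kind == "send":
            messages.append(cmd.get("body", ""))
            state["unread_message_count"] = state.get("unread_message_count", 0) + 1
        # Any other kind, or a pause/resume that does not apply, is skipped.
    return state, messages
```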

Why command files instead of SSH:

durable when the owner host is asleep or unreachable
portable
fewer assumptions about local network setup

Wakes and Backend Process Model

Each wake is a fresh backend process.

Rules:

do not keep a codex process alive between heartbeats
when a wake starts, resume from thread_id if present
when the wake ends, persist the updated thread_id
if no thread_id exists, start a fresh thread

Why this matters:

robust to reboot and crash
simpler process management
clearer token accounting per wake

The backend thread id is useful memory, but not the source of truth. Durable memory lives in the agent home, especially state.json, command files, and AGENTBOOK.md.
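
The per-wake flow can be sketched with the backend injected as a callable, which is also the seam the backend-stub tests under Testing Strategy would use. `WakeResult`, `run_wake`, and the runner signature are hypothetical; a real runner would launch a fresh backend process on every call. Treating the token fields as running sums of per-wake deltas follows the design, while the exact result shape is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class WakeResult:
    thread_id: str
    reply: str
    input_tokens: int
    output_tokens: int


# Hypothetical runner: (prompt, thread_id or None) -> WakeResult.
Runner = Callable[[str, Optional[str]], WakeResult]


def run_wake(state, prompt, runner: Runner):
    """Run one wake and fold its result into a copy of state."""
    # Resume from the saved thread when present; None starts a fresh thread.
    result = runner(prompt, state.get("thread_id") or None)
    new_state = dict(state)
    new_state["thread_id"] = result.thread_id  # persist for the next wake
    new_state["input_tokens"] = state.get("input_tokens", 0) + result.input_tokens
    new_state["output_tokens"] = state.get("output_tokens", 0) + result.output_tokens
    new_state["total_tokens"] = new_state["input_tokens"] + new_state["output_tokens"]
    return new_state, result
```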

Environment Handling

The scheduler environment and the agent execution environment are not assumed to be the same.

Each agent should persist enough environment to resume sanely:

cwd
PATH
VIRTUAL_ENV, if set
interpreter path used to launch codexapi-related subprocesses when relevant

Why this matters:

the cron-driven scheduler may run from a different venv than the one the user had active when the agent was created
repo commands like python, pytest, and tool wrappers often depend on PATH and VIRTUAL_ENV

V1 should store only the minimum needed to recreate the expected environment.

Managed wakes should also expose stable agent identity to the backend process:

CODEXAPI_AGENT_ID
CODEXAPI_AGENT_NAME
CODEXAPI_AGENT_PARENT_ID, when relevant

Why this matters:

a managed agent should be able to start another agent without manually re-stating its own identity
child agents should be able to record parentage automatically
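
Restoring the saved environment and exporting agent identity might look like the sketch below. `build_wake_env` and `inherited_parent_id` are hypothetical, and so is the shape of `session_env` (a dict assumed to be loaded from `session.json`). A stale `CODEXAPI_AGENT_PARENT_ID` inherited from the scheduler's environment is removed so a top-level agent never appears to have a parent.

```python
import os


def build_wake_env(meta, session_env, base_env=None):
    """Overlay the saved PATH/VIRTUAL_ENV and add agent identity variables."""
    env = dict(os.environ if base_env is None else base_env)
    for key in ("PATH", "VIRTUAL_ENV"):
        if session_env.get(key):
            env[key] = session_env[key]  # undo cron's sparse environment
    env["CODEXAPI_AGENT_ID"] = meta["id"]
    env["CODEXAPI_AGENT_NAME"] = meta["name"]
    if meta.get("parent_id"):
        env["CODEXAPI_AGENT_PARENT_ID"] = meta["parent_id"]
    else:
        env.pop("CODEXAPI_AGENT_PARENT_ID", None)
    return env


def inherited_parent_id(environ=None):
    """What `agent start` could record as parent_id when run inside a wake."""
    env = os.environ if environ is None else environ
    return env.get("CODEXAPI_AGENT_ID") or None
```

A wake could then launch its backend with `subprocess.run(cmd, cwd=meta["cwd"], env=build_wake_env(meta, session_env))`; inside that process, `agent start` can call `inherited_parent_id()` to fill in `parent_id`.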

Token Accounting

V1 should not pretend to know dollar cost.

Track:

input_tokens
output_tokens
total_tokens
avg_tokens_per_hour

Token totals belong in state.json so agent list can show them cheaply.

Why this matters:

heartbeat-heavy agents can become unexpectedly expensive in quota terms
users need a simple proxy for long-running agent cost

avg_tokens_per_hour is a lifetime running average in V1. More detailed recent windows can be added later if needed.
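
The lifetime average can be sketched as follows, assuming "lifetime" means wall-clock hours since `created_at`. The function `avg_tokens_per_hour` and its one-hour floor are assumptions; the floor only keeps a minutes-old agent from reporting an inflated rate. For example, 48,000 tokens over 48 hours is 1,000 tokens per hour.

```python
from datetime import datetime, timezone


def avg_tokens_per_hour(total_tokens, created_at, now):
    """All tokens so far divided by agent age in hours (floored at one hour)."""
    hours = (now - created_at).total_seconds() / 3600
    return total_tokens / max(hours, 1.0)
```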

CLI Contract

V1 CLI surface:

codexapi agent start
codexapi agent list
codexapi agent whoami
codexapi agent read
codexapi agent book
codexapi agent show
codexapi agent status
codexapi agent send
codexapi agent wake
codexapi agent pause
codexapi agent resume
codexapi agent cancel
codexapi agent delete
codexapi agent tick
codexapi agent install-cron

Expected behavior:

start creates the agent directory, meta/state files, and host runtime files
list reads only this CODEXAPI_HOME
whoami prints the effective host identity and CODEXAPI_HOME
read shows recent user-visible communication derived from state and run records
book prints the current AGENTBOOK.md text for one agent
show reads one agent's current snapshot and recent run history
send, wake, pause, resume, and cancel create durable command files
delete removes one agent directory when it is safe to do so
tick processes due agents for the current hostname only
install-cron installs exactly one scheduler entry for this home on this host

Why command-oriented CLI actions:

one path for local and cross-host control
durable intent
simpler concurrency model

Failure Recovery

V1 should explicitly recover from common failure modes.

Reboot or Sleep

missed cron minutes are ignored
the next cron minute runs agent tick
due agents are selected from current state, not from queued heartbeat ticks

Tick Crash

kernel lock is released when the tick process dies
next cron minute may run normally

Wake Crash

per-agent run lock is released when the process dies
next tick sees the agent is not actually locked
if state.json still says running, reconcile it to error or ready before proceeding
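
The reconciliation step is sketched below. `reconcile_stale_running` is hypothetical, and choosing `error` over `ready` (plus the message text) is an assumption, since the design allows either. It must run only after the run lock was acquired, because only then is it certain that no live wake owns the agent. Since `error` is a wakeable status, the agent is retried on a later tick.

```python
def reconcile_stale_running(state, lock_acquired):
    """Repair a state left at 'running' by a wake that crashed."""
    if not lock_acquired or state.get("status") != "running":
        return state  # either someone is really running it, or nothing to fix
    fixed = dict(state)
    fixed["status"] = "error"
    fixed["last_error"] = "previous wake ended without updating state (crash?)"
    fixed["activity"] = "recovered from interrupted wake"
    return fixed
```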

Owner Host Unavailable

other hosts may still inspect the agent and enqueue commands
commands remain durable until the owner host comes back

Testing Strategy

The main automated testing tool is a temporary CODEXAPI_HOME.

Why this is the right testing seam:

it isolates tests from live agents
it allows end-to-end command and tick tests without real cron
it matches the real control-plane boundary

Test Rules

every integration test sets CODEXAPI_HOME to a temp directory
tests call CLI commands or internal functions directly
tests run codexapi agent tick directly instead of invoking cron
tests must never depend on the default ~/.codexapi

Test Layers

Unit tests:

due-agent selection
heartbeat scheduling
token accounting
path resolution
command parsing
state transition logic

Filesystem integration tests:

create an agent
enqueue command files
run tick
verify command consumption, state updates, and next wake times
verify that different CODEXAPI_HOME roots are fully isolated

Backend-stub integration tests:

replace real backend execution with a fake runner
return canned outputs and thread ids
verify prompt construction, session resume, and token accounting

Scheduler tests:

verify wrapper script generation
verify cron line rendering
verify that two different CODEXAPI_HOME roots on one host produce separate scheduler artifacts
do not touch a real user crontab in normal automated tests

Cross-host tests:

fake different hostnames
verify that only the owner hostname wakes an agent
verify that non-owner hosts can still enqueue commands

Locking tests:

simulate tick lock contention and assert that tick exits quickly with status 0
simulate per-agent run lock contention and assert no second wake starts

Invariants

These are the rules the implementation should preserve.

CODEXAPI_HOME is a complete isolated control plane.
One cron installation belongs to one home on one host.
Only the owner hostname may wake an agent.
Heartbeat opportunities are lossy.
Commands are durable.
No caller waits on the tick lock or per-agent run lock.
There is never more than one live backend process per agent.
The backend thread id is useful state, not the source of truth.
Cross-host writes use one-file command spooling, not shared append logs.
Agent state shown in the CLI comes from state.json, not from expensive live reconstruction.

Why This Design Is Small Enough

This design deliberately avoids many attractive additions.

It does not require a daemon.
It does not require host-to-host RPC.
It does not require richer distributed locking than the filesystem already provides.
It does not require a database.

What remains is the minimum necessary structure for a durable, inspectable, multi-day agent system:

one state root
one scheduler entry
one agent directory per agent
one command spool per agent
one host owner per agent
one run at a time

That is a good V1 shape.
