Spawnd is a deployed agent orchestration system. Postgres is the durable system of record, Redis is coordination, and object storage holds redacted artifacts. Workers use git worktrees and files only as execution scratch; durable evidence is stored as database rows plus artifact objects.
- Postgres stores runs, agents, attempts, leases, events, responses, runtime sessions, invocations, provider events, traces, artifacts, checks, usage, and git provenance.
- Redis carries ready-agent wakeups, worker heartbeats, lease hints, live run events, and cancellation signals. Redis can be rebuilt from Postgres.
- Object storage stores redacted setup output, runtime output, final messages, check output, patches, and provider payloads.
- OpenTelemetry can export spans externally while Postgres keeps a redacted queryable trace mirror.
- CLI, Python API, and HTTP API submit to and read from the same deployed services.
Set the deployed backend environment:
export SPAWND_DATABASE_URL='postgresql+psycopg://user:pass@host:5432/spawnd'
export SPAWND_REDIS_URL='redis://host:6379/0'
export SPAWND_ARTIFACTS_BUCKET='spawnd-artifacts'
export SPAWND_ARTIFACTS_ENDPOINT='https://s3.example.com'
export SPAWND_ARTIFACTS_REGION='us-east-1'
export SPAWND_ARTIFACTS_PREFIX='prod'
export SPAWND_API_TOKEN='change-me'Telemetry uses standard OTEL_* variables plus plan or env settings:
orchestration:
telemetry:
enabled: true
exporter: otlp
capture: full
failure_policy: degrade
artifacts:
capture_raw: falseThe compose worker exports OTLP traces to the bundled collector at
http://otel-collector:4318; the collector health endpoint is exposed locally
at http://localhost:13133.
Raw artifact capture is off by default. Runtime output, final messages, setup output, checks, and provider payloads are redacted before upload unless the plan explicitly enables raw capture.
Use Alembic for deployed schema creation:
alembic upgrade headThe greenfield initial migration creates the complete deployed schema: materialized run and agent state, normalized runtime/provider tables, artifact indexes, verification checks, trace mirrors, git provenance, worker nodes, and queue outbox.
Submit a plan file:
spawnd run -f plan.yaml --source-repo "$PWD" --source-ref origin/mainspawnd submit -f plan.yaml is an alias for the same deployed submit path.
Inline runs are supported:
spawnd run -p "Improve the parser error message" --check "pytest tests/test_parser.py"Python clients use the same backend:
from spawnd import agent, run
run(
[agent("parser", "Improve parser diagnostics", check="pytest tests/test_parser.py")],
repository=repo,
coordinator=coordinator,
source_repo="/repo",
source_ref="origin/main",
)source_repo is a local git repository path that workers can read. The CLI
defaults it to the submitter's current directory when omitted. source_ref is
the default base ref for worker-created worktrees; a plan-level
orchestration.worktree_source.base_ref intentionally overrides it.
orchestration:
worktree_source:
base_ref: origin/main
fetch: true
env_refs:
GIT_ASKPASS: SPAWND_GIT_ASKPASS
worktree_setup:
command: bash scripts/worktree/setup.sh
cache: true
cache_paths:
- pnpm-lock.yaml
command_policy:
mode: allowlist
cleanup:
worktree: true
concurrency_limit: 2source_repo may be a local git repository path or a git remote URL. Remote
sources are cloned/fetched into the worker source cache under
SPAWND_SOURCE_CACHE_ROOT or $SPAWND_SCRATCH_ROOT/sources. Worker worktrees
remain scratch; durable evidence is still Postgres rows plus object-store
artifacts.
Git clone/fetch/push credentials should be provided through explicit
env_refs on orchestration.worktree_source or orchestration.git. The plan
stores only the reference names; workers resolve actual secret values from their
environment at runtime.
The container worker loads SPAWND_GITHUB_TOKEN from the ignored .env and uses
SPAWND_GIT_ASKPASS=/usr/local/bin/spawnd-git-askpass for this purpose. It
seeds a writable worker Codex home volume from the required
SPAWND_CODEX_AUTH_DIR/auth.json, then mounts that volume at /root/.codex so
Codex can create its local state database without mutating the seed directory.
On Podman hosts such as micro-1, start the deployed stack with:
cp .env.example .env
$EDITOR .env
deploy/podman/up.shPlan-provided setup and check commands are constrained by
orchestration.command_policy. The default allowlist permits common test and
package-manager commands and rejects shell control syntax such as pipes,
semicolons, redirects, and command substitution. Use mode: unrestricted only
for trusted internal templates.
When worktree_setup.cache is true, workers compute a secret-free cache key
from the setup command, source HEAD, and configured lockfiles, create a cache
directory, and pass SPAWND_SETUP_CACHE_KEY plus SPAWND_SETUP_CACHE_DIR to
the setup command. Workers also point common package-manager caches inside that
directory with npm_config_cache, npm_config_store_dir, YARN_CACHE_FOLDER,
BUN_INSTALL_CACHE_DIR, UV_CACHE_DIR, PIP_CACHE_DIR, and
POETRY_CACHE_DIR.
orchestration.cleanup.worktree: true removes worker scratch worktrees after
terminal agent execution. Provenance, patches, checks, logs, and pushed branch
state remain durable in Postgres, object storage, and git.
orchestration.concurrency_limit is enforced at the Postgres claim boundary
for a run, so multiple worker processes cannot exceed the configured number of
simultaneously running agents.
Reviewer/read-only agents can be made non-mutating:
agents:
- name: reviewer
use_role: reviewer
write_allowed: false
prompt: Review the current diff and report risks.The built-in reviewer role defaults to write_allowed: false; explicit agent
configuration can override it.
Codex agents can set runtime safety policy per agent instead of relying on process-wide env defaults:
agents:
- name: codex-worker
runtime: codex
codex:
engine: cli
sandbox: workspace-write
approval_mode: deny_all
ephemeral: true
prompt: Make the requested change.Codex SDK runs estimate cost from exposed token counts and enforce
max_cost_usd. Codex CLI runs with --json, records subprocess facts, extracts
session/token facts when the CLI emits them, and estimates cost from those token
counts. When a stored Codex CLI session id exists, retries use
codex exec resume <session-id>.
Claude, OpenAI, and Codex CLI agents can receive external MCP servers from the plan. Secret material must be passed by reference and resolved from the worker environment:
defaults:
runtime: claude
mcp_servers:
- name: docs
type: http
url: https://mcp.example.com
header_refs:
Authorization: SPAWND_MCP_DOCS_AUTHORIZATION
tools:
- searchConfigured MCP servers are recorded in runtime_mcp_servers with a config hash.
Codex supports stdio MCP servers and streamable HTTP servers with bearer-token
environment references. Codex rejects unsupported MCP shapes, such as SSE,
literal HTTP headers, and non-Authorization header refs, at plan validation.
Codex manager agents use the CLI engine so they can access Spawnd coordination
tools through the internal spawnd MCP server.
Run a single claim:
spawnd worker --onceRun a polling worker:
spawnd worker --pollWrite-capable real provider runtimes require an explicit worker isolation
boundary. Set SPAWND_RUNTIME_ISOLATION=container, jail, or vm on deployed
workers after placing them in that boundary. Mock runs and readonly agents do
not require it. Without this marker, the worker fails the agent before setup or
provider execution instead of running write/edit tools on an unisolated host.
Recover queue hints from Postgres:
spawnd reconcileRecord a heartbeat:
spawnd worker-heartbeatWorkers can serve runs for any submitted local source_repo they can read.
--source-path /repo is optional and is only the fallback source when a run has
no source_repo.
Workers claim through a Postgres transaction, move the run to running, renew
lease state while executing, write artifacts and provenance, and enqueue newly
ready dependents through the queue outbox plus Redis wakeups. Redis entries are
wakeups only: reconciliation rebuilds missing hints from Postgres and records
outbox rows before publishing.
When a provider exposes a resumable session id, workers persist it on the
runtime session. Claude retries pass the latest prior session id back through
the SDK resume option. OpenAI retries reuse the latest server-managed
conversation id through the Agents SDK conversation_id option. Codex CLI
retries resume with codex exec resume when a stored session id exists; Codex
SDK retries use thread_resume or resume_thread when the installed SDK exposes
one, and fail clearly rather than silently cold-starting if it does not.
Manager agents can spawn dynamic workers. spawn_worker creates a real queued
agent row, updates the durable run spec used by workers, records a queue outbox
row, and publishes a Redis wakeup when Redis is available. Spawned agents are
still claimed and executed through the same Postgres worker path. Claude and
OpenAI managers receive coordination tools through their SDK tool interfaces;
Codex managers receive the same tools through the internal Spawnd MCP server.
If the submitted source path is missing or is not a git repository, the worker
fails the claimed agent with a redacted source-error artifact and a
runtime_errors(source='worktree_source') row instead of leaving the agent
running.
spawnd status <run-id>
spawnd events <run-id>
spawnd live-events <run-id>
spawnd artifacts <run-id>
spawnd logs <run-id>
spawnd checks <run-id>
spawnd trace <run-id>
spawnd provenance <run-id>spawnd logs reads redacted runtime output and final messages from artifact
storage; it does not read worker filesystem state.
Control commands:
spawnd cancel <run-id>
spawnd resume <run-id>
spawnd pr create <run-id> --agent parser
spawnd pr merge <run-id> --agent parser --method squashServe the API:
spawnd serve --host 0.0.0.0 --port 8765Endpoints:
GET /healthzGET /readyzGET /metricsPOST /runsGET /runs/{run_id}GET /runs/{run_id}/eventsGET /runs/{run_id}/checksGET /runs/{run_id}/artifactsGET /runs/{run_id}/tracesGET /runs/{run_id}/provenancePOST /runs/{run_id}/cancelPOST /runs/{run_id}/resumePOST /templatesGET /templatesPOST /templates/{template_id}/runsPOST /schedulesPATCH /schedules/{schedule_id}/statusPOST /schedules/run-duePOST /submissionsPOST /submissions/drainPOST /webhooks/github/{template_id}POST /workers/reconcilePOST /workers/outbox/drainGET /workers
POST /runs accepts a serialized plan body. It does not read server-local plan
files.
{
"run_id": "optional-run-id",
"plan": {
"name": "example",
"agents": [
{"name": "parser", "prompt": "Improve parser diagnostics"}
]
},
"source_repo": "/repo",
"source_ref": "origin/main"
}The server validates the plan at the HTTP boundary and rejects unknown request fields.
All HTTP routes except health, readiness, metrics, and GitHub webhooks require
Authorization: Bearer $SPAWND_API_TOKEN. GitHub webhooks verify
X-Hub-Signature-256 against SPAWND_GITHUB_WEBHOOK_SECRET. GitHub ping
events return {"status":"pong"} and do not submit runs.
Install repository webhooks with a durable public API base URL:
export SPAWND_GITHUB_WEBHOOK_SECRET='same-secret-as-api'
spawnd github-webhooks install \
--base-url https://spawnd.example.com \
--repo fntune/subport \
--repo fntune/stockbay \
--repo fntune/fn \
--repo fntune/biomon \
--repo fntune/cashgrepVerify installed hooks with the same repo list:
spawnd github-webhooks verify \
--base-url https://spawnd.example.com \
--repo fntune/subport \
--repo fntune/stockbay \
--repo fntune/fn \
--repo fntune/biomon \
--repo fntune/cashgrepReusable templates and schedules are durable Postgres records:
spawnd templates put contributor -f deploy/templates/contributor.yaml \
--source-repo-template '{clone_url}' \
--source-ref-template '{after}'
spawnd templates run contributor \
--param clone_url=https://github.com/acme/app.git \
--param after=main \
--param repo=acme/app \
--param repo_slug=acme-app \
--param event=manual \
--param action= \
--param ref=main \
--param before= \
--param pr_number= \
--param head_ref= \
--param head_sha= \
--param base_ref= \
--param base_sha=
spawnd schedules put nightly --template-id contributor --interval-seconds 86400
spawnd schedules run-due --poll --idle-sleep-seconds 60
spawnd submit-queue enqueue-template contributor \
--param clone_url=https://github.com/acme/app.git \
--param after=main \
--param repo=acme/app \
--param repo_slug=acme-app \
--param event=manual \
--param action= \
--param ref=main \
--param before= \
--param pr_number= \
--param head_ref= \
--param head_sha= \
--param base_ref= \
--param base_sha=
spawnd submit-queue drain --onceExternal systems can enqueue JSON messages onto the Redis submission stream
through POST /submissions or directly to Redis. spawnd submit-queue drain --poll consumes those messages, validates them, and creates canonical Postgres
runs.
Run spawnd drain-outbox --poll --idle-sleep-seconds 5 as a separate service
when you want an independent outbox relay in addition to the worker's poll-loop
drain.
Notifications use backend environment configuration, not raw secrets in plans.
Set SPAWND_NOTIFICATION_WEBHOOK_URL; failed, timed-out, cancelled, and
cost-exceeded runs notify automatically. Completed runs notify when the plan has
on_complete: notify.
The repository includes a reusable unattended contributor template at
deploy/templates/contributor.yaml. It is intentionally conservative: Codex
runs inside the worker container boundary, the durable worker check is
git diff --check, raw artifacts are disabled, git push uses secret references,
and recurring schedules should be created paused until intentionally activated.
See docs/deployment.md for Podman, compose, migration, and production environment details.
Install with deployed extras:
pip install -e '.[dev,deployed,telemetry,codex,sdk,openai]'State-transition integration tests require a Postgres test database:
export SPAWND_TEST_DATABASE_URL='postgresql+psycopg://user:pass@localhost:5432/spawnd_test'
pytestWithout SPAWND_TEST_DATABASE_URL, Postgres integration tests are skipped and
unit tests still run.
Useful local verification commands:
python -m compileall -q spawnd tests
git diff --check