Architecture note for the foundation layer of a broader open ecosystem platform for strong agents. This document focuses on the currently implemented foundation slice: controllers, operators, A2A, runtime, graph execution, and skill loading.
AgentWorld is the foundation repository for that broader platform.
It is not another agent framework in the older sense of "wrapping an LLM SDK with prompts and tools."
The broader platform direction includes:
- a common infrastructure layer for controlling and coordinating strong agents
- a reusable skill platform
- benchmark and evaluation layers
- application ecosystems such as auto research, reliable agentic systems, and domain workflows
- community collaboration and future exchange layers
This document focuses on the first part: the platform foundation.
It is aimed at a different problem:
- The execution primitive is no longer a single LLM API call
- The execution primitive is a strong agent such as Claude Code, Codex, or OpenClaw, with tool use, sessions, filesystem access, and long-running execution
- The framework itself does not think on behalf of those agents; instead, it is responsible for:
- controlling them through a uniform upper-layer interface
- scheduling them
- organizing collaboration between them
- maintaining shared multi-agent state
- letting them run reliably inside a graph-based workflow
- establishing the infrastructure base that higher platform layers can build on
In one sentence, this repository is building the platform foundation for strong agents, not another LLM invocation wrapper.
The previous generation of frameworks usually centered around:
- LLM
- Prompt
- Tool
- Memory
- Chain / AgentExecutor
Those frameworks typically assumed:
- the agent itself is fundamentally a model-call loop
- tool calls are just part of model output
- session state is primarily managed by the framework itself
In this generation, the core objects should become:
- Controller: how to actually control a specific strong agent
- Operator: the execution unit that exposes a uniform behavior to upper layers
- A2A Protocol: how agents exchange messages, tool results, artifacts, and control signals
- Graph: the collaboration graph across multiple agents
- Runtime: scheduling, state merging, checkpoint, resume, interrupt, and trace
That means the smallest meaningful unit is no longer "a model call", but "a durable agent operator".
The goals of this foundation repository are straightforward:
- Define one unified operator interface for strong agents
- Push provider-specific control detail down into each controller
- Provide LangGraph-like graph orchestration as one core foundation subsystem where nodes are strong-agent operators
- Define an internal A2A protocol so agent communication is not just arbitrary text concatenation
- Make runs recoverable, traceable, interruptible, and replayable
- Establish the base layer that future skill, benchmark, and application systems can build on
At this stage, the foundation repository is explicitly not trying to do the following:
- reimplement the intelligence inside Claude Code / Codex / OpenClaw
- force every agent into one prompt template
- ship the whole ecosystem platform surface from day one
- support every provider and every runtime environment from day one
- turn the framework into a giant all-in-one system
Phase one only targets the minimum critical loop:
unified control -> graph scheduling -> state management -> multi-agent collaboration -> recoverable execution
Higher platform layers should be able to grow on top of that loop later.
The framework assumes the lower layer is already a strong agent, not a bare model interface.
Different strong agents vary dramatically, for example:
- session creation differs
- session recovery differs
- tool permission configuration differs
- streaming output format differs
- filesystem and workspace constraints differ
Those differences should not leak into the graph layer. The controller should absorb them.
To the graph and runtime, it should not matter whether the underlying provider is Claude Code or Codex. The graph only needs to know:
- which operator this node invokes
- what input the operator requires
- what normalized result it returns
- how state and routing should be updated afterward
Shared state and agent-to-agent messages are different things and must be designed separately:
- State: graph-level authoritative state
- A2A Message: the communication carrier between agents
The system should not rely on plain natural-language context stitching, and it should not stuff everything into global state.
One of LangGraph's best ideas is:
- graph construction only describes structure
- execution happens in a compiled runtime
That separation is what enables:
- validation
- visualization
- checkpoint
- resume
- interrupt
- tracing
The diagram below describes the current platform foundation implemented in this repository. It does not attempt to show the entire future ecosystem platform.
flowchart TB
USER["User / App"]
BUILDER["Graph Builder"]
COMPILED["Compiled Graph"]
RUNTIME["Runtime / Scheduler"]
subgraph CORE["AgentWorld Core"]
GRAPH["Graph Layer"]
OP["Operator Layer"]
A2A["A2A Protocol"]
STORE["Checkpoint / Trace / Artifact Store"]
end
subgraph CTRL["Controller Layer"]
C1["ClaudeCodeController"]
C2["CodexController"]
C3["OpenClawController"]
end
subgraph AGENTS["Strong Agents"]
A1["Claude Code"]
A2["Codex"]
A3["OpenClaw"]
end
USER --> BUILDER --> COMPILED --> RUNTIME
RUNTIME --> GRAPH
GRAPH --> OP
OP --> A2A
RUNTIME --> STORE
OP --> C1
OP --> C2
OP --> C3
C1 --> A1
C2 --> A2
C3 --> A3
This layer solves exactly one problem: how to drive a specific strong agent in the real world.
It is provider-specific.
For example:
- ClaudeCodeController needs to know how to create sessions, resume them, configure tool permissions, and parse stream-json
- CodexController needs to know how to start tasks, consume events, and handle workspace changes
- OpenClawController needs to know its own API / CLI / runtime constraints
This layer should expose normalized events upward, not leak provider-specific calling detail.
A controller owns:
- session creation and recovery
- execution parameter assembly
- workspace binding
- output stream parsing
- tool permission mapping
- timeout / failure / interrupt handling
- raw trace preservation

A controller does not own:
- graph routing
- multi-agent collaboration strategy
- shared-state merging
- business workflow orchestration
This is the center of execution abstraction in the framework.
An Operator does not mean "a model". It means:
a strong-agent execution unit that can be scheduled by the graph, perform work, emit normalized outputs, and be resumed.
The graph layer only knows operators, not controllers.
An operator's responsibilities:
- accept normalized input
- assemble prompt / context / inbox / artifacts / working directory
- invoke the underlying controller
- normalize lower-layer events
- emit normalized results
from typing import Protocol

class Operator(Protocol):
    def invoke(self, request: "OperatorRequest", runtime: "RuntimeContext") -> "OperatorResult":
        ...

    def resume(self, request: "OperatorResumeRequest", runtime: "RuntimeContext") -> "OperatorResult":
        ...

| Field | Meaning |
|---|---|
| operator_id | current operator identifier |
| role | role such as planner / coder / reviewer |
| objective | the node objective for this execution |
| state_view | the state slice visible to this node |
| inbox | incoming A2A messages |
| artifacts | visible artifacts |
| skills | skill names loaded for this operator execution |
| working_dir | working directory |
| session_policy | whether to create a new session, reuse one, or force resume |
| tool_policy | allowed tools and permission level |
| timeout_s | timeout in seconds |
| metadata | additional runtime metadata |
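As a sketch, the request fields above can be captured in a TypedDict. The field names follow the table; the concrete types and the session_policy values are illustrative assumptions, not a finalized contract.

```python
from typing import Any, TypedDict

class OperatorRequest(TypedDict, total=False):
    # Identity and intent
    operator_id: str              # current operator identifier
    role: str                     # e.g. "planner" / "coder" / "reviewer"
    objective: str                # the node objective for this execution
    # Visible context
    state_view: dict[str, Any]    # state slice visible to this node
    inbox: list[dict]             # incoming A2A messages
    artifacts: list[dict]         # visible artifacts
    skills: list[str]             # skill names loaded for this execution
    # Execution environment
    working_dir: str
    session_policy: str           # assumed values: "new" | "reuse" | "force_resume"
    tool_policy: dict[str, Any]   # allowed tools and permission level
    timeout_s: float
    metadata: dict[str, Any]
```

Using total=False keeps every field optional, so a node can populate only what a given operator actually needs.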
| Field | Meaning |
|---|---|
| status | success / failed / interrupted / timeout |
| session_ref | native session handle for the lower-layer agent |
| messages | normalized A2A output messages |
| state_patch | incremental graph-state update |
| artifacts | produced files, patches, reports, etc. |
| handoffs | explicit work handoffs to other operators |
| metrics | tokens, duration, tool-call count, etc. |
| trace_ref | reference to raw logs / streaming output |
| error | normalized error object |
This is the layer where the project can most clearly separate itself from a generic workflow framework.
Without an A2A protocol, multi-agent collaboration eventually collapses into:
- prompt concatenation everywhere
- arbitrary text passing everywhere
- no message filtering, routing, auditing, or replay
So this layer must define a real internal protocol.
- make agent-to-agent interaction structured
- make messages and artifacts traceable
- let the graph runtime understand what an output actually means
- make replay / evaluation / debugging possible later
- Message: represents tasks, observations, conclusions, plans, review comments, and other textual or structured content.
- ToolCall: represents a tool call made by an agent.
- ToolResult: represents the result returned by a tool call.
- Artifact: represents files, patches, reports, code snippets, charts, and other outputs produced by an agent.
- Handoff: represents what task is handed off to which downstream agent.
from typing import TypedDict

class A2AEnvelope(TypedDict):
    id: str
    thread_id: str
    sender: str
    receiver: str | None
    kind: str
    payload: dict
    artifacts: list[dict]
    reply_to: str | None
    created_at: str

Supported kind values: task, plan, observation, decision, review, tool_call, tool_result, artifact, handoff, error, final.
The graph layer should borrow the right core ideas from LangGraph, but upgrade node semantics.
LangGraph's key insights are sound:
- use an explicit graph to describe workflows
- let nodes read and write shared state
- support add_node / add_edge / add_conditional_edges / compile
- execute through a compiled runtime
But in AgentWorld, nodes cannot be only generic function nodes. They must also support strong-agent nodes.
| Node Type | Purpose |
|---|---|
| operator node | invoke a strong-agent operator |
| router node | decide the next hop from state or messages |
| reducer node | merge parallel outputs |
| tool node | execute pure tool logic |
| human node | human approval / intervention |
| Edge Type | Purpose |
|---|---|
| direct edge | fixed-order execution |
| conditional edge | branch by condition |
| fan-out | one-to-many dispatch |
| join edge | wait for multiple predecessors |
| dynamic send | runtime delivery to selected nodes |
graph = AgentGraph(state_schema=State, context_schema=Context)
graph.add_operator("planner", planner_operator)
graph.add_operator("coder", coder_operator)
graph.add_node("plan", operator="planner")
graph.add_node("implement", operator="coder")
graph.add_edge("plan", "implement")
graph.add_conditional_edges("implement", route_fn)
compiled = graph.compile()

Graph state should be strongly typed, partially updatable, and mergeable.
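The construction example above passes a route_fn to add_conditional_edges without showing it. A minimal sketch of such a router might look like the following; the state fields (review_passed, attempts), the escalate node, and the END sentinel are all illustrative assumptions rather than framework contract.

```python
END = "__end__"  # sentinel node name, assumed to mirror LangGraph's END

def route_fn(state: dict) -> str:
    """Hypothetical router: inspect shared state, return the next node name."""
    if state.get("review_passed"):
        return END                 # work accepted, finish the run
    if state.get("attempts", 0) >= 3:
        return "escalate"          # too many retries, route to a human node
    return "plan"                  # otherwise loop back to planning
```

The point is that routing decisions read structured state, not raw agent text.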
The LangGraph-style reducer idea is worth keeping:
- each state field can define its own merge rule
- when parallel nodes write to the same field, the reducer is responsible for convergence
| Field Type | Recommended Reducer |
|---|---|
| messages: list | append |
| artifacts: list | append |
| metadata: dict | merge |
| final_answer | last_value |
| scores | max / merge |
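One hedged way to realize the table above is a reducer registry keyed by field name; a real implementation might instead attach reducers to the state schema via typing.Annotated, as LangGraph does. The registry contents here simply transcribe the table.

```python
from typing import Any, Callable

# Illustrative reducer registry following the table above.
REDUCERS: dict[str, Callable[[Any, Any], Any]] = {
    "messages": lambda old, new: (old or []) + new,        # append
    "artifacts": lambda old, new: (old or []) + new,       # append
    "metadata": lambda old, new: {**(old or {}), **new},   # merge
    "final_answer": lambda old, new: new,                  # last_value wins
    "scores": lambda old, new: max(old or 0, new),         # max
}

def merge_state(state: dict, patch: dict) -> dict:
    """Merge a node's state_patch into graph state, field by field."""
    merged = dict(state)
    for key, value in patch.items():
        reducer = REDUCERS.get(key, lambda _old, new: new)  # default: replace
        merged[key] = reducer(state.get(key), value)
    return merged
```

Because each field declares its own merge rule, two parallel nodes can both append messages without clobbering each other.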
The runtime is the execution layer after graph compilation.
It is responsible for:
- node scheduling
- execution queue management
- state merging
- checkpoint
- resume
- interrupt
- retry
- timeout
- event stream
- tracing
This layer determines whether the framework is actually usable.
If these boundaries are not clean, the implementation will become messy very quickly.
Controller:
- provider-specific invocation
- session lifecycle
- raw event parsing
- provider-specific parameters

Operator:
- the upper-layer uniform request / uniform result contract
- prompt and context assembly
- integration with the graph runtime
- converting controller events into A2A outputs and state_patch

A2A messages:
- communication between agents
- task handoff
- process-level observations
- tool and artifact messages

Shared state:
- graph-level authoritative state
- structured data consumed by routing and reducers
- the runtime semantics that are truly persisted in checkpoints
In practice:
- put "chat-like content" into A2A
- put "workflow progress, conclusions, and aggregated results" into state
flowchart LR
OA["Operator A"]
OB["Operator B"]
RT["Runtime"]
ST["Shared State"]
IN["Inbox / A2A"]
OA -->|A2A message / handoff / artifact| IN
IN --> OB
OA -->|state_patch| RT
RT -->|reducer merge| ST
ST -->|state_view| OB
Graph construction covers:
- structural declaration
- structural validation
- declaration of nodes, edges, and schema

Runtime execution covers:
- actual execution
- run / thread / session management
- checkpoint persistence
- interrupt / resume handling
Below is the recommended unified execution model.
| Object | Meaning |
|---|---|
| graph_id | graph definition identifier |
| run_id | one complete execution |
| thread_id | the same task thread, used for resume |
| node_run_id | one execution of one node |
| operator_session_id | the native session of the lower-layer strong agent |
sequenceDiagram
participant U as User/App
participant G as CompiledGraph
participant R as Runtime
participant N as OperatorNode
participant O as Operator
participant C as Controller
participant A as Strong Agent
U->>G: invoke(input, context)
G->>R: create run
R->>N: schedule node
N->>O: build OperatorRequest
O->>C: start/resume
C->>A: run task
A-->>C: stream events
C-->>O: normalized events
O-->>R: OperatorResult
R->>R: merge state_patch
R->>R: emit A2A messages / artifacts
R->>R: route next nodes
R-->>G: final state / stream
- The runtime selects executable nodes from the graph structure
- The node reads its visible state_view from global state
- The operator assembles the request
- The controller invokes the underlying strong agent
- Lower-layer output is normalized into ControllerEvent
- The operator emits messages + state_patch + artifacts + handoffs
- The runtime merges state through reducers
- A router determines the next hop
- The runtime writes checkpoint and trace
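To make the ordering concrete, here is a deliberately minimal, self-contained sketch of the select / merge / checkpoint steps for a linear graph. Nodes are plain callables returning a patch, and the only reducer is replace-by-key, which is far simpler than the real runtime; it only illustrates the shape of the loop.

```python
from typing import Callable

def run_linear(nodes: list[Callable[[dict], dict]], state: dict,
               checkpoints: list[dict]) -> dict:
    """Toy lifecycle loop: run each node, merge its patch, snapshot state."""
    for node in nodes:                   # select the next executable node
        patch = node(state)              # node runs and returns a state patch
        state = {**state, **patch}       # merge (replace-by-key reducer only)
        checkpoints.append(dict(state))  # write a checkpoint snapshot
    return state

checkpoints: list[dict] = []
final = run_linear(
    [lambda s: {"plan": "draft plan"}, lambda s: {"done": True}],
    {}, checkpoints)
```

Even in this toy form, the checkpoint list is what makes resume possible: a crashed run can restart from the last snapshot instead of re-executing every node.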
This part must stay concrete, because it is where many projects sound abstractly correct and then fail in implementation.
from collections.abc import Iterator
from typing import Protocol

class AgentController(Protocol):
    def start(self, request: "ControllerStartRequest") -> "ControllerRunHandle":
        ...

    def resume(self, request: "ControllerResumeRequest") -> "ControllerRunHandle":
        ...

    def stream(self, handle: "ControllerRunHandle") -> Iterator["ControllerEvent"]:
        ...

    def interrupt(self, session_id: str) -> None:
        ...

| Field | Meaning |
|---|---|
| session_id | framework-assigned or framework-mapped session id |
| working_dir | current working directory |
| instruction | primary instruction |
| attachments | extra context and file references |
| tool_policy | tool permission policy |
| env | environment variables |
| timeout_s | timeout control |
No matter what format the lower-layer agent emits, upper layers should eventually see only a small set of normalized event types:
session_started, message_delta, message_completed, tool_call, tool_result, artifact_created, state_hint, heartbeat, completed, failed
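A sketch of what one normalized event could look like. The event-type names come from the list above; the field layout is an assumption made for illustration.

```python
from typing import Any, TypedDict

# The ten normalized event types listed above.
EVENT_TYPES = {
    "session_started", "message_delta", "message_completed",
    "tool_call", "tool_result", "artifact_created",
    "state_hint", "heartbeat", "completed", "failed",
}

class ControllerEvent(TypedDict, total=False):
    type: str                # one of EVENT_TYPES
    session_id: str          # which lower-layer session emitted it
    payload: dict[str, Any]  # type-specific normalized content
    raw_ref: str             # pointer back into the raw provider trace
```

Keeping a raw_ref on every normalized event is what later lets debugging distinguish "the controller parsed it wrong" from "the agent produced the wrong thing".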
From AutoR's operator.py, the truly valuable part is not the word "operator", but the fact that it has already worked through real strong-agent runtime problems:
- prompt cache
- attempt counting
- session_id management
- resume and fallback start
- raw log persistence
- output file checking
- repair flows
Those concerns should be preserved in AgentWorld's controller and runtime design.
In other words, AgentWorld cannot stop at a clean abstraction. From day one it must also think about:
- what happens when a session breaks
- what happens when resume fails
- how logs are stored
- what happens when results are incomplete
- how one node retries
The output of a strong agent is no longer "just a text response". It may contain several things at once:
- a plan
- intermediate conclusions
- tool calls
- file modifications
- patches
- tasks that need to be handed to someone else
Without a protocol layer, all of that collapses into one blob of text, and the graph runtime cannot tell how to consume it.
- a message is communication
- a handoff is work transfer
A reviewer may send a review message without handing anything off. A planner may hand off "implement module X" to a coder.
The first version does not need a huge protocol. It only needs to guarantee this closed loop:
- one node can send a structured message to another node
- one node can attach artifacts
- the runtime can record message lineage
- downstream nodes can make decisions from their inbox
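The closed loop above can be demonstrated with an in-memory bus. InMemoryBus is an illustrative stand-in for the real runtime's inbox and lineage store, nothing more.

```python
from collections import defaultdict

class InMemoryBus:
    """Toy A2A transport: per-operator inboxes plus a lineage log for replay."""

    def __init__(self) -> None:
        self.inboxes: dict[str, list[dict]] = defaultdict(list)
        self.log: list[dict] = []           # message lineage, in send order

    def send(self, envelope: dict) -> None:
        self.log.append(envelope)           # record lineage first
        self.inboxes[envelope["receiver"]].append(envelope)

    def inbox(self, operator_id: str) -> list[dict]:
        return self.inboxes[operator_id]

bus = InMemoryBus()
bus.send({"id": "m1", "sender": "planner", "receiver": "coder",
          "kind": "task", "payload": {"task": "implement module X"},
          "artifacts": []})
```

The coder node can now decide its next action from bus.inbox("coder") instead of parsing concatenated prompt text, and the log preserves exactly who sent what to whom.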
This layer should stay as simple as LangGraph where possible, while becoming more natural for strong agents.
- state_schema / context_schema
- add_node / add_edge / add_conditional_edges / compile
- reducers
- checkpoint / interrupt / stream
LangGraph nodes are function-like by default.
In AgentWorld, nodes need native support for:
- long-running execution
- external sessions
- artifacts
- agent-to-agent handoff
- workspace side effects
So node output should not be limited to a single dict patch. It should also be able to return control commands.
from typing import TypedDict

class GraphCommand(TypedDict, total=False):
    update: dict
    goto: str | list[str]
    send: list[A2AEnvelope]
    interrupt: bool
    finish: bool

This borrows the idea behind LangGraph's Command / Send, but adapts it to a more multi-agent-oriented semantic model.
Strong-agent execution is not a single API call. It naturally encounters:
- interrupts
- timeouts
- retry after failure
- user intervention
- long-running task recovery
So the runtime must support checkpoints.
Resume exists on two different layers:
- graph-level resume: recover the graph run
- operator-level resume: recover the lower-layer agent session
Those two layers must not be conflated.
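To illustrate keeping the two layers separate, a resume entry point might restore graph state from a checkpoint while merely carrying native session references that each controller can later use for operator-level resume. Every name and field here is an assumption for illustration only.

```python
def resume_run(checkpoint: dict, sessions: dict[str, str]) -> dict:
    """Toy resume: graph-level restore plus operator-level session references."""
    # Graph-level resume: restore state and the frontier of pending nodes.
    state = dict(checkpoint["state"])
    pending = list(checkpoint["pending_nodes"])
    # Operator-level resume: a pending node may carry a native session ref
    # that its controller can reattach to instead of starting fresh.
    plan = [(node, sessions.get(node)) for node in pending]
    return {"state": state, "plan": plan}
```

Note the asymmetry: graph-level resume is fully owned by the runtime, while operator-level resume is delegated downward and may legitimately fail (forcing a fallback fresh start).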
Interrupt is needed for:
- human approval
- pausing before high-risk operations
- waiting for external input when node output quality is not good enough
At minimum, there should be two trace layers:
- raw trace: raw output from the lower-layer agent
- normalized trace: the framework's normalized event stream
That is the only way debugging can answer:
- did the controller parse the event incorrectly?
- did the agent itself fail to produce the right result?
- or did the graph route incorrectly?
To keep the design document from exploding before the system exists, the first version should only include the following.
- one AgentController base contract
- ClaudeCodeController
- CodexController
- one unified Operator
- one minimal A2A protocol
- one AgentGraph with add_node / add_edge / add_conditional_edges / compile
- invoke / stream / resume
- state reducers
- checkpoint / trace / artifact index

Explicitly deferred to later phases:
- OpenClawController
- human node
- retry policy
- cache policy
- multiple stream modes
- the full skill platform or marketplace
- a frontend UI
- distributed scheduling
- token economics or market mechanisms
- a complex permission system
Once the core runtime is working, it makes sense to add a few simple but representative built-in patterns first:
- planner -> coder -> reviewer: the most basic three-stage collaboration pattern
- parallel workers: validates fan-out / join / reducer behavior
- a handoff-driven pattern: validates A2A message passing and artifact handoff behavior
.
├── README.md
├── docs/
│ ├── index.html
│ ├── assets/
│ │ ├── site.css
│ │ ├── site.js
│ │ └── ...
│ ├── architecture.md
│ ├── protocol_a2a.md
│ ├── controller_contract.md
│ └── graph_runtime.md
├── skills/
│ ├── README.md
│ └── <skill-name>/
│ ├── SKILL.md
│ ├── references/
│ ├── scripts/
│ └── assets/
├── src/
│ └── agentworld/
│ ├── controller/
│ │ ├── base.py
│ │ ├── claude_code.py
│ │ ├── codex.py
│ │ └── openclaw.py
│ ├── operator/
│ │ ├── base.py
│ │ ├── models.py
│ │ └── default.py
│ ├── protocol/
│ │ ├── a2a.py
│ │ └── artifacts.py
│ ├── graph/
│ │ ├── builder.py
│ │ ├── compiled.py
│ │ ├── node.py
│ │ ├── edge.py
│ │ └── reducers.py
│ ├── runtime/
│ │ ├── scheduler.py
│ │ ├── executor.py
│ │ ├── checkpoint.py
│ │ ├── events.py
│ │ └── tracing.py
│ └── storage/
│ ├── artifacts.py
│ └── sessions.py
└── examples/
├── planner_coder_reviewer.py
└── parallel_workers.py
At the current stage, the priority is not to build many patterns. The priority is to define these five files correctly:
- controller/base.py
- operator/models.py
- protocol/a2a.py
- graph/builder.py
- runtime/executor.py
Those five files will largely determine whether the architecture stays stable later.
This design clearly draws from two existing lines of work, but it should not copy either of them mechanically.
From AutoR:
- "operator" must correspond to real execution control, not just a name
- session / attempt / resume / fallback / logs / repair must all be first-class concerns

From LangGraph:
- graph construction and runtime execution are separate
- explicit state schema
- reducers
- conditional edges
- compile before execution
- checkpoint / interrupt / stream
AgentWorld should not "wrap another LLM agent". It should establish the platform foundation for strong agents such as Claude Code, Codex, and OpenClaw: a unified control interface, A2A protocol, graph orchestration model, recoverable runtime, and skill-loading base that can support a broader ecosystem of benchmarks, applications, and community collaboration.