Architecture note for the foundation layer of a broader open ecosystem platform for strong agents. This document focuses on the currently implemented foundation slice: controllers, operators, A2A, runtime, graph execution, and skill loading.
AgentWorld is the foundation repository for that broader platform.
It is not another agent framework in the older sense of "wrapping an LLM SDK with prompts and tools."
The broader platform direction includes:
- a common infrastructure layer for controlling and coordinating strong agents
- a reusable skill platform
- benchmark and evaluation layers
- application ecosystems such as auto research, reliable agentic systems, and domain workflows
- community collaboration and future exchange layers
This document focuses on the first part: the platform foundation.
It is aimed at a different problem:
- The execution primitive is no longer a single LLM API call
- The execution primitive is a strong agent such as Claude Code, Codex, or OpenClaw, with tool use, sessions, filesystem access, and long-running execution
- The framework itself does not think on behalf of those agents; instead, it is responsible for:
- controlling them through a uniform upper-layer interface
- scheduling them
- organizing collaboration between them
- maintaining shared multi-agent state
- letting them run reliably inside a graph-based workflow
- establishing the infrastructure base that higher platform layers can build on
In one sentence, this repository is building the platform foundation for strong agents, not another LLM invocation wrapper.
The previous generation of frameworks usually centered around:
- LLM
- Prompt
- Tool
- Memory
- Chain / AgentExecutor
Those frameworks typically assumed:
- the agent itself is fundamentally a model-call loop
- tool calls are just part of model output
- session state is primarily managed by the framework itself
In this generation, the core objects should become:
- Controller: how to actually control a specific strong agent
- Operator: the execution unit that exposes a uniform behavior to upper layers
- A2A Protocol: how agents exchange messages, tool results, artifacts, and control signals
- Graph: the collaboration graph across multiple agents
- Runtime: scheduling, state merging, checkpoint, resume, interrupt, and trace
That means the smallest meaningful unit is no longer "a model call", but "a durable agent operator".
The goals of this foundation repository are straightforward:
- Define one unified operator interface for strong agents
- Push provider-specific control detail down into each controller
- Provide LangGraph-like graph orchestration as one core foundation subsystem where nodes are strong-agent operators
- Define an internal A2A protocol so agent communication is not just arbitrary text concatenation
- Make runs recoverable, traceable, interruptible, and replayable
- Establish the base layer that future skill, benchmark, and application systems can build on
At this stage, the foundation repository is explicitly not trying to do the following:
- reimplement the intelligence inside Claude Code / Codex / OpenClaw
- force every agent into one prompt template
- ship the whole ecosystem platform surface from day one
- support every provider and every runtime environment from day one
- turn the framework into a giant all-in-one system
Phase one only targets the minimum critical loop:
unified control -> graph scheduling -> state management -> multi-agent collaboration -> recoverable execution
Higher platform layers should be able to grow on top of that loop later.
The framework assumes the lower layer is already a strong agent, not a bare model interface.
Different strong agents vary dramatically, for example:
- session creation differs
- session recovery differs
- tool permission configuration differs
- streaming output format differs
- filesystem and workspace constraints differ
Those differences should not leak into the graph layer. The controller should absorb them.
To the graph and runtime, it should not matter whether the underlying provider is Claude Code or Codex. The graph only needs to know:
- which operator this node invokes
- what input the operator requires
- what normalized result it returns
- how state and routing should be updated afterward
Shared state and agent-to-agent messages are different things and must be designed separately:
- State: graph-level authoritative state
- A2A Message: the communication carrier between agents
The system should not rely on plain natural-language context stitching, and it should not stuff everything into global state.
One of LangGraph's best ideas is:
- graph construction only describes structure
- execution happens in a compiled runtime
That separation is what enables:
- validation
- visualization
- checkpoint
- resume
- interrupt
- tracing
The diagram below describes the current platform foundation implemented in this repository. It does not attempt to show the entire future ecosystem platform.
flowchart TB
USER["User / App"]
BUILDER["Graph Builder"]
COMPILED["Compiled Graph"]
RUNTIME["Runtime / Scheduler"]
subgraph CORE["AgentWorld Core"]
GRAPH["Graph Layer"]
OP["Operator Layer"]
A2A["A2A Protocol"]
STORE["Checkpoint / Trace / Artifact Store"]
end
subgraph CTRL["Controller Layer"]
C1["ClaudeCodeController"]
C2["CodexController"]
C3["OpenClawController"]
end
subgraph AGENTS["Strong Agents"]
A1["Claude Code"]
A2["Codex"]
A3["OpenClaw"]
end
USER --> BUILDER --> COMPILED --> RUNTIME
RUNTIME --> GRAPH
GRAPH --> OP
OP --> A2A
RUNTIME --> STORE
OP --> C1
OP --> C2
OP --> C3
C1 --> A1
C2 --> A2
C3 --> A3
This layer solves exactly one problem: how to drive a specific strong agent in the real world.
It is provider-specific.
For example:
- ClaudeCodeController needs to know how to create sessions, resume them, configure tool permissions, and parse stream-json
- CodexController needs to know how to start tasks, consume events, and handle workspace changes
- OpenClawController needs to know its own API / CLI / runtime constraints
This layer should expose normalized events upward, not leak provider-specific calling detail.
A controller owns:
- session creation and recovery
- execution parameter assembly
- workspace binding
- output stream parsing
- tool permission mapping
- timeout / failure / interrupt handling
- raw trace preservation

A controller does not own:
- graph routing
- multi-agent collaboration strategy
- shared-state merging
- business workflow orchestration
This is the center of execution abstraction in the framework.
An Operator does not mean "a model". It means:
a strong-agent execution unit that can be scheduled by the graph, perform work, emit normalized outputs, and be resumed.
The graph layer only knows operators, not controllers.
An operator's responsibilities:
- accept normalized input
- assemble prompt / context / inbox / artifacts / working directory
- invoke the underlying controller
- normalize lower-layer events
- emit normalized results
from typing import Protocol

class Operator(Protocol):
    def invoke(self, request: "OperatorRequest", runtime: "RuntimeContext") -> "OperatorResult":
        ...

    def resume(self, request: "OperatorResumeRequest", runtime: "RuntimeContext") -> "OperatorResult":
        ...

| Field | Meaning |
|---|---|
| operator_id | current operator identifier |
| role | role such as planner / coder / reviewer |
| objective | the node objective for this execution |
| state_view | the state slice visible to this node |
| inbox | incoming A2A messages |
| artifacts | visible artifacts |
| skills | skill names loaded for this operator execution |
| working_dir | working directory |
| session_policy | whether to create a new session, reuse one, or force resume |
| tool_policy | allowed tools and permission level |
| timeout_s | timeout in seconds |
| metadata | additional runtime metadata |
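As a sketch, the request fields above can be captured in a TypedDict. The field names follow the table; the concrete types and the session_policy values are illustrative assumptions, not a finalized contract.

```python
from typing import Any, TypedDict

class OperatorRequest(TypedDict, total=False):
    # Identity and intent
    operator_id: str              # current operator identifier
    role: str                     # e.g. "planner" / "coder" / "reviewer"
    objective: str                # the node objective for this execution
    # Visible context
    state_view: dict[str, Any]    # state slice visible to this node
    inbox: list[dict]             # incoming A2A messages
    artifacts: list[dict]         # visible artifacts
    skills: list[str]             # skill names loaded for this execution
    # Execution environment
    working_dir: str
    session_policy: str           # assumed values: "new" | "reuse" | "force_resume"
    tool_policy: dict[str, Any]   # allowed tools and permission level
    timeout_s: float
    metadata: dict[str, Any]
```

Using total=False keeps every field optional, so a node can populate only what a given operator actually needs.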
| Field | Meaning |
|---|---|
| status | success / failed / interrupted / timeout |
| session_ref | native session handle for the lower-layer agent |
| messages | normalized A2A output messages |
| state_patch | incremental graph-state update |
| artifacts | produced files, patches, reports, etc. |
| handoffs | explicit work handoffs to other operators |
| metrics | tokens, duration, tool-call count, etc. |
| trace_ref | reference to raw logs / streaming output |
| error | normalized error object |
This is the layer where the project can most clearly separate itself from a generic workflow framework.
Without an A2A protocol, multi-agent collaboration eventually collapses into:
- prompt concatenation everywhere
- arbitrary text passing everywhere
- no message filtering, routing, auditing, or replay
So this layer must define a real internal protocol.
- make agent-to-agent interaction structured
- make messages and artifacts traceable
- let the graph runtime understand what an output actually means
- make replay / evaluation / debugging possible later
- Message: represents tasks, observations, conclusions, plans, review comments, and other textual or structured content.
- ToolCall: represents a tool call made by an agent.
- ToolResult: represents the result returned by a tool call.
- Artifact: represents files, patches, reports, code snippets, charts, and other outputs produced by an agent.
- Handoff: represents what task is handed off to which downstream agent.
from typing import TypedDict

class A2AEnvelope(TypedDict):
    id: str
    thread_id: str
    sender: str
    receiver: str | None
    kind: str
    payload: dict
    artifacts: list[dict]
    reply_to: str | None
    created_at: str

Supported kind values: task, plan, observation, decision, review, tool_call, tool_result, artifact, handoff, error, final.
The graph layer should borrow the right core ideas from LangGraph, but upgrade node semantics.
LangGraph's key insights are sound:
- use an explicit graph to describe workflows
- let nodes read and write shared state
- support add_node / add_edge / add_conditional_edges / compile
- execute through a compiled runtime
But in AgentWorld, nodes cannot be only generic function nodes. They must also support strong-agent nodes.
| Node Type | Purpose |
|---|---|
| operator node | invoke a strong-agent operator |
| router node | decide the next hop from state or messages |
| reducer node | merge parallel outputs |
| tool node | execute pure tool logic |
| human node | human approval / intervention |
| Edge Type | Purpose |
|---|---|
| direct edge | fixed-order execution |
| conditional edge | branch by condition |
| fan-out | one-to-many dispatch |
| join edge | wait for multiple predecessors |
| dynamic send | runtime delivery to selected nodes |
graph = AgentGraph(state_schema=State, context_schema=Context)
graph.add_operator("planner", planner_operator)
graph.add_operator("coder", coder_operator)
graph.add_node("plan", operator="planner")
graph.add_node("implement", operator="coder")
graph.add_edge("plan", "implement")
graph.add_conditional_edges("implement", route_fn)
compiled = graph.compile()

Graph state should be strongly typed, partially updatable, and mergeable.
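The construction example above passes a route_fn to add_conditional_edges without showing it. A minimal sketch of such a router might look like the following; the state fields (review_passed, attempts), the escalate node, and the END sentinel are all illustrative assumptions rather than framework contract.

```python
END = "__end__"  # sentinel node name, assumed to mirror LangGraph's END

def route_fn(state: dict) -> str:
    """Hypothetical router: inspect shared state, return the next node name."""
    if state.get("review_passed"):
        return END                 # work accepted, finish the run
    if state.get("attempts", 0) >= 3:
        return "escalate"          # too many retries, route to a human node
    return "plan"                  # otherwise loop back to planning
```

The point is that routing decisions read structured state, not raw agent text.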
The LangGraph-style reducer idea is worth keeping:
- each state field can define its own merge rule
- when parallel nodes write to the same field, the reducer is responsible for convergence
| Field Type | Recommended Reducer |
|---|---|
| messages: list | append |
| artifacts: list | append |
| metadata: dict | merge |
| final_answer | last_value |
| scores | max / merge |
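One hedged way to realize the table above is a reducer registry keyed by field name; a real implementation might instead attach reducers to the state schema via typing.Annotated, as LangGraph does. The registry contents here simply transcribe the table.

```python
from typing import Any, Callable

# Illustrative reducer registry following the table above.
REDUCERS: dict[str, Callable[[Any, Any], Any]] = {
    "messages": lambda old, new: (old or []) + new,        # append
    "artifacts": lambda old, new: (old or []) + new,       # append
    "metadata": lambda old, new: {**(old or {}), **new},   # merge
    "final_answer": lambda old, new: new,                  # last_value wins
    "scores": lambda old, new: max(old or 0, new),         # max
}

def merge_state(state: dict, patch: dict) -> dict:
    """Merge a node's state_patch into graph state, field by field."""
    merged = dict(state)
    for key, value in patch.items():
        reducer = REDUCERS.get(key, lambda _old, new: new)  # default: replace
        merged[key] = reducer(state.get(key), value)
    return merged
```

Because each field declares its own merge rule, two parallel nodes can both append messages without clobbering each other.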
The runtime is the execution layer after graph compilation.
It is responsible for:
- node scheduling
- execution queue management
- state merging
- checkpoint
- resume
- interrupt
- retry
- timeout
- event stream
- tracing
This layer determines whether the framework is actually usable.
If these boundaries are not clean, the implementation will become messy very quickly.
Controller:
- provider-specific invocation
- session lifecycle
- raw event parsing
- provider-specific parameters

Operator:
- the upper-layer uniform request / uniform result contract
- prompt and context assembly
- integration with the graph runtime
- converting controller events into A2A outputs and state_patch

A2A messages:
- communication between agents
- task handoff
- process-level observations
- tool and artifact messages

Shared state:
- graph-level authoritative state
- structured data consumed by routing and reducers
- the runtime semantics that are truly persisted in checkpoints
In practice:
- put "chat-like content" into A2A
- put "workflow progress, conclusions, and aggregated results" into state
flowchart LR
OA["Operator A"]
OB["Operator B"]
RT["Runtime"]
ST["Shared State"]
IN["Inbox / A2A"]
OA -->|A2A message / handoff / artifact| IN
IN --> OB
OA -->|state_patch| RT
RT -->|reducer merge| ST
ST -->|state_view| OB
Graph construction covers:
- structural declaration
- structural validation
- declaration of nodes, edges, and schema

Runtime execution covers:
- actual execution
- run / thread / session management
- checkpoint persistence
- interrupt / resume handling
Below is the recommended unified execution model.
| Object | Meaning |
|---|---|
| graph_id | graph definition identifier |
| run_id | one complete execution |
| thread_id | the same task thread, used for resume |
| node_run_id | one execution of one node |
| operator_session_id | the native session of the lower-layer strong agent |
sequenceDiagram
participant U as User/App
participant G as CompiledGraph
participant R as Runtime
participant N as OperatorNode
participant O as Operator
participant C as Controller
participant A as Strong Agent
U->>G: invoke(input, context)
G->>R: create run
R->>N: schedule node
N->>O: build OperatorRequest
O->>C: start/resume
C->>A: run task
A-->>C: stream events
C-->>O: normalized events
O-->>R: OperatorResult
R->>R: merge state_patch
R->>R: emit A2A messages / artifacts
R->>R: route next nodes
R-->>G: final state / stream
- The runtime selects executable nodes from the graph structure
- The node reads its visible state_view from global state
- The operator assembles the request
- The controller invokes the underlying strong agent
- Lower-layer output is normalized into ControllerEvent
- The operator emits messages + state_patch + artifacts + handoffs
- The runtime merges state through reducers
- A router determines the next hop
- The runtime writes checkpoint and trace
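To make the ordering concrete, here is a deliberately minimal, self-contained sketch of the select / merge / checkpoint steps for a linear graph. Nodes are plain callables returning a patch, and the only reducer is replace-by-key, which is far simpler than the real runtime; it only illustrates the shape of the loop.

```python
from typing import Callable

def run_linear(nodes: list[Callable[[dict], dict]], state: dict,
               checkpoints: list[dict]) -> dict:
    """Toy lifecycle loop: run each node, merge its patch, snapshot state."""
    for node in nodes:                   # select the next executable node
        patch = node(state)              # node runs and returns a state patch
        state = {**state, **patch}       # merge (replace-by-key reducer only)
        checkpoints.append(dict(state))  # write a checkpoint snapshot
    return state

checkpoints: list[dict] = []
final = run_linear(
    [lambda s: {"plan": "draft plan"}, lambda s: {"done": True}],
    {}, checkpoints)
```

Even in this toy form, the checkpoint list is what makes resume possible: a crashed run can restart from the last snapshot instead of re-executing every node.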
This part must stay concrete, because it is where many projects sound abstractly correct and then fail in implementation.
from collections.abc import Iterator
from typing import Protocol

class AgentController(Protocol):
    def start(self, request: "ControllerStartRequest") -> "ControllerRunHandle":
        ...

    def resume(self, request: "ControllerResumeRequest") -> "ControllerRunHandle":
        ...

    def stream(self, handle: "ControllerRunHandle") -> Iterator["ControllerEvent"]:
        ...

    def interrupt(self, session_id: str) -> None:
        ...

| Field | Meaning |
|---|---|
| session_id | framework-assigned or framework-mapped session id |
| working_dir | current working directory |
| instruction | primary instruction |
| attachments | extra context and file references |
| tool_policy | tool permission policy |
| env | environment variables |
| timeout_s | timeout control |
No matter what format the lower-layer agent emits, upper layers should eventually see only a small set of normalized event types:
session_started, message_delta, message_completed, tool_call, tool_result, artifact_created, state_hint, heartbeat, completed, failed
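A sketch of what one normalized event could look like. The event-type names come from the list above; the field layout is an assumption made for illustration.

```python
from typing import Any, TypedDict

# The ten normalized event types listed above.
EVENT_TYPES = {
    "session_started", "message_delta", "message_completed",
    "tool_call", "tool_result", "artifact_created",
    "state_hint", "heartbeat", "completed", "failed",
}

class ControllerEvent(TypedDict, total=False):
    type: str                # one of EVENT_TYPES
    session_id: str          # which lower-layer session emitted it
    payload: dict[str, Any]  # type-specific normalized content
    raw_ref: str             # pointer back into the raw provider trace
```

Keeping a raw_ref on every normalized event is what later lets debugging distinguish "the controller parsed it wrong" from "the agent produced the wrong thing".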
From AutoR's operator.py, the truly valuable part is not the word "operator", but the fact that it has already worked through real strong-agent runtime problems:
- prompt cache
- attempt counting
- session_id management
- resume and fallback start
- raw log persistence
- output file checking
- repair flows
Those concerns should be preserved in AgentWorld's controller and runtime design.
In other words, AgentWorld cannot stop at a clean abstraction. From day one it must also think about:
- what happens when a session breaks
- what happens when resume fails
- how logs are stored
- what happens when results are incomplete
- how one node retries
The output of a strong agent is no longer "just a text response". It may contain several things at once:
- a plan
- intermediate conclusions
- tool calls
- file modifications
- patches
- tasks that need to be handed to someone else
Without a protocol layer, all of that collapses into one blob of text, and the graph runtime cannot tell how to consume it.
- a message is communication
- a handoff is work transfer
A reviewer may send a review message without handing anything off. A planner may hand off "implement module X" to a coder.
The first version does not need a huge protocol. It only needs to guarantee this closed loop:
- one node can send a structured message to another node
- one node can attach artifacts
- the runtime can record message lineage
- downstream nodes can make decisions from their inbox
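The closed loop above can be demonstrated with an in-memory bus. InMemoryBus is an illustrative stand-in for the real runtime's inbox and lineage store, nothing more.

```python
from collections import defaultdict

class InMemoryBus:
    """Toy A2A transport: per-operator inboxes plus a lineage log for replay."""

    def __init__(self) -> None:
        self.inboxes: dict[str, list[dict]] = defaultdict(list)
        self.log: list[dict] = []           # message lineage, in send order

    def send(self, envelope: dict) -> None:
        self.log.append(envelope)           # record lineage first
        self.inboxes[envelope["receiver"]].append(envelope)

    def inbox(self, operator_id: str) -> list[dict]:
        return self.inboxes[operator_id]

bus = InMemoryBus()
bus.send({"id": "m1", "sender": "planner", "receiver": "coder",
          "kind": "task", "payload": {"task": "implement module X"},
          "artifacts": []})
```

The coder node can now decide its next action from bus.inbox("coder") instead of parsing concatenated prompt text, and the log preserves exactly who sent what to whom.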
This layer should stay as simple as LangGraph where possible, while becoming more natural for strong agents.
- state_schema / context_schema
- add_node / add_edge / add_conditional_edges / compile
- reducers
- checkpoint / interrupt / stream
LangGraph nodes are function-like by default.
In AgentWorld, nodes need native support for:
- long-running execution
- external sessions
- artifacts
- agent-to-agent handoff
- workspace side effects
So node output should not be limited to a single dict patch. It should also be able to return control commands.
from typing import TypedDict

class GraphCommand(TypedDict, total=False):
    update: dict
    goto: str | list[str]
    send: list[A2AEnvelope]
    interrupt: bool
    finish: bool

This borrows the idea behind LangGraph's Command / Send, but adapts it to a more multi-agent-oriented semantic model.
Strong-agent execution is not a single API call. It naturally encounters:
- interrupts
- timeouts
- retry after failure
- user intervention
- long-running task recovery
So the runtime must support checkpoints.
Resume exists on two different layers:
- graph-level resume: recover the graph run
- operator-level resume: recover the lower-layer agent session
Those two layers must not be conflated.
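To illustrate keeping the two layers separate, a resume entry point might restore graph state from a checkpoint while merely carrying native session references that each controller can later use for operator-level resume. Every name and field here is an assumption for illustration only.

```python
def resume_run(checkpoint: dict, sessions: dict[str, str]) -> dict:
    """Toy resume: graph-level restore plus operator-level session references."""
    # Graph-level resume: restore state and the frontier of pending nodes.
    state = dict(checkpoint["state"])
    pending = list(checkpoint["pending_nodes"])
    # Operator-level resume: a pending node may carry a native session ref
    # that its controller can reattach to instead of starting fresh.
    plan = [(node, sessions.get(node)) for node in pending]
    return {"state": state, "plan": plan}
```

Note the asymmetry: graph-level resume is fully owned by the runtime, while operator-level resume is delegated downward and may legitimately fail (forcing a fallback fresh start).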
Interrupt is needed for:
- human approval
- pausing before high-risk operations
- waiting for external input when node output quality is not good enough
At minimum, there should be two trace layers:
- raw trace: raw output from the lower-layer agent
- normalized trace: the framework's normalized event stream
That is the only way debugging can answer:
- did the controller parse the event incorrectly?
- did the agent itself fail to produce the right result?
- or did the graph route incorrectly?
To keep the design document from exploding before the system exists, the first version should only include the following.
- one AgentController base contract
- ClaudeCodeController
- CodexController
- one unified Operator
- one minimal A2A protocol
- one AgentGraph with add_node / add_edge / add_conditional_edges / compile
- invoke / stream / resume
- state reducers
- checkpoint / trace / artifact index

Explicitly deferred to later phases:
- OpenClawController
- human node
- retry policy
- cache policy
- multiple stream modes
- the full skill platform or marketplace
- a frontend UI
- distributed scheduling
- token economics or market mechanisms
- a complex permission system
Once the core runtime is working, it makes sense to add a few simple but representative built-in patterns first:
- planner -> coder -> reviewer: the most basic three-stage collaboration pattern
- parallel workers: validates fan-out / join / reducer behavior
- a handoff-driven pattern: validates A2A message passing and artifact handoff behavior
.
├── README.md
├── docs/
│ ├── index.html
│ ├── assets/
│ │ ├── site.css
│ │ ├── site.js
│ │ └── ...
│ ├── architecture.md
│ ├── protocol_a2a.md
│ ├── controller_contract.md
│ └── graph_runtime.md
├── skills/
│ ├── README.md
│ └── <skill-name>/
│ ├── SKILL.md
│ ├── references/
│ ├── scripts/
│ └── assets/
├── src/
│ └── agentworld/
│ ├── controller/
│ │ ├── base.py
│ │ ├── claude_code.py
│ │ ├── codex.py
│ │ └── openclaw.py
│ ├── operator/
│ │ ├── base.py
│ │ ├── models.py
│ │ └── default.py
│ ├── protocol/
│ │ ├── a2a.py
│ │ └── artifacts.py
│ ├── graph/
│ │ ├── builder.py
│ │ ├── compiled.py
│ │ ├── node.py
│ │ ├── edge.py
│ │ └── reducers.py
│ ├── runtime/
│ │ ├── scheduler.py
│ │ ├── executor.py
│ │ ├── checkpoint.py
│ │ ├── events.py
│ │ └── tracing.py
│ └── storage/
│ ├── artifacts.py
│ └── sessions.py
└── examples/
├── planner_coder_reviewer.py
└── parallel_workers.py
At the current stage, the priority is not to build many patterns. The priority is to define these five files correctly:
- controller/base.py
- operator/models.py
- protocol/a2a.py
- graph/builder.py
- runtime/executor.py
Those five files will largely determine whether the architecture stays stable later.
This design clearly draws from two existing lines of work, but it should not copy either of them mechanically.
From AutoR:
- "operator" must correspond to real execution control, not just a name
- session / attempt / resume / fallback / logs / repair must all be first-class concerns

From LangGraph:
- graph construction and runtime execution are separate
- explicit state schema
- reducers
- conditional edges
- compile before execution
- checkpoint / interrupt / stream
AgentWorld should not "wrap another LLM agent". It should establish the platform foundation for strong agents such as Claude Code, Codex, and OpenClaw: a unified control interface, A2A protocol, graph orchestration model, recoverable runtime, and skill-loading base that can support a broader ecosystem of benchmarks, applications, and community collaboration.