Project Think roadmap: durable agent platform preview to stable developer experience #1439

Project Think shipped the first preview of the next-generation Agents SDK harness: durable chat execution, Session-backed memory, sub-agent orchestration, workspace tools, codemode execution, extensions, and the beginnings of a full execution ladder.

The initial Think parity roadmap is mostly complete. This umbrella issue tracks the next phase: turning the preview into a coherent, stable developer experience with clear docs, examples, integration stories, and focused hardening.

This issue is intentionally a hub. Child issues should be small and independently reviewable.

## Product goal

Make Think the default path for building durable, serverless agents that can think, act, persist, fork, and hand work off across Cloudflare infrastructure.

Success means a new contributor can answer, from this issue and its children:

- What has already shipped?
- What blocks a stable preview / eventual 1.0-quality experience?
- Which issue or PR owns each remaining piece of work?

## Track 1: Core Think reliability and turn semantics

- [x] #1429: Apply recovery to `Think.chat()` so sub-agent/RPC chat turns get the same durable recovery story as normal chat turns.
- [ ] #1386: Design chained turns / multi-phase continuation for long coding and research tasks without persisting synthetic user messages.
- [ ] #1322: Re-triage external provider + Zod schema compatibility against current dependency versions.
- [ ] #1343: Finish lifecycle hook polish/tests. Treat as hardening, not a launch blocker.
- [ ] Document `chatRecovery` / `onChatRecovery` consistently across Think docs, durable execution docs, and server-driven message docs.

## Track 2: Multi-session and app/product shape

Current direction: one Think child Durable Object per conversation, with a parent Durable Object owning directory/sidebar state and shared user-level resources. This matches the `examples/assistant` direction and answers most of #1349.

- [ ] Write `docs/think/multi-chat.md` for the parent/child multi-chat pattern.
- [ ] Decide whether `useChats()` should move from example-local code into `agents` / `agents/react`.
- [ ] Decide whether a `Chats` base class is worth promoting, or whether the pattern should stay documented with examples for now.
- [ ] Document the shared-resource boundary from `examples/assistant`: shared workspace + shared MCP, but per-chat messages, memory, extensions, and branch history.
- [ ] Measure / document parent DO scale limits for shared workspace and shared MCP fan-out.
- [x] Decide whether #1349 can be closed after the docs and example guidance are clear.
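
To make the parent/child direction above concrete, here is a minimal sketch of the ownership boundary. It is not SDK code: plain classes stand in for Durable Objects, and every name (`ChatDirectory`, `ChatChild`, `SharedResources`) is hypothetical. It only illustrates the split described in this track: the parent owns the directory and shared resources, and each child owns its own messages and memory.

```typescript
// Hypothetical model of the parent/child multi-chat shape. None of these
// names come from the Agents SDK; plain classes stand in for Durable Objects.

type ChatMessage = { role: "user" | "assistant"; text: string };

// Resources shared by every chat that belongs to one user (assumed shape).
interface SharedResources {
  workspaceFiles: Map<string, string>;
  mcpServers: Set<string>;
}

// Stand-in for one child object per conversation: owns per-chat state only.
class ChatChild {
  readonly id: string;
  readonly messages: ChatMessage[] = [];
  readonly memory = new Map<string, string>();
  private readonly shared: SharedResources;

  constructor(id: string, shared: SharedResources) {
    this.id = id;
    this.shared = shared;
  }

  addMessage(message: ChatMessage): void {
    this.messages.push(message);
  }

  // Writes land in the shared workspace, so sibling chats can read them.
  writeFile(path: string, contents: string): void {
    this.shared.workspaceFiles.set(path, contents);
  }

  readFile(path: string): string | undefined {
    return this.shared.workspaceFiles.get(path);
  }
}

// Stand-in for the parent object: owns the directory and shared resources.
class ChatDirectory {
  private readonly shared: SharedResources = {
    workspaceFiles: new Map<string, string>(),
    mcpServers: new Set<string>(),
  };
  private readonly chats = new Map<string, ChatChild>();
  private readonly titles = new Map<string, string>();

  createChat(id: string, title: string): ChatChild {
    if (this.chats.has(id)) throw new Error(`chat ${id} already exists`);
    const chat = new ChatChild(id, this.shared);
    this.chats.set(id, chat);
    this.titles.set(id, title);
    return chat;
  }

  // The sidebar listing is served by the parent without touching any child.
  listChats(): { id: string; title: string }[] {
    return Array.from(this.titles.entries()).map(([id, title]) => ({ id, title }));
  }
}
```

In this sketch, two chats created from the same `ChatDirectory` see the same workspace files but never each other's messages. Extensions and branch history would sit next to `messages` and `memory` on the child under the same assumption.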

## Track 3: React/client package boundary

Think currently relies on `@cloudflare/ai-chat/react` for shared chat UI behavior, even though much of the protocol/runtime substrate has moved into `agents/chat`.

- [ ] Hoist shared React chat primitives from `@cloudflare/ai-chat/react` into `agents` or `agents/react`, with compatibility re-exports.
- [ ] Address Think-native initial message behavior so Think can avoid the redundant HTTP `get-messages` fetch / Suspense flash when WebSocket history is already authoritative.
- [ ] Link and triage related client backlog: #1011, #1045, #1361, #1414, #1420.
- [ ] Keep #1414 as discussion for now. Client-tool RPC delegation is interesting but likely too invasive for the immediate roadmap.

## Track 4: Execution ladder and tools

Think should remain useful at Tier 0 with only the workspace tools. Each higher tier should be additive and explicitly configured.

- [x] Land/review #1435 as the concrete fix for #1403 multimodal workspace reads.
- [ ] Decide whether #1392 multimodal memory is a separate Session/memory track after #1435 lands.
- [ ] #1319: Make `createSandboxTools` actually work, or clearly mark the export as a placeholder until Sandbox integration is ready.
- [ ] #1344: Design a whitelisted fetch tool that fits Think's capability model.
- [ ] Track browser/tooling follow-ups such as #1398 and #1401.
- [ ] Track MCP ergonomics and compact-tool-surface work: #1336, #1378, #1433, #1434.
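
One way to read "additive and explicitly configured" is sketched below. This is an illustration under stated assumptions, not the Think API: the config fields (`codemode`, `fetchAllowlist`), the tool names, and both functions are hypothetical. The host check is one possible shape for the restricted fetch tool discussed in #1344, not its design.

```typescript
// Hypothetical sketch: config fields, tool names, and functions here are
// illustrative and are not taken from the Think API.

interface LadderConfig {
  codemode?: boolean; // opt in to code execution
  fetchAllowlist?: string[]; // opt in to fetch, limited to these hosts
}

// Tier 0: always available, needs no configuration.
const WORKSPACE_TOOLS = ["read", "write", "list"] as const;

// Each higher tier only adds tools, and only when explicitly configured.
function resolveTools(config: LadderConfig = {}): string[] {
  const tools: string[] = [...WORKSPACE_TOOLS];
  if (config.codemode === true) tools.push("codemode");
  if (config.fetchAllowlist !== undefined && config.fetchAllowlist.length > 0) {
    tools.push("fetch");
  }
  return tools;
}

// Exact-host, HTTPS-only check. Anything unparseable is rejected.
function isFetchAllowed(url: string, allowlist: readonly string[]): boolean {
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return false;
  }
  if (parsed.protocol !== "https:") return false;
  return allowlist.includes(parsed.hostname);
}
```

With no configuration, `resolveTools()` returns only the workspace tools, which matches the Tier 0 expectation above. Matching on the exact hostname rather than a suffix or substring avoids accepting look-alike hosts such as `api.example.com.attacker.test`.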

## Track 4a: Think + Artifacts integration

Artifacts looks like the missing versioned handoff layer for Think workspaces and sessions: one repo per agent/session/task, forkable history, Git-compatible tooling, and short-lived repo-scoped tokens.

Start with focused child issue #1440: **Think + Artifacts: versioned workspaces, forks, and handoff**.
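
The properties listed above (one repo per agent/session/task, forkable history, short-lived repo-scoped tokens) can be modeled in a few lines. The sketch below is purely illustrative and does not reflect the Artifacts API: `Repo`, `RepoToken`, and all four functions are hypothetical names, and commits are reduced to opaque strings.

```typescript
// Hypothetical data model only; this is not the Artifacts API.

type Scope = "read" | "write";

interface Repo {
  name: string;
  commits: string[]; // opaque commit ids, oldest first
  forkedFrom?: string;
}

interface RepoToken {
  repo: string; // the single repo this token is valid for
  scope: Scope;
  expiresAt: number; // epoch milliseconds
}

// One repo per agent/task pair (assumed naming scheme).
function createRepo(agentId: string, taskId: string): Repo {
  return { name: `${agentId}/${taskId}`, commits: [] };
}

// A fork copies history, so later commits on either side do not leak across.
function forkRepo(source: Repo, newName: string): Repo {
  return { name: newName, commits: [...source.commits], forkedFrom: source.name };
}

function issueToken(repo: Repo, scope: Scope, nowMs: number, ttlMs: number): RepoToken {
  return { repo: repo.name, scope, expiresAt: nowMs + ttlMs };
}

// A token works only for its own repo, only before expiry, and a read
// token never grants write access.
function canAccess(token: RepoToken, repo: Repo, wanted: Scope, nowMs: number): boolean {
  if (token.repo !== repo.name) return false;
  if (nowMs >= token.expiresAt) return false;
  return wanted === "read" || token.scope === "write";
}
```

Under these assumptions, a handoff could look like this: fork the task repo, issue a short-lived token scoped to the fork, and pass both to the receiving agent. The original repo stays untouched by whatever the receiver commits.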

## Track 5: Examples, apps, and docs polish

Think needs one clear reference path for developers, plus smaller docs/examples that explain each capability without requiring readers to reverse-engineer the kitchen-sink app.

- [ ] Use `examples/assistant` as the primary Think reference app and document which pieces are canonical versus app-specific.
- [x] Decide what to do with draft PR #1135 (`think-cli` / `think-server`): revive as an app track, split into smaller issues, or close/supersede.
- [ ] Add / update Think docs for missing multi-chat guidance, peer dependency wording, Think vs `AIChatAgent`, hooks, client tools, recovery, and execution-ladder limitations.
- [ ] Keep `examples/multi-ai-chat` positioned as a lower-level Agents/chat example unless it is intentionally upgraded to Think.

## Track 6: Documentation and design hygiene

Several design and WIP docs still describe pre-Session or pre-shared-chat-layer assumptions. This is not all launch-blocking, but stale docs make it hard for humans and agents to pick up roadmap work safely.

- [ ] Refresh canonical Think design docs: `design/think.md`, `design/think-roadmap.md`, and `design/think-sessions.md`.
- [ ] Update shared chat design docs whose status now conflicts with shipped code: `design/chat-shared-layer.md`, `design/chat-improvements.md`, and `design/think-vs-aichat.md`.
- [ ] Add Think cross-links from adjacent docs: `sessions.md`, `durable-execution.md`, `long-running-agents.md`, `server-driven-messages.md`, `workspace.md`, `codemode.md`, `browse-the-web.md`, `mcp-client.md`, `client-sdk.md`, `workflows.md`, and `observability.md`.
- [ ] Fold still-relevant WIP notes into permanent docs/design, especially `wip/think-multi-session-assistant-plan.md` and `wip/inline-sub-agent-events.md`; mark the rest historical or remove when safe.

## Track 7: Experimental learnings and promotion candidates

The experimental folder has useful prior art, but it should inform Think deliberately instead of becoming hidden roadmap scope.

- [ ] Evaluate `experimental/session-skills` as a possible Think + Session skills reference app.
- [ ] Decide whether Session experiments (`session-memory`, `session-search`, `session-multichat`) need Think-first variants or should remain lower-level Agent + Session examples.
- [ ] Pull durability lessons from `forever-chat`, `forever-fibers`, and `inference-buffer` into the Think recovery track without making AI Gateway buffering a Think blocker.
- [ ] Use `gadgets-*` experiments as background for facets/sub-agent safety, without blocking Think’s stable preview on experimental loader/gatekeeper work.
- [ ] Update `experimental/README.md` so useful prior art is discoverable.

## Flagship project ideas

These are aspirational apps that should guide Think’s product direction. They are not stability blockers, but they should pressure-test the SDK, docs, examples, and integration story.

- **Personal assistant / life OS**: A long-lived assistant modeled after OpenClaw, with memory, calendar/email/tasks, browser use, documents, reminders, and human approval for sensitive actions.
- **Coding agent**: A durable coding agent with workspace, Artifacts/Git, Sandbox, Browser Run, tests, PR creation, code review, and recoverable long-running tasks.
- **SMB operating hub**: A hub for managing a small business: inbox triage, customer follow-up, invoices, bookings, inventory, support, website updates, marketing drafts, and analytics.
- **Research analyst**: A research workspace with browser crawling, citations, source snapshots, knowledge base, report generation, and follow-up monitoring.
- **Customer support copilot**: A persistent assistant per customer/account that knows past tickets, product usage, docs, changelogs, and can draft responses or run diagnostic workflows.
- **Agentic browser QA lab**: A Browser Run-powered QA agent that explores app flows, records sessions, finds regressions, captures screenshots, and files issues.

