[MOCK][DO NOT MERGE] Future work. Desktop Mode Agents. #240

Open
AllTerrainDeveloper wants to merge 3 commits into trunk from my-agents

Collaborator

AllTerrainDeveloper commented May 18, 2026

The pitch

A WordPress site is a tool the user operates. Agents make it a tool that operates with the user: durable, addressable workers that live on the site, take orders by chat, drag, hook, or HTTP, and use the same APIs a human admin would.

This PR ships the navigation surface and visual contract. Everything described below is what we build on top of it, in the order described, behind this UI.

What an Agent is — three layers, by design

We split an Agent across three existing WordPress primitives instead of inventing a new one. This is the architectural decision that lets every other piece compose cleanly.

Layer 1 — Identity: a WordPress user

Each Agent is a real row in wp_users with three constraints:

  1. A role (administrator, editor, author, contributor). Capabilities are real WP capabilities; an Agent that runs wp_insert_post() is gated by current_user_can( 'edit_posts' ) exactly like a human.
  2. No login. The authenticate filter rejects them, password resets are disabled, cookie auth refuses them. They exist only as actors the site invokes on its own behalf.
  3. An identity surface (avatar, display name, attribution on revisions, comments, audit logs). When an Agent edits a post, the post's _edit_lock and revision author show the Agent — just like a human collaborator.

Layer 2 — Behavior: the wp_guideline CPT (portable) @artpi

The Agent's wp_guideline post IS its behavior. Every field that shapes how the Agent thinks, every toggle that changes what it can do, every list of supplemental knowledge it can reach for — all of it reads from and writes to a single wp_guideline post. There is no parallel Desktop-Mode table for prompts, no separate options row for the tool list, no shadow registry for skills. One post per Agent, and that post is the agent definition.

This is the CPT the broader Automattic agent ecosystem (Dolly, Push MD, the in-tree Guidelines experiment) already uses, so an Agent stored this way is readable by those tools without translation.

Concretely, every behavior-shaping change in the Desktop Mode UI lands as a write to this one post:

| What the user does in the UI | What happens in storage |
| --- | --- |
| Edits the system prompt | wp_update_post() on the guideline's post_content |
| Ticks a checkbox to enable an ability (wordpress/list-posts) | add_post_meta( $guideline_id, '_agent_abilities', 'wordpress/list-posts' ) |
| Unticks a checkbox to disable an ability | delete_post_meta( $guideline_id, '_agent_abilities', 'wordpress/list-posts' ) |
| Attaches a skill (writing/headline-style) | Link the child wp_guideline to the parent (Dolly's existing relationship model, no new schema) |
| Detaches a skill | Remove that link |
| Edits a skill's body | wp_update_post() on the child guideline (the skill is its own post; the agent just references it) |
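
As a rough illustration of the table above, the sketch below models one guideline post in memory and applies each UI action to it. It is a TypeScript model, not the PHP that would run on the site; `GuidelinePost`, `UiAction`, and `applyUiAction` are hypothetical names, and the ability list is treated as a set for simplicity.

```typescript
// Hypothetical in-memory stand-in for one wp_guideline post (not a real WP type).
interface GuidelinePost {
  id: number;
  postContent: string; // system prompt, i.e. post_content
  abilities: string[]; // values of the _agent_abilities post meta
  skillIds: number[];  // IDs of linked child wp_guideline posts
}

// One variant per table row that targets the agent's own post.
// "Edits a skill's body" is absent: that write goes to the child post.
type UiAction =
  | { kind: "editPrompt"; prompt: string }
  | { kind: "enableAbility"; ability: string }
  | { kind: "disableAbility"; ability: string }
  | { kind: "attachSkill"; skillId: number }
  | { kind: "detachSkill"; skillId: number };

// Returns a new post value; the input is left untouched.
function applyUiAction(post: GuidelinePost, action: UiAction): GuidelinePost {
  switch (action.kind) {
    case "editPrompt":
      return { ...post, postContent: action.prompt };
    case "enableAbility":
      // Modeled as a set: ticking an already-enabled ability changes nothing.
      return post.abilities.indexOf(action.ability) !== -1
        ? post
        : { ...post, abilities: post.abilities.concat(action.ability) };
    case "disableAbility":
      return { ...post, abilities: post.abilities.filter((a) => a !== action.ability) };
    case "attachSkill":
      return post.skillIds.indexOf(action.skillId) !== -1
        ? post
        : { ...post, skillIds: post.skillIds.concat(action.skillId) };
    case "detachSkill":
      return { ...post, skillIds: post.skillIds.filter((id) => id !== action.skillId) };
  }
}

const emptyGuideline: GuidelinePost = { id: 1, postContent: "", abilities: [], skillIds: [] };
const uiActions: UiAction[] = [
  { kind: "editPrompt", prompt: "You audit posts." },
  { kind: "enableAbility", ability: "wordpress/list-posts" },
  { kind: "attachSkill", skillId: 42 }, // 42 is a made-up child guideline ID
];
const configuredGuideline = uiActions.reduce(applyUiAction, emptyGuideline);
```

Every action produces a new value of the same single record, which is the property the table is claiming: no UI control in this list needs a second storage location.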

Nothing about the agent's brain lives outside wp_guideline. Pull up the agent's guideline post in any tool that speaks WP REST (Gutenberg, wp-cli, Push MD, an external script) and you have the entire behavior surface: prompt, tool toggles, attached skills. It is editable in one place, revisable through the standard editorial UI, and auditable via standard post revisions.

And because skills are themselves wp_guideline posts, this composes recursively: a skill can be used by many agents; an agent can mix-and-match skills from many authors; a single skill update propagates to every agent referencing it. Same pattern Dolly ships.

This layer is fully portable. Nothing in any of these fields is Desktop-Mode-specific: every value is something Claude Code, Codex, Cursor, or any other agent runtime understands natively. Run pushmd pull against the site and the agent's brain materialises into the consumer's skills/ folder verbatim, tool toggles and all.

Layer 3 — Bindings: user meta on the Agent (site-specific)

Everything about how this site invokes the Agent lives as user meta on the Agent's wp_users row. The fields here are intentionally outside wp_guideline because they would be meaningless to consumers that aren't Desktop Mode:

  • Triggers: which WP hook the Agent subscribes to (save_post, wp_insert_comment, …), which REST endpoint it exposes and under what auth, which drop payloads its tile accepts, which agents it chains to.
  • Runtime overrides: per-Agent model selection that beats the platform default; per-Agent rate limits; per-Agent debug flags.

This is the layer Claude Code in a terminal doesn't have an opinion about — it invokes agents directly, no hook subscription, no REST gateway. Keeping bindings out of the guideline means the guideline travels, the bindings don't: clone the site, get the agent's brain; configure how your site invokes it separately. Same agent definition, different invocation policy per environment.
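
To make "the guideline travels, the bindings don't" concrete, here is a minimal sketch of what the Layer 3 user meta might hold. The field names, the `moderate-comments` slug, and the model names are all assumptions made for illustration; the PR does not fix a schema yet.

```typescript
// Hypothetical shape for the Layer 3 user meta; field names are illustrative.
interface AgentBindings {
  hooks: string[];        // WP actions the Agent subscribes to
  rest?: { route: string; auth: "capability" | "nonce" | "anonymous" };
  dropAccepts: string[];  // drop payload kinds the tile accepts
  chainsTo: string[];     // slugs of downstream agents
  modelOverride?: string; // beats the platform default when set
}

interface BoundAgent {
  guidelineSlug: string;   // Layer 2 reference: travels with a site clone
  bindings: AgentBindings; // Layer 3: stays on this site
}

// A per-Agent override wins; otherwise fall back to the platform default.
function resolveModel(agent: BoundAgent, platformDefault: string): string {
  return agent.bindings.modelOverride || platformDefault;
}

// Two sites share one agent definition but invoke it differently.
const siteA: BoundAgent = {
  guidelineSlug: "moderate-comments",
  bindings: {
    hooks: ["wp_insert_comment"],
    dropAccepts: [],
    chainsTo: [],
    modelOverride: "small-model",
  },
};
const siteB: BoundAgent = {
  guidelineSlug: "moderate-comments",
  bindings: {
    hooks: [],
    rest: { route: "/agents/v1/moderate-comments", auth: "capability" },
    dropAccepts: [],
    chainsTo: [],
  },
};
```

Both records point at the same guideline slug, so editing the prompt changes neither binding, and changing a binding leaves the portable definition untouched.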

The split, in one line

wp_users row = who. wp_guideline post = what it does (prompt + tools + skills, all of it). user meta = how this site reaches it.

Updates to bindings don't touch behavior; updates to behavior don't touch identity; rename the identity and the other two don't move. And critically: if a user clicks anything in the agent's Define / Tools / Skills UI, the only thing that changes on disk is one wp_guideline post.

Why this split matters: free ecosystem compatibility

Because behavior lives in wp_guideline, every Agent we ship is automatically discoverable by any AI client that already speaks this CPT — including Claude Code, Codex, and anything in the Automattic agent ecosystem. The mechanism (already in production via pushmd.blog): your WordPress site becomes a Git remote, every guideline materialises as wp_guideline/skills/{slug}/SKILL.md with an AGENTS.md alias, and a local git clone of the site drops a working skills/ folder into the consumer's checkout. Claude Code reads it. Codex reads it. Cursor reads it. No bespoke integration, no separate sync layer, no second source of truth.

The agent you build inside Desktop Mode shows up in your terminal the moment you pushmd pull. The agent your developer hand-writes as a .md file in their checkout shows up in Desktop Mode the moment they git push. Same artifact, two front-ends.

How you talk to an Agent — five triggers, one mental model

Every interaction with an Agent is a trigger the user configures up front. The five triggers we plan to ship:

| Trigger | Looks like | Use case |
| --- | --- | --- |
| Drag & drop | Drop a media tile (or post, user, comment) onto the Agent's tile | "Remove the background from these 12 images." |
| Chat | Double-click the Agent → conversation window | "Audit this post and tighten the headings." |
| Hook | Subscribe to a WP action (save_post, wp_insert_comment, …) | "Moderate every new comment automatically." |
| REST endpoint | Authenticated or anonymous POST /agents/v1/<slug> | "Trigger a newsletter send from our build pipeline." |
| Agent-to-agent | One Agent's output feeds another's input | "When SEO scoring finishes, hand the verdict to the publishing Agent." |

All five collapse to the same loop: a message arrives → the Agent's system prompt + the message become an LLM call → the model picks tools off the allowlist → tools run as the Agent's user → the result is the trigger's return value. Drag-and-drop is a chat with a media payload. A hook subscription is a chat where the message is the hook args. An endpoint is a chat where the body is the message. Same engine, different intake.
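
The loop described above can be sketched roughly as follows. This is a synchronous TypeScript model with a stubbed model function, written only to show the control flow; a real provider call would be asynchronous, and `runAgent`, `ModelFn`, `ToolFn`, and the step limit are assumptions, not names from this codebase.

```typescript
interface ToolCall { tool: string; args: Record<string, unknown>; }
type ModelTurn =
  | { kind: "toolCall"; call: ToolCall }
  | { kind: "final"; text: string };
type ModelFn = (systemPrompt: string, transcript: string[]) => ModelTurn;
type ToolFn = (args: Record<string, unknown>) => string;

interface LoopAgent { systemPrompt: string; allowlist: string[]; }

// message arrives -> LLM call -> tool picked off the allowlist -> tool runs -> result.
function runAgent(
  agent: LoopAgent,
  message: string,
  model: ModelFn,
  tools: Record<string, ToolFn>,
  maxSteps: number = 8, // assumed safety limit, not part of the PR
): string {
  const transcript: string[] = [`user: ${message}`];
  for (let step = 0; step < maxSteps; step++) {
    const turn = model(agent.systemPrompt, transcript);
    if (turn.kind === "final") return turn.text;
    const { tool, args } = turn.call;
    const allowed = agent.allowlist.indexOf(tool) !== -1;
    const registered = Object.prototype.hasOwnProperty.call(tools, tool);
    if (!allowed || !registered) {
      // The refusal is reported back to the model instead of running the tool.
      transcript.push(`tool ${tool}: error: not available to this agent`);
      continue;
    }
    transcript.push(`tool ${tool}: ${tools[tool](args)}`);
  }
  throw new Error("agent exceeded maxSteps without a final answer");
}

// Different intakes, same engine: each trigger only builds the message.
const fromChat = (text: string): string => text;
const fromHook = (hook: string, hookArgs: unknown[]): string =>
  `${hook}: ${JSON.stringify(hookArgs)}`;
const fromRestBody = (body: string): string => body;

// Stub model: asks for one tool call, then finishes.
const stubModel: ModelFn = (_prompt, transcript) =>
  transcript.length === 1
    ? { kind: "toolCall", call: { tool: "wordpress/list-posts", args: {} } }
    : { kind: "final", text: `done after ${transcript.length - 1} tool result(s)` };
```

The three `from…` helpers are deliberately trivial: in this model the only thing a trigger contributes is how the incoming payload becomes the first transcript entry.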

Trigger configuration is user meta on the Agent (Layer 3), not part of the guideline. Two reasons: (1) triggers are site-specific — the same "Moderate Comments" guideline can sit on one site that listens to wp_insert_comment and another that only exposes the REST endpoint, with no fork of the underlying agent definition; (2) triggers are a Desktop-Mode concept that wouldn't round-trip cleanly through pushmd / Claude Code / Codex anyway, so they don't belong in the layer those tools consume.

Tools = the WordPress Abilities API

WordPress 6.9 introduced wp_register_ability(), Core's first-party way to expose typed, schema-described actions to AI tooling. Agents read that registry at runtime: every ability becomes a candidate tool, with its declared input schema converted to the OpenAI / Anthropic / Gemini function-calling shape. The user picks which ones each Agent gets, and the picks live as post meta on the guideline.

Three properties that fall out of this:

  • Ecosystem leverage. Any plugin that registers an ability becomes Agent-compatible. WooCommerce abilities → commerce Agents. Yoast abilities → SEO Agents. No bespoke integration code per plugin.
  • Capability gating for free. Each ability already carries its own permission_callback. The Agent runs as itself (its WP user), so the same checks that protect a human editor protect the Agent.
  • Tool palette UX. The checkbox list is a view over wp_get_abilities(). As Core grows that registry, every Agent's picker grows with it.

For abilities not yet exposed by Core or plugins, the existing desktop_mode_register_ai_tool() registry plugs the gap and feeds the same picker.

This Abilities-to-tools mapping is the piece the broader ecosystem doesn't have yet. It's what turns "an agent that knows things" (skills, instructions) into "an agent that does things" (tool calls against a typed schema).
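
A minimal sketch of that mapping, assuming an ability carries a name, a description, and a JSON Schema for its input. The output shapes follow the function-tool format of OpenAI's Responses API and Anthropic's tool format as we understand them; both restrict tool names to letters, digits, underscores, and hyphens, so the slash in an ability name has to be rewritten. The `__` replacement is an arbitrary choice made for this example, and `AbilityDef` is a hypothetical type.

```typescript
// Hypothetical view of a registered ability; not the Abilities API's own type.
interface AbilityDef {
  name: string;                         // e.g. "wordpress/list-posts"
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema declared by the ability
}

interface OpenAiFunctionTool {
  type: "function";
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

interface AnthropicTool {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
}

// Replace characters the providers reject and cap the length at 64.
// Note: two distinct ability names could collide after this rewrite;
// a real bridge would need to detect that.
function toToolName(abilityName: string): string {
  return abilityName.replace(/[^a-zA-Z0-9_-]/g, "__").slice(0, 64);
}

function toOpenAiTool(ability: AbilityDef): OpenAiFunctionTool {
  return {
    type: "function",
    name: toToolName(ability.name),
    description: ability.description,
    parameters: ability.inputSchema,
  };
}

function toAnthropicTool(ability: AbilityDef): AnthropicTool {
  return {
    name: toToolName(ability.name),
    description: ability.description,
    input_schema: ability.inputSchema,
  };
}

const listPosts: AbilityDef = {
  name: "wordpress/list-posts",
  description: "List posts on the site.",
  inputSchema: {
    type: "object",
    properties: { per_page: { type: "integer" } },
  },
};
```

Because the schema is passed through unchanged, the bridge stays thin: the only per-provider work is the envelope and the name rewrite, plus a reverse lookup from tool name back to ability name when the model's call comes in.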

LLM provider — bring your own

Agents need a model. We already ship a provider registry (desktop_mode_register_ai_provider) that supports OpenAI's Responses API today and is structured for Anthropic, Gemini, and any vendor a plugin author wires up. Two implications for Agents:

  • No LLM key, no Agent creation. The "Create Agent" button greys out with a notice pointing at the OS Settings AI panel. The section still renders — it's a hint, not a hard gate — and existing Agents still appear so users can audit them before installing a key.
  • Per-Agent model overrides. Some Agents are cheap classification jobs (Moderate Comments → Haiku-tier); some need real reasoning (Audit Post → Sonnet-tier). Model choice is a binding (Layer 3, user meta) rather than part of the portable guideline — what model you pay for on this site has no business travelling with the agent definition.

Drag-and-drop is the North Star

This is the feature that proves the desktop metaphor isn't a skin. "Drag this image onto the Remove BG agent" is a sentence non-technical users say out loud and expect to work. The cross-window drag bridge needed for Media-into-Gutenberg is the same machinery needed for tile-into-Agent — so every Agent we ship pulls the bridge closer to finished.

What it requires on the Agent side: an accepted-payload manifest in the drag-trigger binding (MIME types, entity kinds, post types) and a drop handler that converts the dropped payload into the chat message format. The framework already knows how to draw the ghost, hit-test windows, and route the drop — Agents are just one more drop target type.
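
As a sketch of those two pieces, the code below checks a dropped item against an accepted-payload manifest and then turns the accepted items into a chat message. `DropManifest`, `DropItem`, and the message format are assumptions for illustration; the actual bridge payload is not defined in this PR.

```typescript
// Hypothetical manifest stored in the drag-trigger binding.
interface DropManifest {
  mimeTypes: string[];   // exact types or "type/*" wildcards
  entityKinds: string[]; // "media", "post", "user", "comment"
  postTypes: string[];   // accepted post types for post drops
}

type DropItem =
  | { entity: "media"; id: number; mimeType: string }
  | { entity: "post"; id: number; postType: string }
  | { entity: "user" | "comment"; id: number };

function mimeMatches(pattern: string, mime: string): boolean {
  const wildcard = pattern.slice(-2) === "/*";
  // "image/*" matches anything that starts with "image/".
  return wildcard ? mime.indexOf(pattern.slice(0, -1)) === 0 : pattern === mime;
}

function acceptsDrop(manifest: DropManifest, item: DropItem): boolean {
  if (manifest.entityKinds.indexOf(item.entity) === -1) return false;
  if (item.entity === "media") {
    return manifest.mimeTypes.some((p) => mimeMatches(p, item.mimeType));
  }
  if (item.entity === "post") {
    return manifest.postTypes.indexOf(item.postType) !== -1;
  }
  return true; // users and comments need no further check in this model
}

// The drop handler: a drop is a chat message with the payload attached.
function dropToChatMessage(instruction: string, items: DropItem[]): string {
  const refs = items.map((item) => `${item.entity}#${item.id}`).join(", ");
  return `${instruction}\nAttached: ${refs}`;
}

const removeBgManifest: DropManifest = {
  mimeTypes: ["image/*"],
  entityKinds: ["media"],
  postTypes: [],
};
```

A tile would call `acceptsDrop` during hit-testing to decide whether to highlight as a valid target, and `dropToChatMessage` only once the drop lands.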

Security model — boring on purpose

A new actor that can take HTTP requests, listen to hooks, and call tools is a security surface. The boring-on-purpose answer:

  • Agents are users. Every action lands in WordPress's existing audit trail with the Agent's user ID as the actor. No new auth model, no parallel ACL.
  • Capabilities are inherited, not granted. Selecting a tool from the abilities picker does not elevate the Agent. If the Agent's role can't run the underlying ability, the call fails the same way a human's would.
  • Trigger gating is explicit. Each trigger carries its own auth (capability for chat, REST permission for endpoints, hook firing user for actions). The Agent never runs un-gated.
  • No outbound creds in payloads. Tool results are sanitised before going back into the LLM context. The model sees what the editor would see in the wp-admin UI, not what wp-cli would see in the database.
  • Behavior is auditable. Because instructions live as wp_guideline posts, every prompt change is a real post revision (a revision row in wp_posts), attributable to a real user and reviewable in the standard editorial UI.
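
The "inherited, not granted" rule can be illustrated with a small model: the tools an Agent can actually call are the intersection of what the user ticked and what the Agent's role permits. The second ability name and the abridged capability list below are made up for the example, and the permission function is only a stand-in for an ability's permission_callback.

```typescript
type CapCheck = (cap: string) => boolean;

// Stand-in for an ability and its permission_callback.
interface GatedAbility {
  name: string;
  permission: (can: CapCheck) => boolean;
}

// Allowlisted AND permitted; ticking a checkbox never adds a capability.
function callableTools(
  allowlist: string[],
  abilities: GatedAbility[],
  agentCaps: string[],
): string[] {
  const can: CapCheck = (cap) => agentCaps.indexOf(cap) !== -1;
  return abilities
    .filter((a) => allowlist.indexOf(a.name) !== -1 && a.permission(can))
    .map((a) => a.name);
}

const gatedAbilities: GatedAbility[] = [
  { name: "wordpress/list-posts", permission: (can) => can("edit_posts") },
  // Hypothetical ability name, used only to show a denied call.
  { name: "example/moderate-comment", permission: (can) => can("moderate_comments") },
];

// Abridged capability set for a contributor-role Agent.
const contributorCaps = ["read", "edit_posts", "delete_posts"];

const contributorTools = callableTools(
  ["wordpress/list-posts", "example/moderate-comment"],
  gatedAbilities,
  contributorCaps,
);
```

Here both abilities are ticked, but only the first survives the gate. Filtering up front like this is a convenience for the picker and the tool list; the ability's own permission check at call time remains the real enforcement point.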

Why ship the UX mock first

This PR is intentionally a navigation point and a visual contract, nothing else. Four hard-coded Agents, a read-only Define / Tools / Triggers right pane, a "+ Create agent" button that says Coming soon. No data model writes, no login block, no real LLM call, no real trigger plumbing.

The reason is concrete: the surface area below is large, and the best way to settle which slice to build first is to argue about it while looking at the actual screen. Shipping the screen unlocks that argument without committing the team to any single backend shape. Now that the storage layer has a clear answer (wp_guideline CPT, ecosystem-compatible from day one), the next slice is obvious, but we still want the screen real before the wiring goes in.

Every architectural choice in this PR is reversible. The files added or touched can be deleted in one commit if the direction changes. What survives is the lesson: this is what it should look like.

The order we'll build it

  1. Behavior layer — adopt wp_guideline. Each Agent gets a guideline post storing prompt + tool allowlist + skill links. This is the load-bearing decision and it goes first because every later layer references it.
  2. Identity layer — synthetic users. Agent role(s) added, authenticate filter rejects synthetic users, password reset disabled, REST cookie auth refuses them, audit-trail attribution works end to end. The user row carries a single piece of meta linking to its guideline.
  3. Abilities bridge. A desktop_mode_ai_tools consumer harvests wp_get_abilities() into LLM-shaped tools, capability-gated and deduped against the existing tool registry. The allowlist meta on the guideline then filters that set down to the per-call tool set.
  4. Push MD compatibility audit. Once 1 + 2 + 3 are in, validate that the Agent guidelines materialise correctly under wp_guideline/skills/{slug}/SKILL.md via pushmd. This is the moment Desktop Mode Agents become natively discoverable by Claude Code, Codex, and the rest of the ecosystem. We don't need to ship pushmd — we just need to not break the shape it expects.
  5. Bindings layer — user meta scaffold. Trigger configuration, model override, rate limits land as structured user meta on the Agent's wp_users row. Nothing wired yet — just the shape, so steps 6–10 can drop in without rework.
  6. Chat trigger. Multi-instance native window per Agent, conversation history per ( user × agent ), streaming via the existing AI Copilot endpoint, tool dispatch on the client.
  7. Hook trigger. Subscription registry derived from the bindings meta, hook → Agent invocation, optional filter-return propagation.
  8. REST endpoint trigger. Per-Agent route registration with auth choice (capability / nonce / anonymous), request body becomes the chat message.
  9. Drag trigger. Agent tiles become drop targets; the cross-window drag bridge already in flight (the North Star) carries the payload.
  10. Agent-to-agent trigger. desktop_mode_agent_completed action, chained invocations, loop detection.
  11. Marketplace. Packaged Agent definitions third parties ship as plugins (or as Push MD guideline collections), same way block patterns work today.

Each step ships behind feature flags so plugin authors can test against trunk without us cutting a stable release until the contract settles.
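
Step 10 mentions loop detection without saying how it would work. One possible approach, sketched here as an assumption rather than the planned design, is to treat the "chains to" bindings as a directed graph and reject a configuration that contains a cycle. The agent slugs are invented for the example.

```typescript
// Agent slug -> slugs of the agents it hands off to (from the bindings meta).
type ChainMap = Record<string, string[]>;

// Depth-first search; returns one cycle as a list of slugs, or null if none.
function findChainCycle(chains: ChainMap): string[] | null {
  const state: Record<string, "visiting" | "done"> = {};
  const path: string[] = [];

  function visit(slug: string): string[] | null {
    if (state[slug] === "done") return null;
    if (state[slug] === "visiting") {
      // Reached an agent that is still on the current path: that is a loop.
      return path.slice(path.indexOf(slug)).concat(slug);
    }
    state[slug] = "visiting";
    path.push(slug);
    for (const next of chains[slug] || []) {
      const cycle = visit(next);
      if (cycle) return cycle;
    }
    path.pop();
    state[slug] = "done";
    return null;
  }

  for (const slug of Object.keys(chains)) {
    const cycle = visit(slug);
    if (cycle) return cycle;
  }
  return null;
}

const publishingChain: ChainMap = {
  audit: ["optimize-seo"],
  "optimize-seo": ["schedule"],
  schedule: ["send-to-mail-list"],
  "send-to-mail-list": [],
};
```

A static check like this catches loops at configuration time. It would not replace a runtime guard (for example a hop counter on chained invocations), since hook-triggered agents can re-enter a chain indirectly.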

Where this leads

The endgame is a WordPress site where the workflow looks like this:

The author finishes a draft, drags the post onto the Audit Agent. The Agent reviews structure, drops a note in the Reviewer queue, hands the draft to the Optimize SEO Agent on success, which proposes edits, asks for one human OK, then hands the draft to the Schedule Agent, which picks a publication time based on traffic analytics and queues the Send to mail list Agent for the moment of publication.

None of those steps need a custom plugin. They're four Agents the user composed by ticking abilities and wiring triggers, using one screen — the one this PR introduces. And because the guidelines live in wp_guideline, the same author can pushmd pull from their laptop and edit the Optimize SEO Agent's system prompt in a code editor, then push it back. Or have Claude Code in their terminal call the same Audit Agent through its REST endpoint while writing a different post. One source of truth, every surface.

What this PR is not

  • Not gated behind a feature flag for production users. The Agents section appears in every My WordPress window. That's deliberate: we want to learn from the question "what is this?" before we ship the working version.
  • Not a public API. No docs/hooks-reference.md entries, no docs/examples/agents.md. The contract is the screen. We add the API surface when the backend lands.
  • Not the final design. Define / Tools / Triggers is a starting point, not a finished interaction. Expect iteration on the trigger configuration UI, the abilities picker density, the chat affordance, and the empty state for "no LLM configured."

Contributor

github-actions Bot commented May 18, 2026

✅ WordPress Plugin Check Report

✅ Status: Passed

📊 Report

All checks passed! No errors or warnings found.


🤖 Generated by WordPress Plugin Check Action • Learn more about Plugin Check

- Introduced a new Agents entity in the My WordPress section, including an inline SVG icon for visual consistency.
- Implemented a mock renderer for the Agents section, allowing for a UI preview without backend integration.
- Created mock data for four fictional agents, each with defined abilities and triggers.
- Added tests to ensure the integrity of the mock data and the rendering functionality.
- Updated entity registration to include the new Agents kind and adjusted related types accordingly.
- Implement unit tests for the Agents REST API in `agentsRest.php`, covering endpoints for listing, creating, updating, and deleting agents, as well as permission checks.
- Create integration tests for the Agents renderer in `agents-renderer.test.ts`, ensuring proper rendering of agent data and UI interactions.
- Add tests for the Agents REST adapter in `agents-rest.test.ts`, validating fetch requests and response handling.
- Introduce tests for the Agents send-to functionality in `agents-send-to.test.ts`, verifying caching, menu interactions, and event dispatching.
juanlentino added a commit to juanlentino/signal-and-noise-tools that referenced this pull request May 24, 2026
Reading WordPress/desktop-mode PR #240 (Future work. Desktop Mode Agents.)
revealed two architectural facts that retire v3.8.0:

1. The Anthropic provider is GENERIC infrastructure — it contains zero
   Signal & Noise content. It belongs in desktop-mode itself, not in
   our plugin. PR #240's §"LLM provider — bring your own" explicitly
   names Anthropic as the kind of provider "a plugin author wires up,"
   but the better path is upstream contribution since the work has no
   SN-specific surface area.

2. The 26 manual `desktop_mode_register_ai_tool()` registrations we
   planned for Tasks 5-7 will be obsoleted by step 3 of PR #240's
   Agents framework, which auto-harvests `wp_register_ability()`
   registrations into LLM-shaped tools. Our 12 theme abilities (theme
   v9.1.1) + 17 plugin abilities (plugin v3.7.3) are already
   future-compatible — no plugin-side work needed for them to surface
   in the Agents framework when it lands.

What this commit changes:
  - Deletes inc/ai-copilot/ (the 3 anthropic-* files + .gitkeep
    scaffold from Tasks 1-4, originally committed in 6425ab9,
    d3d89cc, 92e39cc, a1275b2)
  - Deletes tests/anthropic-provider.php (71 assertions of provider
    coverage, ported to the upstream PR's PHPUnit tests)
  - Removes the conditional require_once block from
    signal-and-noise-tools.php — back to its pre-Task-1 state
  - Annotates the v3.8.0 spec + plan with CANCELLED headers pointing
    to the upstream contribution path

What stays:
  - Theme v9.1.1 + plugin v3.7.3 production state — unchanged
  - The 12 launcher commands in inc/desktop-mode-integration.php from
    commit b3430cc (display-only ⌘K entries, harmless)
  - All 9 legacy test suites still pass — 550 assertions across
    admin-tabs, ai-bootstrap, bot-detection, cron-dashboard,
    cron-history, health-checks, insights, theme-ability-commands,
    webhooks

Upstream contribution work continues in the fork at
juanlentino/desktop-mode (cloned to ../desktop-mode/). The provider
code itself is preserved in git history at the SHAs listed above and
will be ported to the PR with desktop-mode's `desktop_mode_ai_*`
function-prefix conventions modeled on includes/ai-copilot/openai.php.

Reference: WordPress/desktop-mode#240
           WordPress/desktop-mode#271