spiritledsoftware · ian-pascoe · Jun 5, 2026 · Jun 5, 2026 · Jun 5, 2026 · Jun 8, 2026
diff --git a/.changeset/native-opencode-code-mode.md b/.changeset/native-opencode-code-mode.md
@@ -0,0 +1,5 @@
+---
+"@caplets/opencode": patch
+---
+
+Package native Caplets Code Mode assets for OpenCode and cover `caplets_run` registration.
diff --git a/.changeset/small-paws-draw.md b/.changeset/small-paws-draw.md
@@ -0,0 +1,5 @@
+---
+"@caplets/core": patch
+---
+
+Fix cloud attach URL normalization
diff --git a/AGENTS.md b/AGENTS.md
@@ -4,7 +4,7 @@
 
 - Use `pnpm` only; the repo pins `pnpm@11.5.0` and requires Node `>=24`.
 - Install with `pnpm install --frozen-lockfile` when matching CI.
-- Full local gate and pre-push hook: `pnpm verify` (`format:check -> lint -> typecheck -> schema:check -> test -> benchmark:check -> build`).
+- Full local gate and pre-push hook: `pnpm verify` (`format:check -> lint -> code-mode:check-api -> typecheck -> schema:check -> test -> benchmark:check -> build`).
 - Fast focused checks: `pnpm format:check`, `pnpm lint`, `pnpm typecheck`, `pnpm test`, `pnpm build`.
 - Run one package: `pnpm --filter @caplets/core test`, `pnpm --filter caplets build`, or replace the filter with `@caplets/opencode`, `@caplets/pi`, `@caplets/benchmarks`.
 - Run one Vitest file by passing it after the package script, e.g. `pnpm --filter @caplets/core test -- test/config.test.ts`.
@@ -21,6 +21,7 @@
 
 - Put design specs in `docs/specs/`, implementation plans in `docs/plans/`, and product requirements documents in `docs/product/`; do not use `docs/superpowers/` in this repo.
 - Config schema source of truth is Zod in `packages/core/src/config.ts`; update `schemas/caplets-config.schema.json` with `pnpm schema:generate` and verify with `pnpm schema:check`.
+- Code Mode runtime API declaration source of truth is `packages/core/src/code-mode/runtime-api.d.ts`; update `packages/core/src/code-mode/runtime-api.generated.ts` with `pnpm code-mode:generate-api` and verify with `pnpm code-mode:check-api`.
 - `pnpm benchmark` updates `docs/benchmarks/coding-agent.md`; `pnpm benchmark:check` fails if the committed report is stale.
 - Live benchmarks are opt-in only: build first, then run `CAPLETS_BENCH_LIVE=1 pnpm benchmark:live:opencode` or `CAPLETS_BENCH_LIVE=1 pnpm benchmark:live:pi`; results are local/model-dependent and not deterministic product claims.
 

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,11 @@
 # caplets
 
+## Unreleased
+
+### Major Changes
+
+- Breaking: Caplet progressive wrapper operation names now use `check`, `tools`, `describe_tool`, resource/prompt operation names without `list_`, and `name`/`args` fields instead of `tool`/`prompt`/`arguments`. Code Mode declarations now expose comprehensive Caplet handles with paginated discovery, result envelopes, resource/prompt methods, loose TypeScript diagnostics, and schema-derived `callSignature`.
+
 ## 0.12.0
 
 ### Minor Changes

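The breaking rename in the Unreleased entry can be sketched as a small request migration. This is an illustration only, not code from the repository. `migrateOperation` and `migrateRequest` are hypothetical helpers, and the legacy request shape is assumed from the field names the entry mentions (`tool`, `prompt`, `arguments`). Dropping a leading `list_` from every such operation is likewise an assumption about how the entry's "without `list_`" wording applies.

```typescript
// Hypothetical shapes for illustration; the real wrapper types may differ.
type LegacyRequest = {
  operation: string;
  tool?: string;
  prompt?: string;
  arguments?: Record<string, unknown>;
};

type MigratedRequest = {
  operation: string;
  name?: string;
  args?: Record<string, unknown>;
};

// Renames visible in this diff: check_backend -> check, get_tool -> describe_tool.
const RENAMED_OPERATIONS = new Map<string, string>([
  ["check_backend", "check"],
  ["get_tool", "describe_tool"],
]);

function migrateOperation(operation: string): string {
  const renamed = RENAMED_OPERATIONS.get(operation);
  if (renamed !== undefined) return renamed;
  // ASSUMPTION: every list_* operation simply drops the prefix (list_tools -> tools).
  return operation.startsWith("list_") ? operation.slice("list_".length) : operation;
}

function migrateRequest(legacy: LegacyRequest): MigratedRequest {
  const migrated: MigratedRequest = { operation: migrateOperation(legacy.operation) };
  // `tool` and `prompt` both collapse into the single `name` field.
  const name = legacy.tool ?? legacy.prompt;
  if (name !== undefined) migrated.name = name;
  // `arguments` becomes `args`; the payload itself is passed through untouched.
  if (legacy.arguments !== undefined) migrated.args = legacy.arguments;
  return migrated;
}

console.log(migrateRequest({ operation: "get_tool", tool: "read_file" }));
```

Operations that keep their names, such as `search_tools` and `call_tool`, pass through `migrateOperation` unchanged, so only the field renames affect them.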
diff --git a/README.md b/README.md
@@ -28,7 +28,7 @@
 
 Caplets turns MCP servers, APIs, and commands into focused agent capabilities: one card first, searchable tools next, inspectable schemas before calls, and preserved results after.
 
-Stop dumping every operation into context up front. Caplets wraps each tool source as a capability an agent can discover, inspect, call, and recover from one step at a time. Instead of exposing a giant flat wall of operations, Caplets shows a compact capability card with source, status, and next actions. The agent chooses a domain first, then uses scoped operations like `search_tools`, `get_tool`, and `call_tool` only when it needs more detail.
+Stop dumping every operation into context up front. Caplets wraps each tool source as a capability an agent can discover, inspect, call, and recover from one step at a time. Instead of exposing a giant flat wall of operations, Caplets shows a compact capability card with source, status, and next actions. The agent chooses a domain first, then uses scoped operations like `search_tools`, `describe_tool`, and `call_tool` only when it needs more detail.
 
 For MCP-backed Caplets, the scoped operation set also includes resource discovery and reading, prompt listing and rendering, resource-template discovery, and completion for prompt or template arguments. Non-MCP backends expose focused tool and action operations.
 
@@ -43,7 +43,7 @@ caplets add mcp context7 --command npx --arg -y --arg @upstash/context7-mcp
 caplets serve
 ```
 
-In the deterministic benchmark, 106 flat tools became 3 top-level capabilities with an 87.9% smaller initial payload. Your agent starts with `context7`, then drills in through `inspect`, `search_tools`, `get_tool`, and `call_tool` only when needed.
+In the deterministic benchmark, 106 flat tools became 3 top-level capabilities with an 84.2% smaller initial payload. Your agent starts with `context7`, then drills in through `inspect`, `search_tools`, `describe_tool`, and `call_tool` only when needed.
 
 ## Quick Start
 
@@ -312,8 +312,8 @@ Flat tool lists make agents guess before they understand. If every downstream se
 Caplets turns that flat wall into a staged path:
 
 1. **Choose** a capability, such as `GitHub`.
-2. **Inspect** matching operations with `search_tools` or `list_tools`.
-3. **Resolve** the exact schema with `get_tool`.
+2. **Inspect** matching operations with `search_tools` or `tools`.
+3. **Resolve** the exact schema with `describe_tool`.
 4. **Invoke** with `call_tool` while preserving downstream content, structured data, and error state.
 
 A backend enters agent context as a focused card with source, status, and next actions, not a wall of operations.
@@ -390,7 +390,7 @@ If a backend fails, Caplets keeps the error scoped to the capability, preserves
 - Uses the configured `name` and `description` as the capability card shown to agents.
 - Starts downstream MCP servers and loads OpenAPI specs lazily when an operation needs them.
 - Supports stdio, Streamable HTTP, and legacy HTTP+SSE downstream servers.
-- Lets agents `list_tools`, `search_tools`, `get_tool`, and `call_tool` within one selected Caplet namespace.
+- Lets agents `tools`, `search_tools`, `describe_tool`, and `call_tool` within one selected Caplet namespace.
 - Converts OpenAPI operations into MCP-style tool metadata and executes HTTP calls directly.
 - Converts configured GraphQL operations into MCP-style tool metadata, and can auto-generate GraphQL tools from schema root query and mutation fields.
 - Converts explicitly configured HTTP actions into MCP-style tool metadata and executes HTTP calls directly.
@@ -780,7 +780,7 @@ OpenAPI auth is explicit and supports:
 - `{"type": "oauth2", ...}`
 - `{"type": "oidc", ...}`
 
-OpenAPI `call_tool.arguments` uses grouped HTTP inputs:
+OpenAPI `call_tool.args` uses grouped HTTP inputs:
 
 ```json
 {
@@ -824,7 +824,7 @@ endpoint and exactly one schema source: `schemaPath`, `schemaUrl`, or `introspec
 
 When `operations` is omitted or empty, Caplets auto-generates tools from schema root
 fields: `query_<field>` and `mutation_<field>`. Generated tools use bounded scalar
-selection sets and pass `call_tool.arguments` directly as GraphQL variables/root-field
+selection sets and pass `call_tool.args` directly as GraphQL variables/root-field
 arguments.
 
 Every GraphQL endpoint can set:
@@ -878,7 +878,7 @@ must start with `/` and be URL paths that cannot change origin or escape the bas
 Action mappings can set `query`, `headers`, and `jsonBody`. `query` and `headers` must resolve
 to object maps whose values are strings, numbers, or booleans. `jsonBody` may use literals,
 nested arrays/objects, `$input.field` references, or `$input` for the whole argument object.
-Path placeholders such as `{service}` are read directly from `call_tool.arguments` and URL-encoded.
+Path placeholders such as `{service}` are read directly from `call_tool.args` and URL-encoded.
 Configured action headers cannot set managed headers such as `authorization`, `host`,
 `content-length`, `connection`, or `content-type`; JSON bodies set `content-type` automatically.
 
@@ -939,8 +939,8 @@ an existing destination file.
 ### Caplet Sets
 
 Use `capletSets` to expose another Caplets collection as nested Caplets. Each child Caplet appears
-as one downstream tool and supports the full Caplets operation set: `inspect`, `check_backend`,
-`list_tools`, `search_tools`, `get_tool`, and `call_tool`.
+as one downstream tool and supports the full Caplets operation set: `inspect`, `check`,
+`tools`, `search_tools`, `describe_tool`, and `call_tool`.
 
 ```json
 {
@@ -1135,7 +1135,7 @@ Each generated Caplet tool accepts an `operation`:
 
 ```json
 {
-  "operation": "list_tools"
+  "operation": "tools"
 }
 ```
 
@@ -1153,7 +1153,7 @@ Inspect one exact downstream tool:
 
 ```json
 {
-  "operation": "get_tool",
+  "operation": "describe_tool",
   "tool": "read_file"
 }
 ```
@@ -1173,23 +1173,23 @@ Call one exact downstream tool:
 Available operations:
 
 - `inspect`: return the configured capability card without starting the downstream server.
-- `check_backend`: verify the selected backend, whether MCP, OpenAPI, GraphQL, HTTP, CLI, or nested Caplets.
-- `list_tools`: return compact downstream tool metadata.
+- `check`: verify the selected backend, whether MCP, OpenAPI, GraphQL, HTTP, CLI, or nested Caplets.
+- `tools`: return compact downstream tool metadata.
 - `search_tools`: search downstream tool names and descriptions within this Caplet.
-- `get_tool`: return full metadata for one exact downstream tool.
+- `describe_tool`: return full metadata for one exact downstream tool.
 - `call_tool`: invoke one exact downstream tool with JSON object arguments.
 
 Requests are strict: operation-specific extra fields are rejected, and `call_tool` requires
 `arguments` to be a JSON object.
 
-Discovery operations (`inspect`, `check_backend`, `list_tools`, `search_tools`, and
-`get_tool`) return wrapper-generated results whose `structuredContent.caplets` field
+Discovery operations (`inspect`, `check`, `tools`, `search_tools`, and
+`describe_tool`) return wrapper-generated results whose `structuredContent.caplets` field
 identifies the Caplet with `id`, plus backend, operation, status, and elapsed time when
 available. Discovery result objects and compact tool entries also use `id` for the
-configured Caplet identity. Compact `list_tools` and `search_tools` entries may include
+configured Caplet identity. Compact `tools` and `search_tools` entries may include
 input/output schema hashes; treat those
 hashes as reuse hints for a schema you have already inspected, not as a replacement for
-`get_tool` when arguments, output, or semantics are unclear.
+`describe_tool` when arguments, output, or semantics are unclear.
 
 Direct `call_tool` preserves the downstream tool result shape instead of wrapping it in
 `structuredContent.result`. When the result can carry MCP metadata, Caplets adds
@@ -1200,7 +1200,7 @@ relative to the downstream MCP server process, not necessarily relative to the c
 project or Caplets process.
 
 For first use, the explicit progressive-discovery path is still safest: choose a Caplet,
-`search_tools` or `list_tools`, inspect uncertain tools with `get_tool`, then `call_tool`.
+`search_tools` or `tools`, inspect uncertain tools with `describe_tool`, then `call_tool`.
 
 ## Development
 

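The README's strictness rule (operation-specific extra fields are rejected, and `call_tool` needs a JSON object payload) can be sketched as a plain validator. This is a minimal sketch under stated assumptions, not the project's Zod schema. It uses the `name`/`args` field names from the changelog entry, treats `query` as a hypothetical `search_tools` field, and assumes `inspect`, `check`, and `tools` take no extra fields. `validateRequest` and `ALLOWED_FIELDS` are illustrative names.

```typescript
// ASSUMPTION: field names follow the changelog entry (`name`, `args`); `query` is a
// hypothetical field for search_tools. The real schema lives in the repository.
const ALLOWED_FIELDS = new Map<string, readonly string[]>([
  ["inspect", []],
  ["check", []],
  ["tools", []],
  ["search_tools", ["query"]],
  ["describe_tool", ["name"]],
  ["call_tool", ["name", "args"]],
]);

function isJsonObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a list of problems; an empty list means the request passed.
function validateRequest(request: unknown): string[] {
  if (!isJsonObject(request)) return ["request must be a JSON object"];

  const operation = request["operation"];
  const allowed = typeof operation === "string" ? ALLOWED_FIELDS.get(operation) : undefined;
  if (allowed === undefined) return ["unknown operation: " + String(operation)];

  const errors: string[] = [];
  // Strict mode: any field outside the operation's allow-list is an error.
  for (const key of Object.keys(request)) {
    if (key !== "operation" && allowed.indexOf(key) === -1) {
      errors.push("unexpected field for " + String(operation) + ": " + key);
    }
  }
  // call_tool additionally requires an object payload (arrays and null are rejected).
  if (operation === "call_tool" && !isJsonObject(request["args"])) {
    errors.push("call_tool requires args to be a JSON object");
  }
  return errors;
}

console.log(validateRequest({ operation: "describe_tool", name: "read_file" }));
```

Returning a list of problems rather than throwing on the first one lets a caller report every rejected field in a single response.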
diff --git a/apps/landing/src/pages/index.astro b/apps/landing/src/pages/index.astro
@@ -12,15 +12,15 @@ const heroTrace = {
     {
       label: "inspect",
       detail: "Show one capability card before any downstream tool list enters context.",
-      result: "search_tools · get_tool · call_tool",
+      result: "search_tools · describe_tool · call_tool",
     },
     {
       label: 'search_tools("pull request")',
       detail: "Find matching operations inside the selected capability only.",
       result: "create_pull_request · list_pull_requests · request_review",
     },
     {
-      label: 'get_tool("create_pull_request")',
+      label: 'describe_tool("create_pull_request")',
       detail: "Inspect the exact schema before an agent can invoke the operation.",
       result: "title · body · base · head · reviewers?",
     },
@@ -165,7 +165,7 @@ const exampleCaplets = [
     name: "GitHub",
     summary: "A huge hosted MCP surface for repositories, issues, pull requests, branches, commits, and reviews.",
     why: "Use it when the value is avoiding a giant GitHub tool wall.",
-    path: ["github", "inspect", "search_tools", "get_tool", "call_tool"],
+    path: ["github", "inspect", "search_tools", "describe_tool", "call_tool"],
     steps: [
       { command: "export GH_TOKEN=github_pat_...", label: "GitHub token export" },
       {
@@ -181,7 +181,7 @@ const exampleCaplets = [
     name: "Sourcegraph",
     summary: "Hosted code search for finding examples, references, and implementation patterns across repositories.",
     why: "Use it when the agent should search code first, then inspect only the matching operations.",
-    path: ["sourcegraph", "inspect", "search_tools", "get_tool", "call_tool"],
+    path: ["sourcegraph", "inspect", "search_tools", "describe_tool", "call_tool"],
     steps: [
       {
         command: "caplets install spiritledsoftware/caplets sourcegraph",
@@ -197,7 +197,7 @@ const exampleCaplets = [
     name: "OSV",
     summary: "A small explicit HTTP API for vulnerability lookups by package, purl, commit, or batch query.",
     why: "Use it when Caplets should bound a sharp task without exposing arbitrary HTTP calls.",
-    path: ["osv", "inspect", "search_tools", "get_tool", "call_tool"],
+    path: ["osv", "inspect", "search_tools", "describe_tool", "call_tool"],
     steps: [
       {
         command: "caplets install spiritledsoftware/caplets osv",

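The headline figures in the `docs/benchmarks/coding-agent.md` diff can be re-derived from the byte and tool counts that report quotes. This is a minimal sketch, assuming percentages are rounded to one decimal place. `approxTokens` and `percentFewer` are hypothetical helper names; only the `Math.ceil(bytes / 4)` token proxy comes from the report itself.

```typescript
// Token proxy stated in the report: one token per four serialized bytes, rounded up.
const approxTokens = (bytes: number): number => Math.ceil(bytes / 4);

// Relative reduction, formatted to one decimal place (assumed rounding).
const percentFewer = (before: number, after: number): string =>
  (((before - after) / before) * 100).toFixed(1);

const flatBytes = 32090; // direct flat MCP payload
const capletsBytes = 5082; // Caplets top-level payload after this change

console.log(percentFewer(flatBytes, capletsBytes)); // 84.2
console.log(approxTokens(flatBytes), approxTokens(capletsBytes)); // 8023 1271
console.log(approxTokens(flatBytes) - approxTokens(capletsBytes)); // 6752
console.log(percentFewer(106, 3)); // 97.2 (initially visible tools)
console.log(percentFewer(13, 1), percentFewer(8600, 2300)); // 92.3 73.3 (Code Mode vs progressive disclosure)
```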
diff --git a/docs/benchmarks/coding-agent.md b/docs/benchmarks/coding-agent.md
@@ -7,20 +7,20 @@ This report is generated by `pnpm --filter @caplets/benchmarks benchmark` from d
 The deterministic benchmark compares two ways of exposing the same three mock MCP servers to a coding agent:
 
 - Direct flat MCP aggregation exposes every downstream tool from the `policy`, `tickets`, `api` servers in the initial `tools/list` payload.
-- Caplets progressive disclosure exposes one top-level capability tool per server, then keeps downstream tools behind scoped `inspect`, `list_tools` or `search_tools`, `get_tool`, and `call_tool` operations.
+- Caplets progressive disclosure exposes one top-level capability tool per server, then keeps downstream tools behind scoped `inspect`, `tools` or `search_tools`, `describe_tool`, and `call_tool` operations.
 
 The fixture uses local mock MCP metadata only. It does not call external APIs, depend on network access, or require model credentials. Approximate token counts use `Math.ceil(bytes / 4)` as a stable context-size proxy, not provider billing data.
 
 ## Summary
 
 - Initial tools visible: direct flat MCP 106, Caplets top-level 3, 97.2% fewer.
-- Serialized payload bytes: direct flat MCP 32090, Caplets top-level 3879, 87.9% fewer.
-- Approx. tokens: direct flat MCP 8023, Caplets top-level 970, 7053 fewer.
+- Serialized payload bytes: direct flat MCP 32090, Caplets top-level 5082, 84.2% fewer.
+- Approx. tokens: direct flat MCP 8023, Caplets top-level 1271, 6752 fewer.
 - Candidate set before discovery: direct flat MCP 106, Caplets top-level 3, 103 fewer.
 
 ## Deterministic Results
 
-Caplets reduces the initial serialized MCP tool payload by 87.9%, from 32090 bytes to 3879 bytes. It reduces initially visible tools by 97.2%, from 106 direct flat tools to 3 Caplets capability tools, while preserving access to downstream tools through scoped discovery and `call_tool`.
+Caplets reduces the initial serialized MCP tool payload by 84.2%, from 32090 bytes to 5082 bytes. It reduces initially visible tools by 97.2%, from 106 direct flat tools to 3 Caplets capability tools, while preserving access to downstream tools through scoped discovery and `call_tool`.
 
 ## Collision Check
 
@@ -34,15 +34,38 @@ Caplets top-level duplicate tool-name collisions: 0
 
 Direct flat MCP exposes all downstream tools immediately, so expected discovery calls are 0 but the initial candidate set is 106 tools.
 
-Caplets starts from 3 capability tools. Expected task-specific discovery is 4 calls: `inspect`, `list_tools` or `search_tools`, `get_tool`, then `call_tool`.
+Caplets starts from 3 capability tools. Expected task-specific discovery is 4 calls: `inspect`, `tools` or `search_tools`, `describe_tool`, then `call_tool`.
 
 ## Validation
 
-- Initial payload reduction threshold: 87.9% >= 70.0%
+- Initial payload reduction threshold: 84.2% >= 70.0%
 - Top-level Caplets collisions: 0
 
 Payload implementation: `source`
 
+## Code Mode Workflow Eval
+
+The deterministic Code Mode fixture covers 12 PRD task categories and shows 80.5% fewer model/tool round trips versus equivalent progressive-disclosure sequences, with 50.7% lower approximate context tokens.
+
+### Complex Workflow Eval
+
+Task: Discover GitHub issue/PR tools, inspect schemas or observed shapes, fetch open work, preserve labels and URLs, and synthesize a next-action triage brief.
+
+| Strategy               | External calls | LLM round trips | Code Mode calls | Internal Caplet calls | Approx. payload tokens | Success score |
+| ---------------------- | -------------: | --------------: | --------------: | --------------------: | ---------------------: | ------------: |
+| Vanilla MCP            |              4 |               4 |               0 |                     0 |                   4200 |          0.72 |
+| Progressive disclosure |             13 |              13 |               0 |                     0 |                   8600 |          0.95 |
+| Code Mode              |              1 |               1 |               1 |                     7 |                   2300 |          0.93 |
+
+Code Mode preserves required triage fields (`number`, `title`, `state`, `url`, `html_url`, `labels`, `created_at`, `updated_at`) while reducing external calls versus progressive disclosure by 92.3% and approximate payload tokens by 73.3%.
+
+### Live Regression Guardrails
+
+The deterministic report also records live cold-agent failure classes without treating model-dependent runs as deterministic claims. Current guardrails: `code-mode-one-run-guidance`, `optional-use-avoid-hints`, `schema-error-call-signatures`, `transport-body-normalization`.
+
+- `github-issues-and-prs-adjacent-entities`: Cold agents can under-query adjacent entities or over-trust one search result when backend taxonomy is broad. Guardrails: `code-mode-one-run-guidance`, `optional-use-avoid-hints`.
+- `osv-package-version-tool-selection`: Code Mode initially chose a batch-style tool and leaked HTTP transport body shape before recovering. Guardrails: `code-mode-one-run-guidance`, `optional-use-avoid-hints`, `schema-error-call-signatures`, `transport-body-normalization`.
+
 ## Reproduce
 
 Run the deterministic benchmark and update this report: