5 changes: 5 additions & 0 deletions .changeset/native-opencode-code-mode.md
@@ -0,0 +1,5 @@
---
"@caplets/opencode": patch
---

Package native Caplets Code Mode assets for OpenCode and cover `caplets_run` registration.
5 changes: 5 additions & 0 deletions .changeset/small-paws-draw.md
@@ -0,0 +1,5 @@
---
"@caplets/core": patch
---

Fix cloud attach URL normalization
3 changes: 2 additions & 1 deletion AGENTS.md
@@ -4,7 +4,7 @@

- Use `pnpm` only; the repo pins `pnpm@11.5.0` and requires Node `>=24`.
- Install with `pnpm install --frozen-lockfile` when matching CI.
- Full local gate and pre-push hook: `pnpm verify` (`format:check -> lint -> typecheck -> schema:check -> test -> benchmark:check -> build`).
- Full local gate and pre-push hook: `pnpm verify` (`format:check -> lint -> code-mode:check-api -> typecheck -> schema:check -> test -> benchmark:check -> build`).
- Fast focused checks: `pnpm format:check`, `pnpm lint`, `pnpm typecheck`, `pnpm test`, `pnpm build`.
- Run one package: `pnpm --filter @caplets/core test`, `pnpm --filter caplets build`, or replace the filter with `@caplets/opencode`, `@caplets/pi`, `@caplets/benchmarks`.
- Run one Vitest file by passing it after the package script, e.g. `pnpm --filter @caplets/core test -- test/config.test.ts`.
@@ -21,6 +21,7 @@

- Put design specs in `docs/specs/`, implementation plans in `docs/plans/`, and product requirements documents in `docs/product/`; do not use `docs/superpowers/` in this repo.
- Config schema source of truth is Zod in `packages/core/src/config.ts`; update `schemas/caplets-config.schema.json` with `pnpm schema:generate` and verify with `pnpm schema:check`.
- Code Mode runtime API declaration source of truth is `packages/core/src/code-mode/runtime-api.d.ts`; update `packages/core/src/code-mode/runtime-api.generated.ts` with `pnpm code-mode:generate-api` and verify with `pnpm code-mode:check-api`.
- `pnpm benchmark` updates `docs/benchmarks/coding-agent.md`; `pnpm benchmark:check` fails if the committed report is stale.
- Live benchmarks are opt-in only: build first, then run `CAPLETS_BENCH_LIVE=1 pnpm benchmark:live:opencode` or `CAPLETS_BENCH_LIVE=1 pnpm benchmark:live:pi`; results are local/model-dependent and not deterministic product claims.

6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,11 @@
# caplets

## Unreleased

### Major Changes

- Breaking: Caplet progressive wrapper operation names now use `check`, `tools`, `describe_tool`, resource/prompt operation names without `list_`, and `name`/`args` fields instead of `tool`/`prompt`/`arguments`. Code Mode declarations now expose comprehensive Caplet handles with paginated discovery, result envelopes, resource/prompt methods, loose TypeScript diagnostics, and schema-derived `callSignature`.
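A minimal migration sketch for callers that still send the old request shape. It assumes the `tool`/`prompt` fields both become `name` and that `arguments` becomes `args`, as the entry above describes. `migrateRequest` and the type names are hypothetical and not part of any Caplets package. Resource and prompt operations that drop their `list_` prefix are not mapped, because their exact names do not appear in this change.

```typescript
// Hypothetical request shapes for illustration only.
type LegacyRequest = {
  operation: string;
  tool?: string;
  prompt?: string;
  arguments?: Record<string, unknown>;
  [key: string]: unknown;
};

type MigratedRequest = {
  operation: string;
  name?: string;
  args?: Record<string, unknown>;
  [key: string]: unknown;
};

// Only renames that appear explicitly in this change set are listed.
const OPERATION_RENAMES = new Map<string, string>([
  ["check_backend", "check"],
  ["list_tools", "tools"],
  ["get_tool", "describe_tool"],
]);

function migrateRequest(legacy: LegacyRequest): MigratedRequest {
  // Pull out the renamed fields; everything else passes through untouched.
  const { operation, tool, prompt, arguments: legacyArgs, ...rest } = legacy;
  const migrated: MigratedRequest = {
    ...rest,
    operation: OPERATION_RENAMES.get(operation) ?? operation,
  };
  const name = tool ?? prompt;
  if (name !== undefined) migrated.name = name;
  if (legacyArgs !== undefined) migrated.args = legacyArgs;
  return migrated;
}

console.log(migrateRequest({ operation: "get_tool", tool: "read_file" }));
```

Operations that were not renamed, such as `inspect`, `search_tools`, and `call_tool`, keep their names and only have their fields renamed.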

## 0.12.0

### Minor Changes
40 changes: 20 additions & 20 deletions README.md
@@ -28,7 +28,7 @@

Caplets turns MCP servers, APIs, and commands into focused agent capabilities: one card first, searchable tools next, inspectable schemas before calls, and preserved results after.

Stop dumping every operation into context up front. Caplets wraps each tool source as a capability an agent can discover, inspect, call, and recover from one step at a time. Instead of exposing a giant flat wall of operations, Caplets shows a compact capability card with source, status, and next actions. The agent chooses a domain first, then uses scoped operations like `search_tools`, `get_tool`, and `call_tool` only when it needs more detail.
Stop dumping every operation into context up front. Caplets wraps each tool source as a capability an agent can discover, inspect, call, and recover from one step at a time. Instead of exposing a giant flat wall of operations, Caplets shows a compact capability card with source, status, and next actions. The agent chooses a domain first, then uses scoped operations like `search_tools`, `describe_tool`, and `call_tool` only when it needs more detail.

For MCP-backed Caplets, the scoped operation set also includes resource discovery and reading, prompt listing and rendering, resource-template discovery, and completion for prompt or template arguments. Non-MCP backends expose focused tool and action operations.

@@ -43,7 +43,7 @@ caplets add mcp context7 --command npx --arg -y --arg @upstash/context7-mcp
caplets serve
```

In the deterministic benchmark, 106 flat tools became 3 top-level capabilities with an 87.9% smaller initial payload. Your agent starts with `context7`, then drills in through `inspect`, `search_tools`, `get_tool`, and `call_tool` only when needed.
In the deterministic benchmark, 106 flat tools became 3 top-level capabilities with an 87.9% smaller initial payload. Your agent starts with `context7`, then drills in through `inspect`, `search_tools`, `describe_tool`, and `call_tool` only when needed.

## Quick Start

@@ -312,8 +312,8 @@ Flat tool lists make agents guess before they understand. If every downstream se
Caplets turns that flat wall into a staged path:

1. **Choose** a capability, such as `GitHub`.
2. **Inspect** matching operations with `search_tools` or `list_tools`.
3. **Resolve** the exact schema with `get_tool`.
2. **Inspect** matching operations with `search_tools` or `tools`.
3. **Resolve** the exact schema with `describe_tool`.
4. **Invoke** with `call_tool` while preserving downstream content, structured data, and error state.

A backend enters agent context as a focused card with source, status, and next actions, not a wall of operations.
@@ -390,7 +390,7 @@ If a backend fails, Caplets keeps the error scoped to the capability, preserves
- Uses the configured `name` and `description` as the capability card shown to agents.
- Starts downstream MCP servers and loads OpenAPI specs lazily when an operation needs them.
- Supports stdio, Streamable HTTP, and legacy HTTP+SSE downstream servers.
- Lets agents `list_tools`, `search_tools`, `get_tool`, and `call_tool` within one selected Caplet namespace.
- Lets agents `tools`, `search_tools`, `describe_tool`, and `call_tool` within one selected Caplet namespace.
- Converts OpenAPI operations into MCP-style tool metadata and executes HTTP calls directly.
- Converts configured GraphQL operations into MCP-style tool metadata, and can auto-generate GraphQL tools from schema root query and mutation fields.
- Converts explicitly configured HTTP actions into MCP-style tool metadata and executes HTTP calls directly.
@@ -780,7 +780,7 @@ OpenAPI auth is explicit and supports:
- `{"type": "oauth2", ...}`
- `{"type": "oidc", ...}`

OpenAPI `call_tool.arguments` uses grouped HTTP inputs:
OpenAPI `call_tool.args` uses grouped HTTP inputs:

```json
{
@@ -824,7 +824,7 @@ endpoint and exactly one schema source: `schemaPath`, `schemaUrl`, or `introspec

When `operations` is omitted or empty, Caplets auto-generates tools from schema root
fields: `query_<field>` and `mutation_<field>`. Generated tools use bounded scalar
selection sets and pass `call_tool.arguments` directly as GraphQL variables/root-field
selection sets and pass `call_tool.args` directly as GraphQL variables/root-field
arguments.

Every GraphQL endpoint can set:
@@ -878,7 +878,7 @@ must start with `/` and be URL paths that cannot change origin or escape the bas
Action mappings can set `query`, `headers`, and `jsonBody`. `query` and `headers` must resolve
to object maps whose values are strings, numbers, or booleans. `jsonBody` may use literals,
nested arrays/objects, `$input.field` references, or `$input` for the whole argument object.
Path placeholders such as `{service}` are read directly from `call_tool.arguments` and URL-encoded.
Path placeholders such as `{service}` are read directly from `call_tool.args` and URL-encoded.
Configured action headers cannot set managed headers such as `authorization`, `host`,
`content-length`, `connection`, or `content-type`; JSON bodies set `content-type` automatically.
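A minimal sketch of the placeholder substitution and managed-header check described above. `expandPath` and `findManagedHeaders` are hypothetical helpers, not the Caplets implementation. The header set holds only the names this section lists, and the real list may be longer.

```typescript
type ScalarArg = string | number | boolean;

// Hypothetical helper: replaces each `{name}` placeholder with the
// URL-encoded argument of the same name.
function expandPath(template: string, args: Record<string, ScalarArg>): string {
  return template.replace(/\{([^{}]+)\}/g, (_match, key: string) => {
    const value = args[key];
    if (value === undefined) {
      throw new Error(`Missing path argument: ${key}`);
    }
    return encodeURIComponent(String(value));
  });
}

// Header names this section lists as managed; compared case-insensitively.
const MANAGED_HEADERS = new Set([
  "authorization",
  "host",
  "content-length",
  "connection",
  "content-type",
]);

// Returns the configured header names that collide with managed headers.
function findManagedHeaders(headers: Record<string, ScalarArg>): string[] {
  return Object.keys(headers).filter((name) => MANAGED_HEADERS.has(name.toLowerCase()));
}

console.log(expandPath("/services/{service}/status", { service: "billing/eu west" }));
// prints "/services/billing%2Feu%20west/status"
```

Encoding each value with `encodeURIComponent` keeps a `/` inside an argument from adding a path segment, which fits the rule that action paths cannot escape the base URL.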

@@ -939,8 +939,8 @@ an existing destination file.
### Caplet Sets

Use `capletSets` to expose another Caplets collection as nested Caplets. Each child Caplet appears
as one downstream tool and supports the full Caplets operation set: `inspect`, `check_backend`,
`list_tools`, `search_tools`, `get_tool`, and `call_tool`.
as one downstream tool and supports the full Caplets operation set: `inspect`, `check`,
`tools`, `search_tools`, `describe_tool`, and `call_tool`.

```json
{
@@ -1135,7 +1135,7 @@ Each generated Caplet tool accepts an `operation`:

```json
{
"operation": "list_tools"
"operation": "tools"
}
```

@@ -1153,7 +1153,7 @@ Inspect one exact downstream tool:

```json
{
"operation": "get_tool",
"operation": "describe_tool",
"tool": "read_file"
}
```
@@ -1173,23 +1173,23 @@ Call one exact downstream tool:
Available operations:

- `inspect`: return the configured capability card without starting the downstream server.
- `check_backend`: verify the selected backend, whether MCP, OpenAPI, GraphQL, HTTP, CLI, or nested Caplets.
- `list_tools`: return compact downstream tool metadata.
- `check`: verify the selected backend, whether MCP, OpenAPI, GraphQL, HTTP, CLI, or nested Caplets.
- `tools`: return compact downstream tool metadata.
- `search_tools`: search downstream tool names and descriptions within this Caplet.
- `get_tool`: return full metadata for one exact downstream tool.
- `describe_tool`: return full metadata for one exact downstream tool.
- `call_tool`: invoke one exact downstream tool with JSON object arguments.

Requests are strict: operation-specific extra fields are rejected, and `call_tool` requires
`arguments` to be a JSON object.

Discovery operations (`inspect`, `check_backend`, `list_tools`, `search_tools`, and
`get_tool`) return wrapper-generated results whose `structuredContent.caplets` field
Discovery operations (`inspect`, `check`, `tools`, `search_tools`, and
`describe_tool`) return wrapper-generated results whose `structuredContent.caplets` field
identifies the Caplet with `id`, plus backend, operation, status, and elapsed time when
available. Discovery result objects and compact tool entries also use `id` for the
configured Caplet identity. Compact `list_tools` and `search_tools` entries may include
configured Caplet identity. Compact `tools` and `search_tools` entries may include
input/output schema hashes; treat those
hashes as reuse hints for a schema you have already inspected, not as a replacement for
`get_tool` when arguments, output, or semantics are unclear.
`describe_tool` when arguments, output, or semantics are unclear.
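The renamed operation set and the discovery/invocation split can be modeled as literal types. This is a sketch that covers only the six operations listed above. The type and function names are hypothetical, and the extra resource and prompt operations of MCP-backed Caplets are left out because their names are not shown here.

```typescript
// Operations that return wrapper-generated discovery results.
const DISCOVERY_OPERATIONS = ["inspect", "check", "tools", "search_tools", "describe_tool"] as const;

// The full set adds `call_tool`, which preserves the downstream result shape.
const ALL_OPERATIONS = [...DISCOVERY_OPERATIONS, "call_tool"] as const;

type DiscoveryOperation = (typeof DISCOVERY_OPERATIONS)[number];
type CapletOperation = (typeof ALL_OPERATIONS)[number];

// Narrows an arbitrary string to a known operation name.
function isCapletOperation(value: string): value is CapletOperation {
  return (ALL_OPERATIONS as readonly string[]).includes(value);
}

// Distinguishes discovery operations from direct invocation.
function isDiscoveryOperation(op: CapletOperation): op is DiscoveryOperation {
  return (DISCOVERY_OPERATIONS as readonly string[]).includes(op);
}

console.log(isCapletOperation("describe_tool")); // prints true
console.log(isCapletOperation("get_tool")); // prints false
```

A guard like this rejects the pre-rename names (`check_backend`, `list_tools`, `get_tool`) before a request reaches a Caplet.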

Direct `call_tool` preserves the downstream tool result shape instead of wrapping it in
`structuredContent.result`. When the result can carry MCP metadata, Caplets adds
@@ -1200,7 +1200,7 @@ relative to the downstream MCP server process, not necessarily relative to the c
project or Caplets process.

For first use, the explicit progressive-discovery path is still safest: choose a Caplet,
`search_tools` or `list_tools`, inspect uncertain tools with `get_tool`, then `call_tool`.
`search_tools` or `tools`, inspect uncertain tools with `describe_tool`, then `call_tool`.

## Development

10 changes: 5 additions & 5 deletions apps/landing/src/pages/index.astro
@@ -12,15 +12,15 @@ const heroTrace = {
{
label: "inspect",
detail: "Show one capability card before any downstream tool list enters context.",
result: "search_tools · get_tool · call_tool",
result: "search_tools · describe_tool · call_tool",
},
{
label: 'search_tools("pull request")',
detail: "Find matching operations inside the selected capability only.",
result: "create_pull_request · list_pull_requests · request_review",
},
{
label: 'get_tool("create_pull_request")',
label: 'describe_tool("create_pull_request")',
detail: "Inspect the exact schema before an agent can invoke the operation.",
result: "title · body · base · head · reviewers?",
},
@@ -165,7 +165,7 @@ const exampleCaplets = [
name: "GitHub",
summary: "A huge hosted MCP surface for repositories, issues, pull requests, branches, commits, and reviews.",
why: "Use it when the value is avoiding a giant GitHub tool wall.",
path: ["github", "inspect", "search_tools", "get_tool", "call_tool"],
path: ["github", "inspect", "search_tools", "describe_tool", "call_tool"],
steps: [
{ command: "export GH_TOKEN=github_pat_...", label: "GitHub token export" },
{
@@ -181,7 +181,7 @@ const exampleCaplets = [
name: "Sourcegraph",
summary: "Hosted code search for finding examples, references, and implementation patterns across repositories.",
why: "Use it when the agent should search code first, then inspect only the matching operations.",
path: ["sourcegraph", "inspect", "search_tools", "get_tool", "call_tool"],
path: ["sourcegraph", "inspect", "search_tools", "describe_tool", "call_tool"],
steps: [
{
command: "caplets install spiritledsoftware/caplets sourcegraph",
@@ -197,7 +197,7 @@ const exampleCaplets = [
name: "OSV",
summary: "A small explicit HTTP API for vulnerability lookups by package, purl, commit, or batch query.",
why: "Use it when Caplets should bound a sharp task without exposing arbitrary HTTP calls.",
path: ["osv", "inspect", "search_tools", "get_tool", "call_tool"],
path: ["osv", "inspect", "search_tools", "describe_tool", "call_tool"],
steps: [
{
command: "caplets install spiritledsoftware/caplets osv",
35 changes: 29 additions & 6 deletions docs/benchmarks/coding-agent.md
@@ -7,20 +7,20 @@ This report is generated by `pnpm --filter @caplets/benchmarks benchmark` from d
The deterministic benchmark compares two ways of exposing the same three mock MCP servers to a coding agent:

- Direct flat MCP aggregation exposes every downstream tool from the `policy`, `tickets`, `api` servers in the initial `tools/list` payload.
- Caplets progressive disclosure exposes one top-level capability tool per server, then keeps downstream tools behind scoped `inspect`, `list_tools` or `search_tools`, `get_tool`, and `call_tool` operations.
- Caplets progressive disclosure exposes one top-level capability tool per server, then keeps downstream tools behind scoped `inspect`, `tools` or `search_tools`, `describe_tool`, and `call_tool` operations.

The fixture uses local mock MCP metadata only. It does not call external APIs, depend on network access, or require model credentials. Approximate token counts use `Math.ceil(bytes / 4)` as a stable context-size proxy, not provider billing data.

## Summary

- Initial tools visible: direct flat MCP 106, Caplets top-level 3, 97.2% fewer.
- Serialized payload bytes: direct flat MCP 32090, Caplets top-level 3879, 87.9% fewer.
- Approx. tokens: direct flat MCP 8023, Caplets top-level 970, 7053 fewer.
- Serialized payload bytes: direct flat MCP 32090, Caplets top-level 5082, 84.2% fewer.
- Approx. tokens: direct flat MCP 8023, Caplets top-level 1271, 6752 fewer.
- Candidate set before discovery: direct flat MCP 106, Caplets top-level 3, 103 fewer.
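The updated figures above follow from the `Math.ceil(bytes / 4)` proxy and a plain percentage reduction. The sketch below is a sanity check and not the benchmark's own code. Rounding to one decimal place is an assumption about how the report formats its percentages, and `percentFewer` is a hypothetical helper name.

```typescript
// Token proxy stated in the report: one token per four bytes, rounded up.
function approxTokens(bytes: number): number {
  return Math.ceil(bytes / 4);
}

// Percentage reduction from `before` to `after`, rounded to one decimal place.
function percentFewer(before: number, after: number): number {
  return Math.round((1 - after / before) * 1000) / 10;
}

const directBytes = 32090;
const capletsBytes = 5082;

console.log(approxTokens(directBytes)); // prints 8023
console.log(approxTokens(capletsBytes)); // prints 1271
console.log(approxTokens(directBytes) - approxTokens(capletsBytes)); // prints 6752
console.log(percentFewer(directBytes, capletsBytes)); // prints 84.2
console.log(percentFewer(106, 3)); // prints 97.2
```

The same arithmetic gives the previous report's values: 3879 bytes yields 970 approximate tokens and an 87.9% reduction.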

## Deterministic Results

Caplets reduces the initial serialized MCP tool payload by 87.9%, from 32090 bytes to 3879 bytes. It reduces initially visible tools by 97.2%, from 106 direct flat tools to 3 Caplets capability tools, while preserving access to downstream tools through scoped discovery and `call_tool`.
Caplets reduces the initial serialized MCP tool payload by 84.2%, from 32090 bytes to 5082 bytes. It reduces initially visible tools by 97.2%, from 106 direct flat tools to 3 Caplets capability tools, while preserving access to downstream tools through scoped discovery and `call_tool`.

## Collision Check

@@ -34,15 +34,38 @@ Caplets top-level duplicate tool-name collisions: 0

Direct flat MCP exposes all downstream tools immediately, so expected discovery calls are 0 but the initial candidate set is 106 tools.

Caplets starts from 3 capability tools. Expected task-specific discovery is 4 calls: `inspect`, `list_tools` or `search_tools`, `get_tool`, then `call_tool`.
Caplets starts from 3 capability tools. Expected task-specific discovery is 4 calls: `inspect`, `tools` or `search_tools`, `describe_tool`, then `call_tool`.

## Validation

- Initial payload reduction threshold: 87.9% >= 70.0%
- Initial payload reduction threshold: 84.2% >= 70.0%
- Top-level Caplets collisions: 0

Payload implementation: `source`

## Code Mode Workflow Eval

The deterministic Code Mode fixture covers 12 PRD task categories and shows 80.5% fewer model/tool round trips versus equivalent progressive-disclosure sequences, with 50.7% lower approximate context tokens.

### Complex Workflow Eval

Task: Discover GitHub issue/PR tools, inspect schemas or observed shapes, fetch open work, preserve labels and URLs, and synthesize a next-action triage brief.

| Strategy | External calls | LLM round trips | Code Mode calls | Internal Caplet calls | Approx. payload tokens | Success score |
| ---------------------- | -------------: | --------------: | --------------: | --------------------: | ---------------------: | ------------: |
| Vanilla MCP | 4 | 4 | 0 | 0 | 4200 | 0.72 |
| Progressive disclosure | 13 | 13 | 0 | 0 | 8600 | 0.95 |
| Code Mode | 1 | 1 | 1 | 7 | 2300 | 0.93 |

Code Mode preserves required triage fields (`number`, `title`, `state`, `url`, `html_url`, `labels`, `created_at`, `updated_at`) while reducing external calls versus progressive disclosure by 92.3% and approximate payload tokens by 73.3%.
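As a cross-check, the two percentages in the sentence above can be recomputed from the table rows. The row type and `compare` helper are hypothetical. The numbers are copied from the table, and one-decimal rounding is assumed.

```typescript
type StrategyRow = {
  strategy: string;
  externalCalls: number;
  payloadTokens: number;
  successScore: number;
};

// Values copied from the Complex Workflow Eval table.
const progressive: StrategyRow = {
  strategy: "Progressive disclosure",
  externalCalls: 13,
  payloadTokens: 8600,
  successScore: 0.95,
};
const codeMode: StrategyRow = {
  strategy: "Code Mode",
  externalCalls: 1,
  payloadTokens: 2300,
  successScore: 0.93,
};

// Percentage reduction from `before` to `after`, rounded to one decimal place.
function reduction(before: number, after: number): number {
  return Math.round((1 - after / before) * 1000) / 10;
}

function compare(baseline: StrategyRow, candidate: StrategyRow) {
  return {
    externalCallsReducedPct: reduction(baseline.externalCalls, candidate.externalCalls),
    payloadTokensReducedPct: reduction(baseline.payloadTokens, candidate.payloadTokens),
    // Negative means the candidate scored lower than the baseline.
    successScoreDelta: Math.round((candidate.successScore - baseline.successScore) * 100) / 100,
  };
}

console.log(compare(progressive, codeMode));
```

On these rows the success score drops by 0.02 (0.93 versus 0.95), so the reduction in calls and tokens comes with a small score cost relative to progressive disclosure.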

### Live Regression Guardrails

The deterministic report also records live cold-agent failure classes without treating model-dependent runs as deterministic claims. Current guardrails: `code-mode-one-run-guidance`, `optional-use-avoid-hints`, `schema-error-call-signatures`, `transport-body-normalization`.

- `github-issues-and-prs-adjacent-entities`: Cold agents can under-query adjacent entities or over-trust one search result when backend taxonomy is broad. Guardrails: `code-mode-one-run-guidance`, `optional-use-avoid-hints`.
- `osv-package-version-tool-selection`: Code Mode initially chose a batch-style tool and leaked HTTP transport body shape before recovering. Guardrails: `code-mode-one-run-guidance`, `optional-use-avoid-hints`, `schema-error-call-signatures`, `transport-body-normalization`.

## Reproduce

Run the deterministic benchmark and update this report: