From 1753aac4652e1b772daa29acbc932816ca74dd3d Mon Sep 17 00:00:00 2001
From: "global.eye.wu"
Date: Thu, 14 May 2026 15:59:36 +0800
Subject: [PATCH 1/2] chore(ci): bump actions/checkout + actions/setup-node to @v5

GitHub annotated PR #6 with a deprecation warning: the @v4 versions of
these actions run on Node.js 20 internally, which GitHub is phasing out
as the runner default moves to Node.js 24. The @v5 versions are
functionally identical for our usage but are built on the new runtime.

No workflow behavior changes; cache and matrix config untouched.

Co-Authored-By: Claude Opus 4.7
---
 .github/workflows/ci.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 5d9e549..5621549 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -15,10 +15,10 @@ jobs:
       matrix:
         os: [ubuntu-latest, macos-latest, windows-latest]
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v5
 
       - name: Setup Node.js
-        uses: actions/setup-node@v4
+        uses: actions/setup-node@v5
         with:
           node-version: 20
           cache: npm

From 4eaae2967d4fe55c63ce0c0b8940fe441a1b5886 Mon Sep 17 00:00:00 2001
From: "global.eye.wu"
Date: Thu, 14 May 2026 16:08:26 +0800
Subject: [PATCH 2/2] feat(model): enable Anthropic prompt caching end-to-end (M1.5b)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The wiring built in M1.5a (cache token accumulators, cost factors,
`myagent usage` CLI) now has something to count. This change toggles
the request side from "no cache_control anywhere" to "two cache
breakpoints: system + tools".

Type extensions:
- New `SystemTextBlock` in `model.ts`: `{type:"text", text, cache_control?}`.
- `ModelRequest.system` now accepts `string | readonly SystemTextBlock[]`.
- `ToolContext.system` and `QueryOptions.system` propagated the same way.
- `ForkTrace.systemPrompt` accepts both forms and hashes their text
  content, so the fork-trace identity stays stable across the legacy
  flat-string and structured-array representations.

Outbound request shape:
- `buildAgentSystemPrompt` (in cli/src/index.ts) returns a single
  `SystemTextBlock` containing base prompt + memory + skill context,
  marked `cache_control: { type: "ephemeral" }`. Identical content across
  every turn of a session → cache hit on every turn after the first.
- `toAnthropicTools` (in core/src/anthropic.ts) marks the *last* tool in
  the list with `cache_control: ephemeral`, turning the whole tool list
  into a single cache breakpoint. Tool definitions are stable across
  turns by construction, so the breakpoint reliably hits.
- `toAnthropicTools` and `toModelUsage` are now exported so the security
  suite can unit-test them.

Response parsing:
- `toModelUsage` extracts `cache_creation_input_tokens` and
  `cache_read_input_tokens` from the SDK's `message_start.message.usage`
  and `message_delta.usage`. Both fields are optional; non-cached turns
  leave them `undefined`, which `addTokenUsage` already treats as zero.
- `runAgentTurn` emits per-turn profile metrics
  `model.cache_creation_input_tokens` / `model.cache_read_input_tokens`
  and per-session counterparts `session.cache_creation_input_tokens` /
  `session.cache_read_input_tokens`.

Tests added in `packages/core/test/security/prompt-caching.test.ts`
(6 cases on `toAnthropicTools` + `toModelUsage`) and a CLI assertion
that the agent's outbound `request.system` is the structured form with
a `cache_control` marker. Catalog row added; CLAUDE.md updated.

Two pre-existing cli tests captured `request.system` as a string;
extracted a `systemToText` helper to flatten the array form during
assertions.

Also bundled the chore from PR #6's deprecation annotation:
actions/checkout and actions/setup-node bumped from @v4 to @v5.

Local: 161 tests, 3/3 runs green.
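
Illustration only (not part of the change): a minimal TypeScript sketch
of the two-breakpoint request shape described under "Outbound request
shape". The helper names `toWireTools` and `buildRequestBody`, the
`ToolDef` / `WireTool` / `RequestBody` aliases, and the `EPHEMERAL`
constant are assumptions made for this example; the real logic lives in
`buildAgentSystemPrompt` and `toAnthropicTools`.

```typescript
// Hypothetical, self-contained sketch. Only `SystemTextBlock` and the
// `cache_control: { type: "ephemeral" }` marker come from the patch.
type CacheControl = { type: "ephemeral" };
type SystemTextBlock = { type: "text"; text: string; cache_control?: CacheControl };
type ToolDef = { name: string; description: string; inputSchema: Record<string, unknown> };
type WireTool = {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
  cache_control?: CacheControl;
};
type RequestBody = { system: SystemTextBlock[]; tools?: WireTool[] };

const EPHEMERAL: CacheControl = { type: "ephemeral" };

// Breakpoint 2: only the last tool carries the marker, so the whole
// tool list up to and including it forms one cacheable prefix.
function toWireTools(tools: readonly ToolDef[]): WireTool[] {
  return tools.map((tool, index): WireTool => {
    const base: WireTool = {
      name: tool.name,
      description: tool.description,
      input_schema: tool.inputSchema
    };
    return index === tools.length - 1 ? { ...base, cache_control: EPHEMERAL } : base;
  });
}

// Breakpoint 1: the system prompt is a single marked text block.
// An empty tool list yields no `tools` key and therefore no marker.
function buildRequestBody(systemText: string, tools: readonly ToolDef[]): RequestBody {
  const system: SystemTextBlock[] = [
    { type: "text", text: systemText, cache_control: EPHEMERAL }
  ];
  const wireTools = toWireTools(tools);
  return wireTools.length > 0 ? { system, tools: wireTools } : { system };
}
```

Marking only the final tool keeps the marker count at two per request
regardless of how many tools are registered.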
Co-Authored-By: Claude Opus 4.7
---
 CLAUDE.md                                     |  2 +-
 packages/cli/src/index.ts                     | 45 +++++++++++-
 packages/cli/test/cli.test.ts                 | 52 ++++++++++++-
 packages/core/src/anthropic.ts                | 57 +++++++++++----
 packages/core/src/fork.ts                     | 15 +++-
 packages/core/src/model.ts                    | 19 ++++-
 packages/core/src/query.ts                    |  7 +-
 packages/core/src/types.ts                    |  4 +-
 packages/core/test/security/README.md         | 10 +++
 .../core/test/security/prompt-caching.test.ts | 73 +++++++++++++++++++
 10 files changed, 257 insertions(+), 27 deletions(-)
 create mode 100644 packages/core/test/security/prompt-caching.test.ts

diff --git a/CLAUDE.md b/CLAUDE.md
index f7f0507..1bdeda5 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -25,7 +25,7 @@ The CLI binary entry is `packages/cli/dist/index.js` (exposed as `myagent`). The
 ### Environment
 
 - `ANTHROPIC_API_KEY` — required for real model calls (`chat`, `agent`, `tui`). Read from process env or a local `.env` (parsed by `loadEnvironment` in `packages/cli/src/index.ts`; only an allow-listed set of keys is honored).
-- `ANTHROPIC_BASE_URL`, `MYAGENT_MODEL`, `MYAGENT_PERMISSION_MODE`, `MYAGENT_INPUT_USD_PER_MTOK`, `MYAGENT_OUTPUT_USD_PER_MTOK`, `MYAGENT_CACHE_WRITE_USD_PER_MTOK`, `MYAGENT_CACHE_READ_USD_PER_MTOK` — optional overrides. The two cache-rate vars feed `estimateUsageCostUsd` and surface in `myagent usage ` once prompt caching is enabled in M1.5b.
+- `ANTHROPIC_BASE_URL`, `MYAGENT_MODEL`, `MYAGENT_PERMISSION_MODE`, `MYAGENT_INPUT_USD_PER_MTOK`, `MYAGENT_OUTPUT_USD_PER_MTOK`, `MYAGENT_CACHE_WRITE_USD_PER_MTOK`, `MYAGENT_CACHE_READ_USD_PER_MTOK` — optional overrides. Prompt caching is wired on outbound requests: the agent's system prompt is sent as a single `SystemTextBlock` with `cache_control: ephemeral`, and the tool list's last entry carries a matching marker so the whole tool block is cached. `cacheCreationInputTokens` / `cacheReadInputTokens` flow back through `ModelUsage` → `TokenUsage` → session record → `myagent usage ` per-turn breakdown.
 - Offline tests use `FakeModel` and do not need an API key.
 - Runtime state (sessions, artifacts, profiles, tasks, fork traces, memory) is written under `.myagent/` in the cwd; gitignored.
 
diff --git a/packages/cli/src/index.ts b/packages/cli/src/index.ts
index e833973..09a24b4 100644
--- a/packages/cli/src/index.ts
+++ b/packages/cli/src/index.ts
@@ -46,6 +46,7 @@ import {
   type MemoryEntry,
   type SessionCompactionArchiver,
   type SessionEvent,
+  type SystemTextBlock,
   type ModelClient,
   type ModelStreamEvent,
   type PermissionDecision,
@@ -1576,6 +1577,18 @@ async function runAgentTurn(options: RunAgentTurnOptions): Promise undefined);
     return { exitCode: 0, sessionId: bootstrap.sessionId };
@@ -1856,10 +1879,28 @@ function parseOptionalNumber(value: string | undefined): number | undefined {
   return Number.isFinite(parsed) && parsed >= 0 ? parsed : undefined;
 }
 
-function buildAgentSystemPrompt(memoryContext: string, skillContext: string): string {
-  return [READ_ONLY_AGENT_SYSTEM_PROMPT, memoryContext.trim(), skillContext.trim()]
+/**
+ * Returns the agent's system prompt as a structured block array so we
+ * can mark it as a prompt-cache breakpoint. The combined content is
+ * placed in a single text block with `cache_control: ephemeral`, which
+ * tells Anthropic to cache the entire system prompt: identical reuse
+ * across every turn of a session, since memory + skill snapshots are
+ * captured once at session start.
+ */
+function buildAgentSystemPrompt(
+  memoryContext: string,
+  skillContext: string
+): readonly SystemTextBlock[] {
+  const combined = [READ_ONLY_AGENT_SYSTEM_PROMPT, memoryContext.trim(), skillContext.trim()]
     .filter((part) => part.length > 0)
     .join("\n\n");
+  return [
+    {
+      type: "text",
+      text: combined,
+      cache_control: { type: "ephemeral" }
+    }
+  ];
 }
 
 const READ_ONLY_AGENT_SYSTEM_PROMPT = `You are myagent Week 18, a safety-first coding agent.
diff --git a/packages/cli/test/cli.test.ts b/packages/cli/test/cli.test.ts
index 38f1e99..f5be100 100644
--- a/packages/cli/test/cli.test.ts
+++ b/packages/cli/test/cli.test.ts
@@ -26,6 +26,12 @@ function captureWriter() {
   };
 }
 
+function systemToText(system: string | ReadonlyArray<{ type: "text"; text: string }> | undefined): string {
+  if (system === undefined) return "";
+  if (typeof system === "string") return system;
+  return system.map((block) => block.text).join("\n\n");
+}
+
 describe("myagent cli", () => {
   it("prints version without starting agent runtime", async () => {
     const stdout = captureWriter();
@@ -150,7 +156,7 @@ describe("myagent cli", () => {
             };
           },
           async *stream(request) {
-            systems.push(request.system ?? "");
+            systems.push(systemToText(request.system));
             yield {
               type: "assistant_message",
               message: {
@@ -227,7 +233,7 @@ describe("myagent cli", () => {
             };
           },
           async *stream(request) {
-            systems.push(request.system ?? "");
+            systems.push(systemToText(request.system));
             yield {
               type: "assistant_message",
               message: { role: "assistant", content: "Use real DB integration fixtures." },
@@ -525,6 +531,48 @@ describe("myagent cli", () => {
     expect(record.events.at(-1)?.type).toBe("compact");
   });
 
+  it("sends the agent's system prompt as a structured block with cache_control", async () => {
+    const cwd = mkdtempSync(join(tmpdir(), "myagent-cli-cache-system-"));
+    let capturedSystem: unknown;
+    const stdout = captureWriter();
+    const stderr = captureWriter();
+
+    const exitCode = await runCli(["agent", "summarize", "fixture"], stdout.writer, stderr.writer, {
+      cwd,
+      env: {},
+      createModelClient: () =>
+        ({
+          async create() {
+            return {
+              message: { role: "assistant", content: "ok" },
+              requestId: "req_cache"
+            };
+          },
+          async *stream(request) {
+            capturedSystem = request.system;
+            yield {
+              type: "assistant_message",
+              message: { role: "assistant", content: "fixture summary" },
+              requestId: "req_cache"
+            };
+          }
+        }) satisfies ModelClient
+    });
+
+    expect(exitCode).toBe(0);
+    expect(stderr.text()).toBe("");
+    expect(Array.isArray(capturedSystem)).toBe(true);
+    const systemBlocks = capturedSystem as Array<{
+      type: string;
+      text: string;
+      cache_control?: { type: string };
+    }>;
+    expect(systemBlocks).toHaveLength(1);
+    expect(systemBlocks[0]?.type).toBe("text");
+    expect(systemBlocks[0]?.text).toContain("safety-first coding agent");
+    expect(systemBlocks[0]?.cache_control).toEqual({ type: "ephemeral" });
+  });
+
   it("prints per-turn token + cost breakdown via myagent usage", async () => {
     const cwd = mkdtempSync(join(tmpdir(), "myagent-cli-usage-"));
     const sessionRootDir = join(cwd, ".myagent", "sessions");
diff --git a/packages/core/src/anthropic.ts b/packages/core/src/anthropic.ts
index 3869bfa..114f9d4 100644
--- a/packages/core/src/anthropic.ts
+++ b/packages/core/src/anthropic.ts
@@ -161,8 +161,19 @@ export class AnthropicModelClient implements ModelClient {
           stop_reason?: string | null;
           partial_json?: string;
         };
-        message?: { usage?: { input_tokens?: number; output_tokens?: number } };
-        usage?: { output_tokens?: number };
+        message?: {
+          usage?: {
+            input_tokens?: number;
+            output_tokens?: number;
+            cache_creation_input_tokens?: number;
+            cache_read_input_tokens?: number;
+          };
+        };
+        usage?: {
+          output_tokens?: number;
+          cache_creation_input_tokens?: number;
+          cache_read_input_tokens?: number;
+        };
       };
 
       if (typed.type === "message_start") {
@@ -227,7 +238,11 @@ export class AnthropicModelClient implements ModelClient {
           stopReason = typed.delta?.stop_reason;
           usage = {
             ...usage,
-            outputTokens: typed.usage?.output_tokens ?? usage?.outputTokens
+            outputTokens: typed.usage?.output_tokens ?? usage?.outputTokens,
+            cacheCreationInputTokens:
+              typed.usage?.cache_creation_input_tokens ?? usage?.cacheCreationInputTokens,
+            cacheReadInputTokens:
+              typed.usage?.cache_read_input_tokens ?? usage?.cacheReadInputTokens
           };
         }
       }
@@ -403,28 +418,40 @@ function toInternalContent(
   return content;
 }
 
-function toAnthropicTools(
+export function toAnthropicTools(
   tools: readonly ModelToolDefinition[] | undefined
 ): { tools?: Array> } {
   if (!tools || tools.length === 0) {
     return {};
   }
 
-  return {
-    tools: tools.map((tool) => ({
-      name: tool.name,
-      description: tool.description,
-      input_schema: tool.inputSchema
-    }))
-  };
+  const mapped: Array> = tools.map((tool) => ({
+    name: tool.name,
+    description: tool.description,
+    input_schema: tool.inputSchema
+  }));
+  // Mark the last tool with cache_control so the entire tool list becomes
+  // a single prompt-cache breakpoint. Tools rarely change across turns,
+  // so this caches the largest stable input segment after the system
+  // prompt. The marker is harmless on uncached calls.
+  const lastIndex = mapped.length - 1;
+  mapped[lastIndex] = { ...mapped[lastIndex], cache_control: { type: "ephemeral" } };
+  return { tools: mapped };
 }
 
 function isRecord(value: unknown): value is Record {
   return typeof value === "object" && value !== null && !Array.isArray(value);
 }
 
-function toModelUsage(
-  usage: { input_tokens?: number; output_tokens?: number } | undefined
+export function toModelUsage(
+  usage:
+    | {
+        input_tokens?: number;
+        output_tokens?: number;
+        cache_creation_input_tokens?: number;
+        cache_read_input_tokens?: number;
+      }
+    | undefined
 ): ModelUsage | undefined {
   if (!usage) {
     return undefined;
@@ -432,7 +459,9 @@ function toModelUsage(
 
   return {
     inputTokens: usage.input_tokens,
-    outputTokens: usage.output_tokens
+    outputTokens: usage.output_tokens,
+    cacheCreationInputTokens: usage.cache_creation_input_tokens,
+    cacheReadInputTokens: usage.cache_read_input_tokens
   };
 }
 
diff --git a/packages/core/src/fork.ts b/packages/core/src/fork.ts
index f5a3f13..4abf3be 100644
--- a/packages/core/src/fork.ts
+++ b/packages/core/src/fork.ts
@@ -1,5 +1,6 @@
 import { createHash } from "node:crypto";
 
+import type { SystemTextBlock } from "./model.js";
 import type { Message, ToolDefinition } from "./types.js";
 
 export type ForkTrace = {
@@ -18,15 +19,25 @@ export type ForkTraceInput = {
   parentDepth: number;
   subagentType: string;
   model: string;
-  systemPrompt?: string;
+  systemPrompt?: string | readonly SystemTextBlock[];
   tools: readonly ToolDefinition[];
   prefixMessages: readonly Message[];
   directive: string;
   previous?: ForkTrace;
 };
 
+function systemPromptToHashable(systemPrompt: ForkTraceInput["systemPrompt"]): string {
+  if (systemPrompt === undefined) {
+    return "";
+  }
+  if (typeof systemPrompt === "string") {
+    return systemPrompt;
+  }
+  return systemPrompt.map((block) => block.text).join("\n\n");
+}
+
 export function createForkTrace(input: ForkTraceInput): ForkTrace {
-  const systemPromptHash = sha256(input.systemPrompt ?? "");
+  const systemPromptHash = sha256(systemPromptToHashable(input.systemPrompt));
   const toolHash = hashToolDefinitions(input.tools);
   const prefixHash = hashMessages(input.prefixMessages);
   const directiveHash = sha256(input.directive);
diff --git a/packages/core/src/model.ts b/packages/core/src/model.ts
index a80c267..81aa916 100644
--- a/packages/core/src/model.ts
+++ b/packages/core/src/model.ts
@@ -23,11 +23,28 @@ export type ModelUsage = {
   cacheReadInputTokens?: number;
 };
 
+/**
+ * A single text block in a structured system prompt. The optional
+ * `cache_control` marker turns this block into an Anthropic prompt-cache
+ * breakpoint: the cumulative content up to and including this block is
+ * cached and reused across requests that share the same prefix.
+ */
+export type SystemTextBlock = {
+  type: "text";
+  text: string;
+  cache_control?: { type: "ephemeral" };
+};
+
 export type ModelRequest = {
   messages: readonly Message[];
   model?: string;
   maxTokens?: number;
-  system?: string;
+  /**
+   * The system prompt. A plain string preserves the legacy flat form
+   * (no caching). An array of `SystemTextBlock`s enables structured
+   * caching when at least one block carries `cache_control`.
+   */
+  system?: string | readonly SystemTextBlock[];
   requestId?: string;
   timeoutMs?: number;
   signal?: AbortSignal;
diff --git a/packages/core/src/query.ts b/packages/core/src/query.ts
index fde8b6b..b09fefa 100644
--- a/packages/core/src/query.ts
+++ b/packages/core/src/query.ts
@@ -10,7 +10,8 @@ import {
   type ModelClient,
   type ModelErrorKind,
   type ModelStreamEvent,
-  type ModelUsage
+  type ModelUsage,
+  type SystemTextBlock
 } from "./model.js";
 import { executeToolBatch, partitionToolCalls } from "./scheduler.js";
 import { toModelToolDefinition } from "./tool.js";
@@ -32,7 +33,7 @@ export type QueryOptions = {
   initialMessages: readonly Message[];
   tools: readonly ToolDefinition[];
   toolContext: ToolContext;
-  system?: string;
+  system?: string | readonly SystemTextBlock[];
   modelName?: string;
   maxTokens?: number;
   maxTurns?: number;
@@ -270,7 +271,7 @@ type CollectModelTurnWithRetryOptions = {
   messages: readonly Message[];
   modelName: string;
   maxTokens: number;
-  system?: string;
+  system?: string | readonly SystemTextBlock[];
   signal?: AbortSignal;
   tools: readonly ModelToolDefinition[];
   contextBudgetTokens: number;
diff --git a/packages/core/src/types.ts b/packages/core/src/types.ts
index e1af982..4dde22b 100644
--- a/packages/core/src/types.ts
+++ b/packages/core/src/types.ts
@@ -1,5 +1,5 @@
 import type { z } from "zod";
-import type { ModelClient, ModelUsage } from "./model.js";
+import type { ModelClient, ModelUsage, SystemTextBlock } from "./model.js";
 import type { ForkTrace } from "./fork.js";
 import type { ProfileRecorder } from "./profile.js";
 import type { TaskStore } from "./task.js";
@@ -78,7 +78,7 @@ export type ToolContext = {
   model?: ModelClient;
   modelName?: string;
   maxTokens?: number;
-  system?: string;
+  system?: string | readonly SystemTextBlock[];
   parentMessages?: readonly Message[];
   tools?: readonly ToolDefinition[];
   taskStore?: TaskStore;
diff --git a/packages/core/test/security/README.md b/packages/core/test/security/README.md
index d2bb431..3e46c18 100644
--- a/packages/core/test/security/README.md
+++ b/packages/core/test/security/README.md
@@ -73,6 +73,16 @@ Tests live in two trees because of the package boundary
 | `executeToolBatch` never overlaps two non-concurrency-safe tools | `packages/core/test/security/scheduler-write-serialization.test.ts` |
 | Sibling read tools cancel when a Bash sibling errors with cancel-on-error | `packages/core/test/scheduler.test.ts` |
 
+### Prompt caching plumbing
+
+| Invariant | Test |
+|---|---|
+| `toAnthropicTools` marks the *last* tool with `cache_control: { type: "ephemeral" }` so the full tool list becomes a single cache breakpoint | `packages/core/test/security/prompt-caching.test.ts` |
+| `toAnthropicTools` returns `{}` (no tools, no spurious cache marker) on an empty/undefined input | `packages/core/test/security/prompt-caching.test.ts` |
+| `toModelUsage` extracts `cache_creation_input_tokens` + `cache_read_input_tokens` when the SDK provides them | `packages/core/test/security/prompt-caching.test.ts` |
+| `toModelUsage` leaves cache fields `undefined` on non-cached turns (the SDK omits them) | `packages/core/test/security/prompt-caching.test.ts` |
+| The agent's outbound `request.system` is a `SystemTextBlock[]` (not a string) with `cache_control: ephemeral` on the block | `packages/cli/test/cli.test.ts` |
+
 ### Cache token accounting
 
 | Invariant | Test |
diff --git a/packages/core/test/security/prompt-caching.test.ts b/packages/core/test/security/prompt-caching.test.ts
new file mode 100644
index 0000000..ba85a78
--- /dev/null
+++ b/packages/core/test/security/prompt-caching.test.ts
@@ -0,0 +1,73 @@
+import { describe, expect, it } from "vitest";
+import {
+  toAnthropicTools,
+  toModelUsage,
+  type ModelToolDefinition
+} from "../../src/index.js";
+
+function makeTool(name: string): ModelToolDefinition {
+  return {
+    name,
+    description: `${name} fixture`,
+    inputSchema: { type: "object", properties: {}, additionalProperties: false }
+  };
+}
+
+describe("security: prompt caching wiring", () => {
+  it("toAnthropicTools marks the last tool with cache_control ephemeral", () => {
+    const { tools } = toAnthropicTools([makeTool("Read"), makeTool("Glob"), makeTool("Edit")]);
+    expect(tools).toBeDefined();
+    expect(tools).toHaveLength(3);
+    // First two are plain.
+    expect(tools![0]).not.toHaveProperty("cache_control");
+    expect(tools![1]).not.toHaveProperty("cache_control");
+    // Last one carries the cache breakpoint.
+    expect(tools![2]).toMatchObject({
+      name: "Edit",
+      cache_control: { type: "ephemeral" }
+    });
+  });
+
+  it("toAnthropicTools handles a single-tool list (the only tool is the cache breakpoint)", () => {
+    const { tools } = toAnthropicTools([makeTool("Read")]);
+    expect(tools).toHaveLength(1);
+    expect(tools![0]).toMatchObject({
+      name: "Read",
+      cache_control: { type: "ephemeral" }
+    });
+  });
+
+  it("toAnthropicTools returns no tools object for an empty list (no spurious cache marker)", () => {
+    expect(toAnthropicTools([])).toEqual({});
+    expect(toAnthropicTools(undefined)).toEqual({});
+  });
+
+  it("toModelUsage extracts cache_creation_input_tokens and cache_read_input_tokens", () => {
+    const usage = toModelUsage({
+      input_tokens: 10,
+      output_tokens: 5,
+      cache_creation_input_tokens: 100,
+      cache_read_input_tokens: 200
+    });
+    expect(usage).toEqual({
+      inputTokens: 10,
+      outputTokens: 5,
+      cacheCreationInputTokens: 100,
+      cacheReadInputTokens: 200
+    });
+  });
+
+  it("toModelUsage leaves cache fields undefined when the SDK omits them (non-cached turns)", () => {
+    const usage = toModelUsage({ input_tokens: 10, output_tokens: 5 });
+    expect(usage).toEqual({
+      inputTokens: 10,
+      outputTokens: 5,
+      cacheCreationInputTokens: undefined,
+      cacheReadInputTokens: undefined
+    });
+  });
+
+  it("toModelUsage returns undefined when the SDK provides no usage block", () => {
+    expect(toModelUsage(undefined)).toBeUndefined();
+  });
+});
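
Reviewer note (illustration only, not part of the patch): the usage fields exercised by the tests above are summed downstream, with `undefined` treated as zero, and then priced with per-million-token rates. The sketch below shows one way that could look. Every name in it (`Usage`, `Totals`, `Rates`, `addUsage`, `estimateCostUsd`) is hypothetical and is not the project's `addTokenUsage` or `estimateUsageCostUsd`; the rates in the usage note are placeholders, not real prices.

```typescript
// Hypothetical sketch: field names follow `ModelUsage` from the patch,
// everything else is assumed for illustration.
type Usage = {
  inputTokens?: number;
  outputTokens?: number;
  cacheCreationInputTokens?: number;
  cacheReadInputTokens?: number;
};
type Totals = Required<Usage>;

// USD per million tokens, in the spirit of the MYAGENT_*_USD_PER_MTOK overrides.
type Rates = { input: number; output: number; cacheWrite: number; cacheRead: number };

const ZERO: Totals = {
  inputTokens: 0,
  outputTokens: 0,
  cacheCreationInputTokens: 0,
  cacheReadInputTokens: 0
};

// Adds one turn to the running totals; missing fields count as zero,
// so non-cached turns leave the cache totals unchanged.
function addUsage(total: Totals, turn: Usage | undefined): Totals {
  return {
    inputTokens: total.inputTokens + (turn?.inputTokens ?? 0),
    outputTokens: total.outputTokens + (turn?.outputTokens ?? 0),
    cacheCreationInputTokens:
      total.cacheCreationInputTokens + (turn?.cacheCreationInputTokens ?? 0),
    cacheReadInputTokens: total.cacheReadInputTokens + (turn?.cacheReadInputTokens ?? 0)
  };
}

// Prices each token class separately and converts from per-million rates.
function estimateCostUsd(total: Totals, rates: Rates): number {
  const weighted =
    total.inputTokens * rates.input +
    total.outputTokens * rates.output +
    total.cacheCreationInputTokens * rates.cacheWrite +
    total.cacheReadInputTokens * rates.cacheRead;
  return weighted / 1e6;
}
```

A first turn would typically report `cacheCreationInputTokens` and later turns `cacheReadInputTokens`, so folding turns with `addUsage(ZERO, ...)` and pricing the result with placeholder rates such as `{ input: 1, output: 2, cacheWrite: 3, cacheRead: 4 }` keeps cache writes and cache reads as separate cost terms.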