feat(agent): emit per-category context token breakdown#2352
Conversation
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
23ff881 to
d6105bb
Compare
7032aef to
78a3903
Compare
d6105bb to
1589fab
Compare
78a3903 to
00f8802
Compare
Adds a `breakdown` field to the `_posthog/usage_update` ext-notification so the renderer can render per-source token splits in the upcoming context-breakdown popover. The agent estimates token counts for the stable pieces of the context (system prompt today; tools / MCP / rules / skills / subagents will follow as the agent gets at-rest access to their definitions) via a character-ratio heuristic. The `conversation` bucket is derived as `max(0, currentInputTokens - sum(stable))`, so the categories always sum to the input total. The renderer's `useContextUsage` now scans backwards for both the existing `session/update` aggregate and the new ext-notification's breakdown, surfacing the latter as a nullable field. Existing callers keep the aggregate; B4 wires the breakdown into the popover. Generated-By: PostHog Code Task-Id: bac06178-1ab1-4000-9a56-1901215bd4af Generated-By: PostHog Code Task-Id: bac06178-1ab1-4000-9a56-1901215bd4af
1589fab to
9ee50f4
Compare
00f8802 to
80a75e8
Compare
Prompt To Fix All With AIFix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
packages/agent/src/adapters/claude/types.ts:67-71
**Baseline fields for `tools`, `rules`, `skills`, `mcp`, and `subagents` are never populated**
The JSDoc says "Refreshed at session init and on MCP/skill changes," but the initialization in `claude-agent.ts` spreads `emptyBaseline()` (all zeros) and only sets `systemPrompt`. The five other baseline fields stay at 0 for the entire session lifetime. Every `buildBreakdown` call will therefore attribute all non-system-prompt tokens to `conversation`, making the chart misleading for any session that has tools, rules, or MCP servers configured. Is the per-source population of those fields intentionally deferred to a follow-up PR, or is this an oversight?
### Issue 2 of 3
packages/agent/src/adapters/claude/context-breakdown.test.ts:17-20
The two assertions here are separate data points for the same behaviour and should use `it.each` per the team's preference for parameterised tests. The same applies to the "returns 0 for empty input" block above.
```suggestion
it.each([
[35, 10],
[350, 100],
])("scales roughly with input length (%i chars → %i tokens)", (chars, expected) => {
expect(estimateTokens("a".repeat(chars))).toBe(expected);
});
```
### Issue 3 of 3
packages/agent/src/adapters/claude/context-breakdown.test.ts:11-15
Three independent null/empty inputs tested in one block — a good candidate for `it.each` to give each input its own test name.
```suggestion
it.each([["", 0], [undefined, 0], [null, 0]] as const)(
"returns 0 for %s",
(input, expected) => {
expect(estimateTokens(input)).toBe(expected);
},
);
```
Reviews (1): Last reviewed commit: "feat(agent): emit per-category context t..." | Re-trigger Greptile |
| nextPendingOrder: number; | ||
| emitRawSDKMessages: boolean | SDKMessageFilter[]; | ||
| /** Per-source token estimates for stable pieces (system prompt, tools, etc.) | ||
| * used by the renderer's context-breakdown popover. Refreshed at session | ||
| * init and on MCP/skill changes. */ |
There was a problem hiding this comment.
Baseline fields for
tools, rules, skills, mcp, and subagents are never populated
The JSDoc says "Refreshed at session init and on MCP/skill changes," but the initialization in claude-agent.ts spreads emptyBaseline() (all zeros) and only sets systemPrompt. The five other baseline fields stay at 0 for the entire session lifetime. Every buildBreakdown call will therefore attribute all non-system-prompt tokens to conversation, making the chart misleading for any session that has tools, rules, or MCP servers configured. Is the per-source population of those fields intentionally deferred to a follow-up PR, or is this an oversight?
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/agent/src/adapters/claude/types.ts
Line: 67-71
Comment:
**Baseline fields for `tools`, `rules`, `skills`, `mcp`, and `subagents` are never populated**
The JSDoc says "Refreshed at session init and on MCP/skill changes," but the initialization in `claude-agent.ts` spreads `emptyBaseline()` (all zeros) and only sets `systemPrompt`. The five other baseline fields stay at 0 for the entire session lifetime. Every `buildBreakdown` call will therefore attribute all non-system-prompt tokens to `conversation`, making the chart misleading for any session that has tools, rules, or MCP servers configured. Is the per-source population of those fields intentionally deferred to a follow-up PR, or is this an oversight?
How can I resolve this? If you propose a fix, please make it concise.| it("scales roughly with input length", () => { | ||
| expect(estimateTokens("a".repeat(35))).toBe(10); | ||
| expect(estimateTokens("a".repeat(350))).toBe(100); | ||
| }); |
There was a problem hiding this comment.
The two assertions here are separate data points for the same behaviour and should use
it.each per the team's preference for parameterised tests. The same applies to the "returns 0 for empty input" block above.
| it("scales roughly with input length", () => { | |
| expect(estimateTokens("a".repeat(35))).toBe(10); | |
| expect(estimateTokens("a".repeat(350))).toBe(100); | |
| }); | |
| it.each([ | |
| [35, 10], | |
| [350, 100], | |
| ])("scales roughly with input length (%i chars → %i tokens)", (chars, expected) => { | |
| expect(estimateTokens("a".repeat(chars))).toBe(expected); | |
| }); |
Context Used: Do not attempt to comment on incorrect alphabetica... (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/agent/src/adapters/claude/context-breakdown.test.ts
Line: 17-20
Comment:
The two assertions here are separate data points for the same behaviour and should use `it.each` per the team's preference for parameterised tests. The same applies to the "returns 0 for empty input" block above.
```suggestion
it.each([
[35, 10],
[350, 100],
])("scales roughly with input length (%i chars → %i tokens)", (chars, expected) => {
expect(estimateTokens("a".repeat(chars))).toBe(expected);
});
```
**Context Used:** Do not attempt to comment on incorrect alphabetica... ([source](https://app.greptile.com/review/custom-context?memory=instruction-0))
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| it("returns 0 for empty input", () => { | ||
| expect(estimateTokens("")).toBe(0); | ||
| expect(estimateTokens(undefined)).toBe(0); | ||
| expect(estimateTokens(null)).toBe(0); | ||
| }); |
There was a problem hiding this comment.
Three independent null/empty inputs tested in one block — a good candidate for
it.each to give each input its own test name.
| it("returns 0 for empty input", () => { | |
| expect(estimateTokens("")).toBe(0); | |
| expect(estimateTokens(undefined)).toBe(0); | |
| expect(estimateTokens(null)).toBe(0); | |
| }); | |
| it.each([["", 0], [undefined, 0], [null, 0]] as const)( | |
| "returns 0 for %s", | |
| (input, expected) => { | |
| expect(estimateTokens(input)).toBe(expected); | |
| }, | |
| ); |
Context Used: Do not attempt to comment on incorrect alphabetica... (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/agent/src/adapters/claude/context-breakdown.test.ts
Line: 11-15
Comment:
Three independent null/empty inputs tested in one block — a good candidate for `it.each` to give each input its own test name.
```suggestion
it.each([["", 0], [undefined, 0], [null, 0]] as const)(
"returns 0 for %s",
(input, expected) => {
expect(estimateTokens(input)).toBe(expected);
},
);
```
**Context Used:** Do not attempt to comment on incorrect alphabetica... ([source](https://app.greptile.com/review/custom-context?memory=instruction-0))
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Problem
The context usage indicator in the renderer showed only aggregate token counts (used / size / percentage) with no breakdown of where those tokens were coming from. Users had no way to see how much of the context window was consumed by the system prompt, tools, rules, conversation history, etc.
Changes
Agent-side (
packages/agent):context-breakdown.tswith a lightweight character-ratio token estimator (~3.5 chars/token) and helpers:estimateTokens/estimateJsonTokensfor cheap client-side estimationestimateSystemPrompthandles raw strings,{ type: "preset", append }objects, andundefined, adding a constantCLAUDE_PRESET_ESTIMATE_TOKENS(4000) when the opaque Claude preset is in usebuildBreakdownderives theconversationbucket as whatever input tokens remain after subtracting the stable pieces (system prompt, tools, rules, skills, MCP, subagents), floored at 0 to absorb estimation driftemptyBaseline/ContextBreakdownBaselinetypes for initializing and carrying per-source estimates across turnscontextBreakdownBaselineto theSessiontype, initialized at session start with the estimated system-prompt token count_posthog/usage_updatenotification after each model turn, using the result's own input token categories rather than the streamed delta to handle subagent turns correctlyRenderer-side (
apps/code):ContextUsagewith abreakdown: ContextBreakdown | nullfield and added theContextBreakdowninterface (mirroring the agent shape, kept local to avoid a cross-package dependency)extractContextUsageto scan events in a single reverse pass, independently finding the latest aggregate (session/update) and the latest breakdown (_posthog/usage_updateor__posthog/usage_update), then merging themHow did you test this?
context-breakdown.tscoveringestimateTokens,estimateJsonTokens,estimateSystemPrompt, andbuildBreakdown(edge cases: empty input, circular JSON, conversation floored at 0, preset vs. raw string)extractContextUsagecovering: no events, aggregate-only, merged breakdown, and the double-underscore method prefix variantPublish to changelog?
no