feat(agent): emit per-category context token breakdown #2352

Open
k11kirky wants to merge 1 commit into posthog-code/usage-threshold-monitor from posthog-code/context-breakdown-data

@k11kirky k11kirky commented May 25, 2026

Problem

The context usage indicator in the renderer showed only aggregate token counts (used / size / percentage) with no breakdown of where those tokens were coming from. Users had no way to see how much of the context window was consumed by the system prompt, tools, rules, conversation history, etc.

Changes

Agent-side (packages/agent):

  • Added context-breakdown.ts with a lightweight character-ratio token estimator (~3.5 chars/token) and helpers:
    • estimateTokens / estimateJsonTokens for cheap client-side estimation
    • estimateSystemPrompt handles raw strings, { type: "preset", append } objects, and undefined, adding a constant CLAUDE_PRESET_ESTIMATE_TOKENS (4000) when the opaque Claude preset is in use
    • buildBreakdown derives the conversation bucket as whatever input tokens remain after subtracting the stable pieces (system prompt, tools, rules, skills, MCP, subagents), floored at 0 to absorb estimation drift
    • emptyBaseline / ContextBreakdownBaseline types for initializing and carrying per-source estimates across turns
  • Attached a contextBreakdownBaseline to the Session type, initialized at session start with the estimated system-prompt token count
  • Emits the breakdown alongside the existing _posthog/usage_update notification after each model turn, using the result's own input token categories rather than the streamed delta to handle subagent turns correctly
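
The estimator helpers listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `context-breakdown.ts`: the `Math.ceil` rounding, the `SystemPromptInput` type name, returning 0 for `undefined`, and returning 0 for unserializable JSON are all assumptions. Only the ~3.5 chars/token ratio, the function names, and the 4000-token preset constant come from the description.

```typescript
// Assumed ratio from the PR description (~3.5 chars/token).
const CHARS_PER_TOKEN = 3.5;
// Constant added when the opaque Claude preset is in use (value from the PR).
const CLAUDE_PRESET_ESTIMATE_TOKENS = 4000;

// Hypothetical input type covering the three cases the PR mentions.
type SystemPromptInput =
  | string
  | { type: "preset"; append?: string }
  | undefined;

function estimateTokens(text: string | null | undefined): number {
  if (!text) return 0; // empty string, null, undefined
  // Rounding up is an assumption; it matches 35 chars -> 10 tokens.
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function estimateJsonTokens(value: unknown): number {
  try {
    // JSON.stringify throws a TypeError on circular structures.
    return estimateTokens(JSON.stringify(value));
  } catch {
    return 0; // assumption: unserializable values count as 0 tokens
  }
}

function estimateSystemPrompt(prompt: SystemPromptInput): number {
  if (prompt === undefined) return 0; // assumption
  if (typeof prompt === "string") return estimateTokens(prompt);
  // Preset object: fixed preset estimate plus any appended text.
  return CLAUDE_PRESET_ESTIMATE_TOKENS + estimateTokens(prompt.append);
}
```

A character-ratio heuristic like this trades accuracy for speed: it needs no tokenizer and no network call, which is why the derived `conversation` bucket has to absorb the resulting drift.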

Renderer-side (apps/code):

  • Extended ContextUsage with a breakdown: ContextBreakdown | null field and added the ContextBreakdown interface (mirroring the agent shape, kept local to avoid a cross-package dependency)
  • Refactored extractContextUsage to scan events in a single reverse pass, independently finding the latest aggregate (session/update) and the latest breakdown (_posthog/usage_update or __posthog/usage_update), then merging them
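
The single reverse pass described above might look like the sketch below. The `SessionEvent` shape, the `params` field names, and the choice to return `null` when no aggregate is found are hypothetical and chosen only for illustration. The method names and the `ContextUsage` / `ContextBreakdown` names come from the description, and the breakdown fields follow the categories named in this PR.

```typescript
interface ContextBreakdown {
  systemPrompt: number;
  tools: number;
  rules: number;
  skills: number;
  mcp: number;
  subagents: number;
  conversation: number;
}

interface ContextUsage {
  used: number;
  size: number;
  percentage: number;
  breakdown: ContextBreakdown | null;
}

// Hypothetical event shape; the real event type is not shown in this PR page.
interface SessionEvent {
  method: string;
  params?: { used?: number; size?: number; breakdown?: ContextBreakdown };
}

// Both the single- and double-underscore prefixes are accepted.
const BREAKDOWN_METHODS = new Set([
  "_posthog/usage_update",
  "__posthog/usage_update",
]);

function extractContextUsage(events: readonly SessionEvent[]): ContextUsage | null {
  let aggregate: { used: number; size: number } | null = null;
  let breakdown: ContextBreakdown | null = null;

  // Walk newest-to-oldest, keeping the first hit of each kind independently.
  for (let i = events.length - 1; i >= 0; i--) {
    const event = events[i];
    if (!event) continue; // guard for sparse arrays
    const used = event.params?.used;
    const size = event.params?.size;
    if (
      aggregate === null &&
      event.method === "session/update" &&
      typeof used === "number" &&
      typeof size === "number"
    ) {
      aggregate = { used, size };
    } else if (
      breakdown === null &&
      BREAKDOWN_METHODS.has(event.method) &&
      event.params?.breakdown
    ) {
      breakdown = event.params.breakdown;
    }
    if (aggregate !== null && breakdown !== null) break; // nothing left to find
  }

  if (aggregate === null) return null; // assumption: no aggregate means no usage
  const percentage = aggregate.size > 0 ? (aggregate.used / aggregate.size) * 100 : 0;
  return { ...aggregate, percentage, breakdown };
}
```

Tracking the two lookups separately means a breakdown emitted before the latest aggregate is still picked up, and callers that only need the aggregate see `breakdown: null` rather than a missing field.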

How did you test this?

  • Added unit tests for context-breakdown.ts covering estimateTokens, estimateJsonTokens, estimateSystemPrompt, and buildBreakdown (edge cases: empty input, circular JSON, conversation floored at 0, preset vs. raw string)
  • Added unit tests for extractContextUsage covering: no events, aggregate-only, merged breakdown, and the double-underscore method prefix variant

Publish to changelog?

no


k11kirky commented May 25, 2026


Basit-Balogun10 commented May 25, 2026

Hello @k11kirky 👋🏾
This PR and #2353 should close #2062, right?

Adds a `breakdown` field to the `_posthog/usage_update` ext-notification
so the renderer can render per-source token splits in the upcoming
context-breakdown popover. The agent estimates token counts for the
stable pieces of the context (system prompt today; tools / MCP /
rules / skills / subagents will follow as the agent gets at-rest
access to their definitions) via a character-ratio heuristic. The
`conversation` bucket is derived as `max(0, currentInputTokens -
sum(stable))`, so the categories sum to the input total unless the
stable estimates overshoot it and the floor applies.
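
A minimal sketch of that derivation, assuming a `buildBreakdown(baseline, currentInputTokens)` signature (the real signature is not shown here). The field names follow the categories named in this PR.

```typescript
interface ContextBreakdownBaseline {
  systemPrompt: number;
  tools: number;
  rules: number;
  skills: number;
  mcp: number;
  subagents: number;
}

interface ContextBreakdown extends ContextBreakdownBaseline {
  conversation: number;
}

function emptyBaseline(): ContextBreakdownBaseline {
  return { systemPrompt: 0, tools: 0, rules: 0, skills: 0, mcp: 0, subagents: 0 };
}

// Assumed signature: stable estimates plus the turn's total input tokens.
function buildBreakdown(
  baseline: ContextBreakdownBaseline,
  currentInputTokens: number,
): ContextBreakdown {
  const stable =
    baseline.systemPrompt +
    baseline.tools +
    baseline.rules +
    baseline.skills +
    baseline.mcp +
    baseline.subagents;
  // Floor at 0 so an overestimated baseline never yields a negative bucket.
  return { ...baseline, conversation: Math.max(0, currentInputTokens - stable) };
}

const baseline = { ...emptyBaseline(), systemPrompt: 4000 };
console.log(buildBreakdown(baseline, 10000).conversation); // 6000
console.log(buildBreakdown(baseline, 3000).conversation); // 0 (floor applied)
```

In the second call the system-prompt estimate alone exceeds the reported input, so the floor keeps `conversation` at 0 instead of going negative.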

The renderer's `useContextUsage` now scans backwards for both the
existing `session/update` aggregate and the new ext-notification's
breakdown, surfacing the latter as a nullable field. Existing callers
keep the aggregate; B4 wires the breakdown into the popover.

Generated-By: PostHog Code
Task-Id: bac06178-1ab1-4000-9a56-1901215bd4af

@k11kirky k11kirky force-pushed the posthog-code/usage-threshold-monitor branch from 1589fab to 9ee50f4 Compare May 26, 2026 09:11
@k11kirky k11kirky force-pushed the posthog-code/context-breakdown-data branch from 00f8802 to 80a75e8 Compare May 26, 2026 09:11
@k11kirky k11kirky marked this pull request as ready for review May 26, 2026 09:20

greptile-apps Bot commented May 26, 2026

Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
packages/agent/src/adapters/claude/types.ts:67-71
**Baseline fields for `tools`, `rules`, `skills`, `mcp`, and `subagents` are never populated**

The JSDoc says "Refreshed at session init and on MCP/skill changes," but the initialization in `claude-agent.ts` spreads `emptyBaseline()` (all zeros) and only sets `systemPrompt`. The five other baseline fields stay at 0 for the entire session lifetime. Every `buildBreakdown` call will therefore attribute all non-system-prompt tokens to `conversation`, making the chart misleading for any session that has tools, rules, or MCP servers configured. Is the per-source population of those fields intentionally deferred to a follow-up PR, or is this an oversight?

### Issue 2 of 3
packages/agent/src/adapters/claude/context-breakdown.test.ts:17-20
The two assertions here are separate data points for the same behaviour and should use `it.each` per the team's preference for parameterised tests. The same applies to the "returns 0 for empty input" block above.

```suggestion
  it.each([
    [35, 10],
    [350, 100],
  ])("scales roughly with input length (%i chars → %i tokens)", (chars, expected) => {
    expect(estimateTokens("a".repeat(chars))).toBe(expected);
  });
```

### Issue 3 of 3
packages/agent/src/adapters/claude/context-breakdown.test.ts:11-15
Three independent null/empty inputs tested in one block — a good candidate for `it.each` to give each input its own test name.

```suggestion
  it.each([["", 0], [undefined, 0], [null, 0]] as const)(
    "returns 0 for %s",
    (input, expected) => {
      expect(estimateTokens(input)).toBe(expected);
    },
  );
```

Reviews (1): Last reviewed commit: "feat(agent): emit per-category context t..." | Re-trigger Greptile

Comment on lines 67 to +71
  nextPendingOrder: number;
  emitRawSDKMessages: boolean | SDKMessageFilter[];
  /** Per-source token estimates for stable pieces (system prompt, tools, etc.)
   * used by the renderer's context-breakdown popover. Refreshed at session
   * init and on MCP/skill changes. */

P2 Baseline fields for tools, rules, skills, mcp, and subagents are never populated

The JSDoc says "Refreshed at session init and on MCP/skill changes," but the initialization in claude-agent.ts spreads emptyBaseline() (all zeros) and only sets systemPrompt. The five other baseline fields stay at 0 for the entire session lifetime. Every buildBreakdown call will therefore attribute all non-system-prompt tokens to conversation, making the chart misleading for any session that has tools, rules, or MCP servers configured. Is the per-source population of those fields intentionally deferred to a follow-up PR, or is this an oversight?


Comment on lines +17 to +20
  it("scales roughly with input length", () => {
    expect(estimateTokens("a".repeat(35))).toBe(10);
    expect(estimateTokens("a".repeat(350))).toBe(100);
  });

P2 The two assertions here are separate data points for the same behaviour and should use it.each per the team's preference for parameterised tests. The same applies to the "returns 0 for empty input" block above.

Suggested change
- it("scales roughly with input length", () => {
-   expect(estimateTokens("a".repeat(35))).toBe(10);
-   expect(estimateTokens("a".repeat(350))).toBe(100);
- });
+ it.each([
+   [35, 10],
+   [350, 100],
+ ])("scales roughly with input length (%i chars → %i tokens)", (chars, expected) => {
+   expect(estimateTokens("a".repeat(chars))).toBe(expected);
+ });

Context Used: Do not attempt to comment on incorrect alphabetica... (source)


Comment on lines +11 to +15
  it("returns 0 for empty input", () => {
    expect(estimateTokens("")).toBe(0);
    expect(estimateTokens(undefined)).toBe(0);
    expect(estimateTokens(null)).toBe(0);
  });

P2 Three independent null/empty inputs tested in one block — a good candidate for it.each to give each input its own test name.

Suggested change
- it("returns 0 for empty input", () => {
-   expect(estimateTokens("")).toBe(0);
-   expect(estimateTokens(undefined)).toBe(0);
-   expect(estimateTokens(null)).toBe(0);
- });
+ it.each([["", 0], [undefined, 0], [null, 0]] as const)(
+   "returns 0 for %s",
+   (input, expected) => {
+     expect(estimateTokens(input)).toBe(expected);
+   },
+ );

Context Used: Do not attempt to comment on incorrect alphabetica... (source)


2 participants