feat(agent): emit per-category context token breakdown by k11kirky · Pull Request #2352 · PostHog/code

k11kirky · 2026-05-25T14:32:03Z

Problem

The context usage indicator in the renderer showed only aggregate token counts (used / size / percentage) with no breakdown of where those tokens were coming from. Users had no way to see how much of the context window was consumed by the system prompt, tools, rules, conversation history, etc.

Changes

Agent-side (packages/agent):

Added context-breakdown.ts with a lightweight character-ratio token estimator (~3.5 chars/token) and helpers:
- estimateTokens / estimateJsonTokens for cheap client-side estimation
- estimateSystemPrompt handles raw strings, { type: "preset", append } objects, and undefined, adding a constant CLAUDE_PRESET_ESTIMATE_TOKENS (4000) when the opaque Claude preset is in use
- buildBreakdown derives the conversation bucket as whatever input tokens remain after subtracting the stable pieces (system prompt, tools, rules, skills, MCP, subagents), floored at 0 to absorb estimation drift
- emptyBaseline() and the ContextBreakdownBaseline type for initializing and carrying per-source estimates across turns
Attached a contextBreakdownBaseline to the Session type, initialized at session start with the estimated system-prompt token count
Emitted the breakdown alongside the existing _posthog/usage_update notification after each model turn, using the result's own input token categories rather than the streamed delta so that subagent turns are handled correctly
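
For reviewers who want a concrete picture of the estimator helpers, here is a minimal sketch. The function names, the 3.5 chars/token ratio, and the 4000-token preset constant come from the description above; the function bodies, the `SystemPromptOption` type name, the round-up behaviour, and returning 0 for `undefined` or circular input are assumptions, not the code in this PR.

```typescript
// Sketch only: names follow the PR description, bodies are assumptions.
const CHARS_PER_TOKEN = 3.5; // ratio quoted in the description
const CLAUDE_PRESET_ESTIMATE_TOKENS = 4000; // constant quoted in the description

// Hypothetical type name for the three shapes the description lists.
type SystemPromptOption =
  | string
  | { type: "preset"; append?: string }
  | undefined;

function estimateTokens(text: string | null | undefined): number {
  if (!text) return 0;
  // Rounding up is an assumption; the description does not state the rounding mode.
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function estimateJsonTokens(value: unknown): number {
  try {
    // JSON.stringify yields undefined for unserializable top-level values
    // (e.g. undefined itself), which estimateTokens treats as empty.
    return estimateTokens(JSON.stringify(value));
  } catch {
    // Circular structures make JSON.stringify throw; counting them as 0 is an assumption.
    return 0;
  }
}

function estimateSystemPrompt(prompt: SystemPromptOption): number {
  if (prompt === undefined) return 0; // assumption: no prompt means no estimate
  if (typeof prompt === "string") return estimateTokens(prompt);
  // Opaque preset: fixed estimate plus whatever text is appended to it.
  return CLAUDE_PRESET_ESTIMATE_TOKENS + estimateTokens(prompt.append);
}
```

Under these assumptions a 35-character string estimates to 10 tokens and a 350-character string to 100, which matches the values asserted in the unit tests quoted later in this thread.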

Renderer-side (apps/code):

Extended ContextUsage with a breakdown: ContextBreakdown | null field and added the ContextBreakdown interface (mirroring the agent shape, kept local to avoid a cross-package dependency)
Refactored extractContextUsage to scan events in a single reverse pass, independently finding the latest aggregate (session/update) and the latest breakdown (_posthog/usage_update or __posthog/usage_update), then merging them
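
The single reverse pass could look roughly like the sketch below. The method names and the `ContextUsage` / `ContextBreakdown` field names are taken from this PR's description and review comments; the `AgentEvent` envelope, the location of `used` / `size` / `breakdown` inside `params`, the percentage rounding, and returning `null` when no aggregate is found are all assumptions made for illustration.

```typescript
interface ContextBreakdown {
  systemPrompt: number;
  tools: number;
  rules: number;
  skills: number;
  mcp: number;
  subagents: number;
  conversation: number;
}

interface ContextUsage {
  used: number;
  size: number;
  percentage: number;
  breakdown: ContextBreakdown | null;
}

// Hypothetical event envelope; the real payload shape is not shown in this PR page.
interface AgentEvent {
  method: string;
  params?: { used?: number; size?: number; breakdown?: ContextBreakdown };
}

const BREAKDOWN_METHODS = new Set(["_posthog/usage_update", "__posthog/usage_update"]);

function extractContextUsage(events: readonly AgentEvent[]): ContextUsage | null {
  let aggregate: { used: number; size: number } | null = null;
  let breakdown: ContextBreakdown | null = null;

  // Walk newest to oldest; each half is captured independently the first time it is seen.
  for (let i = events.length - 1; i >= 0; i--) {
    const event = events[i];
    if (!event) continue;
    const params = event.params;
    if (
      !aggregate &&
      event.method === "session/update" &&
      params &&
      typeof params.used === "number" &&
      typeof params.size === "number"
    ) {
      aggregate = { used: params.used, size: params.size };
    } else if (!breakdown && BREAKDOWN_METHODS.has(event.method) && params?.breakdown) {
      breakdown = params.breakdown;
    }
    if (aggregate && breakdown) break; // both found, no need to scan further
  }

  if (!aggregate) return null; // assumption: no aggregate means nothing to display
  const percentage =
    aggregate.size > 0 ? Math.round((aggregate.used / aggregate.size) * 100) : 0;
  return { ...aggregate, percentage, breakdown };
}
```

Because the two lookups are independent, an aggregate-only event stream still produces a usable result with `breakdown: null`, which is the behaviour existing callers rely on.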

How did you test this?

Added unit tests for context-breakdown.ts covering estimateTokens, estimateJsonTokens, estimateSystemPrompt, and buildBreakdown (edge cases: empty input, circular JSON, conversation floored at 0, preset vs. raw string)
Added unit tests for extractContextUsage covering: no events, aggregate-only, merged breakdown, and the double-underscore method prefix variant

Publish to changelog?

no

k11kirky · 2026-05-25T14:32:12Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

Basit-Balogun10 · 2026-05-25T17:23:31Z

Hello @k11kirky 👋🏾
This PR and #2353 should close #2062, right?

Adds a `breakdown` field to the `_posthog/usage_update` ext-notification so the renderer can render per-source token splits in the upcoming context-breakdown popover.

The agent estimates token counts for the stable pieces of the context (system prompt today; tools / MCP / rules / skills / subagents will follow as the agent gets at-rest access to their definitions) via a character-ratio heuristic. The `conversation` bucket is derived as `max(0, currentInputTokens - sum(stable))`, so the categories always sum to the input total.

The renderer's `useContextUsage` now scans backwards for both the existing `session/update` aggregate and the new ext-notification's breakdown, surfacing the latter as a nullable field. Existing callers keep the aggregate; B4 wires the breakdown into the popover.

Generated-By: PostHog Code
Task-Id: bac06178-1ab1-4000-9a56-1901215bd4af
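
To make the `conversation` derivation concrete, here is a small sketch of the remainder calculation. The field names match those mentioned in the review comments on this PR (`systemPrompt`, `tools`, `rules`, `skills`, `mcp`, `subagents`, `conversation`); the `buildBreakdown` signature and body are assumptions, not the merged implementation.

```typescript
interface ContextBreakdownBaseline {
  systemPrompt: number;
  tools: number;
  rules: number;
  skills: number;
  mcp: number;
  subagents: number;
}

interface ContextBreakdown extends ContextBreakdownBaseline {
  conversation: number;
}

// All-zero starting point, as described for session initialization.
function emptyBaseline(): ContextBreakdownBaseline {
  return { systemPrompt: 0, tools: 0, rules: 0, skills: 0, mcp: 0, subagents: 0 };
}

// Assumed signature: baseline estimates plus the turn's reported input tokens.
function buildBreakdown(
  baseline: ContextBreakdownBaseline,
  currentInputTokens: number,
): ContextBreakdown {
  const stable =
    baseline.systemPrompt +
    baseline.tools +
    baseline.rules +
    baseline.skills +
    baseline.mcp +
    baseline.subagents;
  // Whatever is left after the stable pieces is attributed to conversation,
  // floored at 0 so an over-estimated baseline never yields a negative bucket.
  return { ...baseline, conversation: Math.max(0, currentInputTokens - stable) };
}

const baseline = { ...emptyBaseline(), systemPrompt: 4000 };
const normal = buildBreakdown(baseline, 10_000); // conversation: 6000
const drifted = buildBreakdown(baseline, 1_000); // conversation: 0
```

One consequence of the floor is worth noting: in the `drifted` case the categories add up to 4000, which is more than the 1000 input tokens, so the buckets match the input total only while the stable estimates do not exceed it.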

greptile-apps · 2026-05-26T09:23:34Z

Prompt To Fix All With AI

Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
packages/agent/src/adapters/claude/types.ts:67-71
**Baseline fields for `tools`, `rules`, `skills`, `mcp`, and `subagents` are never populated**

The JSDoc says "Refreshed at session init and on MCP/skill changes," but the initialization in `claude-agent.ts` spreads `emptyBaseline()` (all zeros) and only sets `systemPrompt`. The five other baseline fields stay at 0 for the entire session lifetime. Every `buildBreakdown` call will therefore attribute all non-system-prompt tokens to `conversation`, making the chart misleading for any session that has tools, rules, or MCP servers configured. Is the per-source population of those fields intentionally deferred to a follow-up PR, or is this an oversight?

### Issue 2 of 3
packages/agent/src/adapters/claude/context-breakdown.test.ts:17-20
The two assertions here are separate data points for the same behaviour and should use `it.each` per the team's preference for parameterised tests. The same applies to the "returns 0 for empty input" block above.

```suggestion
  it.each([
    [35, 10],
    [350, 100],
  ])("scales roughly with input length (%i chars → %i tokens)", (chars, expected) => {
    expect(estimateTokens("a".repeat(chars))).toBe(expected);
  });
```

### Issue 3 of 3
packages/agent/src/adapters/claude/context-breakdown.test.ts:11-15
Three independent null/empty inputs tested in one block — a good candidate for `it.each` to give each input its own test name.

```suggestion
  it.each([["", 0], [undefined, 0], [null, 0]] as const)(
    "returns 0 for %s",
    (input, expected) => {
      expect(estimateTokens(input)).toBe(expected);
    },
  );
```

Reviews (1): Last reviewed commit: "feat(agent): emit per-category context t..."

greptile-apps · 2026-05-26T09:23:38Z

  nextPendingOrder: number;
  emitRawSDKMessages: boolean | SDKMessageFilter[];
+  /** Per-source token estimates for stable pieces (system prompt, tools, etc.)
+   *  used by the renderer's context-breakdown popover. Refreshed at session
+   *  init and on MCP/skill changes. */


Baseline fields for tools, rules, skills, mcp, and subagents are never populated

The JSDoc says "Refreshed at session init and on MCP/skill changes," but the initialization in claude-agent.ts spreads emptyBaseline() (all zeros) and only sets systemPrompt. The five other baseline fields stay at 0 for the entire session lifetime. Every buildBreakdown call will therefore attribute all non-system-prompt tokens to conversation, making the chart misleading for any session that has tools, rules, or MCP servers configured. Is the per-source population of those fields intentionally deferred to a follow-up PR, or is this an oversight?


greptile-apps · 2026-05-26T09:23:38Z

+  it("scales roughly with input length", () => {
+    expect(estimateTokens("a".repeat(35))).toBe(10);
+    expect(estimateTokens("a".repeat(350))).toBe(100);
+  });


The two assertions here are separate data points for the same behaviour and should use it.each per the team's preference for parameterised tests. The same applies to the "returns 0 for empty input" block above.

Suggested change

-  it("scales roughly with input length", () => {
-    expect(estimateTokens("a".repeat(35))).toBe(10);
-    expect(estimateTokens("a".repeat(350))).toBe(100);
-  });
+  it.each([
+    [35, 10],
+    [350, 100],
+  ])("scales roughly with input length (%i chars → %i tokens)", (chars, expected) => {
+    expect(estimateTokens("a".repeat(chars))).toBe(expected);
+  });

Context Used: Do not attempt to comment on incorrect alphabetica... (source)


greptile-apps · 2026-05-26T09:23:40Z

+  it("returns 0 for empty input", () => {
+    expect(estimateTokens("")).toBe(0);
+    expect(estimateTokens(undefined)).toBe(0);
+    expect(estimateTokens(null)).toBe(0);
+  });


Three independent null/empty inputs tested in one block — a good candidate for it.each to give each input its own test name.

Suggested change

-  it("returns 0 for empty input", () => {
-    expect(estimateTokens("")).toBe(0);
-    expect(estimateTokens(undefined)).toBe(0);
-    expect(estimateTokens(null)).toBe(0);
-  });
+  it.each([["", 0], [undefined, 0], [null, 0]] as const)(
+    "returns 0 for %s",
+    (input, expected) => {
+      expect(estimateTokens(input)).toBe(expected);
+    },
+  );

Context Used: Do not attempt to comment on incorrect alphabetica... (source)


This was referenced May 25, 2026

feat(sessions): context breakdown popover #2353

Open

feat(billing): threshold notification service with persisted dedupe #2351

Open

feat(billing): always-on Free sidebar bar with reset time #2350

Open

k11kirky force-pushed the posthog-code/usage-threshold-monitor branch from 23ff881 to d6105bb Compare May 25, 2026 16:30

k11kirky force-pushed the posthog-code/context-breakdown-data branch from 7032aef to 78a3903 Compare May 25, 2026 16:30

This was referenced May 25, 2026

feat(agent): fill Skills/MCP/Rules categories in context breakdown #2357

Open

feat(billing): single-source usage via main-process relay #2358

Open

k11kirky force-pushed the posthog-code/usage-threshold-monitor branch from d6105bb to 1589fab Compare May 25, 2026 16:58

k11kirky force-pushed the posthog-code/context-breakdown-data branch from 78a3903 to 00f8802 Compare May 25, 2026 16:58

k11kirky force-pushed the posthog-code/usage-threshold-monitor branch from 1589fab to 9ee50f4 Compare May 26, 2026 09:11

k11kirky force-pushed the posthog-code/context-breakdown-data branch from 00f8802 to 80a75e8 Compare May 26, 2026 09:11

k11kirky marked this pull request as ready for review May 26, 2026 09:20

greptile-apps Bot reviewed May 26, 2026

k11kirky wants to merge 1 commit into posthog-code/usage-threshold-monitor from posthog-code/context-breakdown-data

2 participants
