feat(billing): threshold notification service with persisted dedupe by k11kirky · Pull Request #2351 · PostHog/code

k11kirky · 2026-05-25T14:31:59Z

Problem

Users had no visibility into their LLM usage approaching or reaching limits. Without proactive notifications, users would be surprised when hitting rate limits mid-session.

Changes

Introduced a UsageMonitorService that polls the LLM gateway every 30 seconds and emits threshold-crossed events when usage crosses 50%, 75%, 90%, or 100% for either the burst (daily) or sustained (monthly) bucket.

Key behaviors:

Only the highest threshold crossed in a given window is emitted per bucket — e.g. jumping straight to 95% fires a 90% event, not 50% and 75% as well
Deduplication is persisted to disk via electron-store so notifications don't re-fire after an app relaunch within the same billing window
Stale dedupe entries (past their anchor timestamp) are pruned on startup
Pro users are detected by the presence of billing_period_end on the usage response
Gateway errors are swallowed silently so polling never crashes the app
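
The "highest threshold only" rule and the per-window dedupe can be sketched as below. This is a minimal illustration, not the service's actual code: `highestCrossed`, `nextEvent`, and the in-memory `fired` set (standing in for the persisted store) are hypothetical names, and treating a threshold as crossed when usage is greater than or equal to it is an assumption.

```typescript
// Threshold values come from the PR description; everything else is illustrative.
const THRESHOLDS = [50, 75, 90, 100] as const;
type Threshold = (typeof THRESHOLDS)[number];

// Returns the highest threshold that usedPercent has reached, or null if none.
function highestCrossed(usedPercent: number): Threshold | null {
  let hit: Threshold | null = null;
  for (const t of THRESHOLDS) {
    if (usedPercent >= t) hit = t; // assumption: ">=" counts as crossed
  }
  return hit;
}

// Hypothetical helper: `fired` stands in for the persisted dedupe state
// of a single bucket within a single billing window.
function nextEvent(usedPercent: number, fired: Set<Threshold>): Threshold | null {
  const t = highestCrossed(usedPercent);
  if (t === null || fired.has(t)) return null; // nothing new to announce
  fired.add(t);
  return t;
}
```

With this shape, a first poll at 95% yields 90 only, and a repeated poll at the same level yields nothing.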

On the renderer side, initializeUsageThresholdToast subscribes to the new usageMonitor.onThresholdCrossed tRPC subscription and shows:

A warning toast with a "View usage" action for 50/75/90% thresholds
The existing UsageLimitModal when a session is active, or an error toast otherwise, at 100%

The old useUsageLimitDetection hook and its polling-based approach have been removed in favour of this event-driven model. The toast.warning utility was extended to support an action button.
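
The renderer-side choice described above reduces to a small decision function. The sketch below is an assumption about shape only: `chooseNotification` and the `Notification` type are hypothetical and do not appear in the PR.

```typescript
type Threshold = 50 | 75 | 90 | 100;

// Hypothetical description of what the renderer should show.
type Notification =
  | { kind: "warning-toast"; action: "View usage" }
  | { kind: "error-toast" }
  | { kind: "usage-limit-modal" };

// Maps a threshold-crossed event to a UI surface, following the PR description:
// warning toast below 100%, modal at 100% with an active session, error toast otherwise.
function chooseNotification(threshold: Threshold, hasActiveSession: boolean): Notification {
  if (threshold < 100) {
    return { kind: "warning-toast", action: "View usage" };
  }
  return hasActiveSession ? { kind: "usage-limit-modal" } : { kind: "error-toast" };
}
```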

How did you test this?

Unit tests cover the core deduplication and emission logic in UsageMonitorService:

Emits at the correct threshold and suppresses duplicate events within the same anchor window
Only fires the highest crossed threshold, not every threshold below it
Persisted dedupe state survives a simulated relaunch
Burst and sustained buckets are tracked independently
isPro is correctly derived from billing_period_end
Gateway errors resolve to null without throwing

Publish to changelog?

no

k11kirky · 2026-05-25T14:32:13Z

Moves usage-limit detection out of the renderer into a main-process `UsageMonitorService` that polls /v1/usage every 30s, detects when a bucket newly crosses 50/75/90/100%, and emits an event through a tRPC subscription.

Dedupe state lives in a persistent electron-store keyed by `${userId}:${product}:${bucket}:${anchor}:${threshold}` so crossings don't re-fire after an app relaunch. Anchors are `reset_at` rounded to the hour for burst (jitter-tolerant), and `billing_period_end` (Pro) or the date of `reset_at` (Free) for sustained. Stale entries are pruned on startup.

The renderer subscribes via `initializeUsageThresholdToast` (modelled on `connectivityToast`) and shows a warning toast at 50/75/90% with a "View usage" action that opens the Plan & Usage settings. At 100% the existing `UsageLimitModal` opens if a session is active, else the user gets a blocking error toast.

`useUsageLimitDetection` is deleted; the 100% path is now driven from the same subscription. The renderer holds no detection state. `toast.warning` is extended to forward an action button (the wiring already exists in `ToastComponent`).

Generated-By: PostHog Code
Task-Id: bac06178-1ab1-4000-9a56-1901215bd4af
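
The anchor and key scheme in this commit message might look roughly like the following. Only the key template is taken from the message; the function names, the assumption that `reset_at` and `billing_period_end` are ISO 8601 strings, the use of nearest-hour rounding, and the `key -> anchor epoch ms` storage shape used for pruning are all illustrative guesses.

```typescript
type Bucket = "burst" | "sustained";
const HOUR_MS = 60 * 60 * 1000;

// Burst anchor: reset_at rounded to the nearest hour, so a few seconds of
// jitter between polls still maps to the same anchor.
function burstAnchor(resetAt: string): string {
  const ms = Math.round(Date.parse(resetAt) / HOUR_MS) * HOUR_MS;
  return new Date(ms).toISOString();
}

// Sustained anchor: billing_period_end for Pro, else the UTC date of reset_at.
function sustainedAnchor(resetAt: string, billingPeriodEnd?: string): string {
  return billingPeriodEnd ?? new Date(resetAt).toISOString().slice(0, 10);
}

// Key template as quoted in the commit message.
function dedupeKey(
  userId: string,
  product: string,
  bucket: Bucket,
  anchor: string,
  threshold: number,
): string {
  return `${userId}:${product}:${bucket}:${anchor}:${threshold}`;
}

// Assumed storage shape: dedupe key -> anchor time in epoch milliseconds.
// Entries whose anchor is already in the past are dropped.
function pruneStale(entries: Record<string, number>, now: number): Record<string, number> {
  const kept: Record<string, number> = {};
  for (const key of Object.keys(entries)) {
    if (entries[key] >= now) kept[key] = entries[key];
  }
  return kept;
}
```

Storing the anchor time as the value avoids parsing it back out of the key, which would be ambiguous because ISO timestamps themselves contain `:`.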

greptile-apps · 2026-05-26T09:25:17Z

Prompt To Fix All With AI

Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
apps/code/src/main/services/usage-monitor/service.ts:75-81
**Stop/poll race: one extra poll can fire after `stop()` returns**

When the timeout fires, `this.pollTimeoutId` is set to `null` on line 77 before the async `pollOnce()` awaits. If `stop()` is called while that await is in-flight it sees `pollTimeoutId === null` and does nothing. When `pollOnce()` eventually resolves, line 79 calls `schedulePoll` again — setting a brand-new timeout that can never be cleared. The service will run one more full poll cycle after teardown.

### Issue 2 of 3
apps/code/src/main/services/usage-monitor/service.ts:37-41
**No immediate poll on startup — first notification delayed 30 s**

`init()` schedules the first poll with a full `POLL_INTERVAL_MS` delay. A user who launches the app while already at 90% will not receive any notification for 30 seconds. Adding a zero-delay initial poll (or calling `pollOnce()` directly in `init()`) before the recurring `schedulePoll` would close this gap.

### Issue 3 of 3
apps/code/src/main/services/usage-monitor/service.test.ts:89-182
**Missing escalation test; non-parameterised threshold cases**

No test covers the scenario where usage crosses a lower threshold (e.g. 55% → 50% fires) and then rises to a higher one (85% → 75% fires) within the same anchor window. This is the most important in-window state-machine transition and it's currently untested.

Separately, the threshold detection cases ("emits at 75%" and "only emits the highest threshold") share the same structure and are good candidates for a parameterised test per the team's convention — e.g. `it.each([[78, 75], [95, 90], [100, 100], [49, null]])("threshold at %i% fires %i", ...)`.

Reviews (1): Last reviewed commit: "feat(billing): threshold notification se..."

greptile-apps · 2026-05-26T09:25:23Z

+  private schedulePoll(delayMs: number): void {
+    this.pollTimeoutId = setTimeout(async () => {
+      this.pollTimeoutId = null;
+      await this.pollOnce();
+      this.schedulePoll(POLL_INTERVAL_MS);
+    }, delayMs);
+  }


Stop/poll race: one extra poll can fire after stop() returns

When the timeout fires, this.pollTimeoutId is set to null on line 77 before the async pollOnce() awaits. If stop() is called while that await is in-flight it sees pollTimeoutId === null and does nothing. When pollOnce() eventually resolves, line 79 calls schedulePoll again — setting a brand-new timeout that can never be cleared. The service will run one more full poll cycle after teardown.
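
One possible way to close this race is a `stopped` flag that the timer callback checks after the awaited poll resolves. The sketch below is a standalone illustration under that assumption, not a patch to `service.ts`; the `Poller` class and its members are hypothetical.

```typescript
class Poller {
  private timeoutId: ReturnType<typeof setTimeout> | null = null;
  private stopped = false;
  private readonly pollOnce: () => Promise<unknown>;
  private readonly intervalMs: number;

  constructor(pollOnce: () => Promise<unknown>, intervalMs: number) {
    this.pollOnce = pollOnce;
    this.intervalMs = intervalMs;
  }

  start(initialDelayMs: number): void {
    this.stopped = false;
    this.schedule(initialDelayMs);
  }

  stop(): void {
    // The flag covers the case where no timer is pending because a poll is in flight.
    this.stopped = true;
    if (this.timeoutId !== null) {
      clearTimeout(this.timeoutId);
      this.timeoutId = null;
    }
  }

  get hasPendingTimer(): boolean {
    return this.timeoutId !== null;
  }

  private schedule(delayMs: number): void {
    this.timeoutId = setTimeout(async () => {
      this.timeoutId = null;
      try {
        await this.pollOnce();
      } catch {
        // Errors are ignored so the loop keeps running, as in the PR description.
      }
      // stop() may have been called while pollOnce() was awaiting.
      if (!this.stopped) this.schedule(this.intervalMs);
    }, delayMs);
  }
}
```

A caller that stops the poller mid-poll then sees no further timer once the in-flight poll settles.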

greptile-apps · 2026-05-26T09:25:25Z

+  @postConstruct()
+  init(): void {
+    this.pruneStaleEntries();
+    this.schedulePoll(POLL_INTERVAL_MS);
+  }


No immediate poll on startup — first notification delayed 30 s

init() schedules the first poll with a full POLL_INTERVAL_MS delay. A user who launches the app while already at 90% will not receive any notification for 30 seconds. Adding a zero-delay initial poll (or calling pollOnce() directly in init()) before the recurring schedulePoll would close this gap.
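
The zero-delay option could be sketched as follows. `startPolling` and the injectable `schedule` parameter are hypothetical and exist only to keep the example self-contained; teardown handling is left out here.

```typescript
const POLL_INTERVAL_MS = 30 * 1000;

// Injectable scheduler so the timing can be observed without real timers.
type Schedule = (fn: () => void, delayMs: number) => void;

const defaultSchedule: Schedule = (fn, delayMs) => {
  setTimeout(fn, delayMs);
};

// Polls once with no delay, then keeps polling every POLL_INTERVAL_MS.
function startPolling(
  pollOnce: () => Promise<unknown>,
  schedule: Schedule = defaultSchedule,
): void {
  const loop = (delayMs: number): void => {
    schedule(() => {
      const again = (): void => loop(POLL_INTERVAL_MS);
      // Reschedule whether the poll resolved or rejected.
      pollOnce().then(again, again);
    }, delayMs);
  };
  loop(0); // first poll is queued immediately instead of after 30 s
}
```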

greptile-apps · 2026-05-26T09:25:28Z

+  it("emits at 75% but not again on the next poll for the same anchor", async () => {
+    const events: unknown[] = [];
+    const gateway = mockGateway(makeUsage({ burstPercent: 78 }));
+    service = new UsageMonitorService(gateway);
+    service.on(UsageMonitorEvent.ThresholdCrossed, (e) => events.push(e));
+
+    await service.pollOnce();
+    expect(events).toHaveLength(1);
+    expect(events[0]).toMatchObject({
+      bucket: "burst",
+      threshold: 75,
+      usedPercent: 78,
+    });
+
+    await service.pollOnce();
+    expect(events).toHaveLength(1);
+  });
+
+  it("only emits the highest threshold a bucket has crossed", async () => {
+    const events: unknown[] = [];
+    const gateway = mockGateway(makeUsage({ burstPercent: 95 }));
+    service = new UsageMonitorService(gateway);
+    service.on(UsageMonitorEvent.ThresholdCrossed, (e) => events.push(e));
+
+    await service.pollOnce();
+    expect(events).toHaveLength(1);
+    expect(events[0]).toMatchObject({ threshold: 90 });
+  });
+
+  it("doesn't re-emit after a relaunch with persisted dedupe", async () => {
+    const events: unknown[] = [];
+    const gateway = mockGateway(makeUsage({ burstPercent: 55 }));
+    service = new UsageMonitorService(gateway);
+    service.on(UsageMonitorEvent.ThresholdCrossed, (e) => events.push(e));
+    await service.pollOnce();
+    expect(events).toHaveLength(1);
+    service.stop();
+
+    // Simulate relaunch
+    service = new UsageMonitorService(gateway);
+    service.on(UsageMonitorEvent.ThresholdCrossed, (e) => events.push(e));
+    await service.pollOnce();
+    expect(events).toHaveLength(1);
+  });
+
+  it("tracks burst and sustained as independent buckets", async () => {
+    const events: unknown[] = [];
+    const gateway = mockGateway(
+      makeUsage({
+        burstPercent: 55,
+        sustainedPercent: 80,
+        billingPeriodEnd: "2026-06-01T00:00:00.000Z",
+      }),
+    );
+    service = new UsageMonitorService(gateway);
+    service.on(UsageMonitorEvent.ThresholdCrossed, (e) => events.push(e));
+
+    await service.pollOnce();
+    expect(events).toHaveLength(2);
+    expect(events.map((e) => (e as { bucket: string }).bucket).sort()).toEqual([
+      "burst",
+      "sustained",
+    ]);
+  });
+
+  it("marks events with isPro when billing_period_end is set", async () => {
+    const events: { isPro: boolean }[] = [];
+    const gateway = mockGateway(
+      makeUsage({
+        sustainedPercent: 60,
+        billingPeriodEnd: "2026-06-01T00:00:00.000Z",
+      }),
+    );
+    service = new UsageMonitorService(gateway);
+    service.on(UsageMonitorEvent.ThresholdCrossed, (e) =>
+      events.push(e as { isPro: boolean }),
+    );
+
+    await service.pollOnce();
+    expect(events[0]?.isPro).toBe(true);
+  });
+
+  it("silently skips polls when the gateway throws", async () => {
+    const events: unknown[] = [];
+    const gateway = {
+      fetchUsage: vi.fn().mockRejectedValue(new Error("not authenticated")),
+    } as unknown as LlmGatewayService;
+    service = new UsageMonitorService(gateway);
+    service.on(UsageMonitorEvent.ThresholdCrossed, (e) => events.push(e));
+
+    await expect(service.pollOnce()).resolves.toBeNull();
+    expect(events).toHaveLength(0);
+  });
+});


Missing escalation test; non-parameterised threshold cases

No test covers the scenario where usage crosses a lower threshold (e.g. 55% → 50% fires) and then rises to a higher one (85% → 75% fires) within the same anchor window. This is the most important in-window state-machine transition and it's currently untested.

Separately, the threshold detection cases ("emits at 75%" and "only emits the highest threshold") share the same structure and are good candidates for a parameterised test per the team's convention — e.g. it.each([[78, 75], [95, 90], [100, 100], [49, null]])("threshold at %i% fires %i", ...).
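
The escalation scenario and the table-driven cases could be written along these lines. To stay runnable on its own, the sketch uses a stand-in detector (`makeDetector`, hypothetical, not the PR's `UsageMonitorService`) and a plain case table; in the repository the same table would presumably feed `it.each`.

```typescript
type Threshold = 50 | 75 | 90 | 100;

// Stand-in for the service: returns the newly fired threshold for a poll,
// or null when nothing new was crossed within the same anchor window.
function makeDetector(): (usedPercent: number) => Threshold | null {
  const fired = new Set<Threshold>();
  const descending: Threshold[] = [100, 90, 75, 50];
  return (usedPercent) => {
    const hit = descending.find((t) => usedPercent >= t) ?? null;
    if (hit === null || fired.has(hit)) return null;
    fired.add(hit);
    return hit;
  };
}

// Each case is a sequence of polled percentages within one anchor window
// and the event expected from each poll.
const cases: Array<{ name: string; polls: number[]; expected: Array<Threshold | null> }> = [
  { name: "escalates 50 then 75", polls: [55, 85], expected: [50, 75] },
  { name: "dedupes a repeat poll", polls: [78, 78], expected: [75, null] },
  { name: "fires only the highest", polls: [95], expected: [90] },
  { name: "fires at the limit", polls: [100], expected: [100] },
  { name: "stays silent below 50", polls: [49], expected: [null] },
];

// Runs one case against a fresh detector and returns the per-poll events.
function runCase(polls: number[]): Array<Threshold | null> {
  const detect = makeDetector();
  return polls.map((p) => detect(p));
}
```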

k11kirky mentioned this pull request May 25, 2026

feat(sessions): context breakdown popover #2353

Open

This was referenced May 25, 2026

feat(agent): emit per-category context token breakdown #2352

Open

feat(billing): always-on Free sidebar bar with reset time #2350

Open

k11kirky force-pushed the posthog-code/usage-threshold-monitor branch from 23ff881 to d6105bb Compare May 25, 2026 16:30

k11kirky force-pushed the posthog-code/usage-sidebar-reset-time branch from c79d047 to a6fca1b Compare May 25, 2026 16:30

This was referenced May 25, 2026

feat(agent): fill Skills/MCP/Rules categories in context breakdown #2357

Open

feat(billing): single-source usage via main-process relay #2358

Open

k11kirky force-pushed the posthog-code/usage-sidebar-reset-time branch from a6fca1b to 211250e Compare May 25, 2026 16:58

k11kirky force-pushed the posthog-code/usage-threshold-monitor branch from d6105bb to 1589fab Compare May 25, 2026 16:58

k11kirky force-pushed the posthog-code/usage-threshold-monitor branch from 1589fab to 9ee50f4 Compare May 26, 2026 09:11

k11kirky marked this pull request as ready for review May 26, 2026 09:20

greptile-apps Bot reviewed May 26, 2026

k11kirky wants to merge 1 commit into posthog-code/usage-sidebar-reset-time from posthog-code/usage-threshold-monitor
