Skip to content

feat(billing): threshold notification service with persisted dedupe#2351

Open
k11kirky wants to merge 1 commit into
posthog-code/usage-sidebar-reset-timefrom
posthog-code/usage-threshold-monitor
Open

feat(billing): threshold notification service with persisted dedupe#2351
k11kirky wants to merge 1 commit into
posthog-code/usage-sidebar-reset-timefrom
posthog-code/usage-threshold-monitor

Conversation

@k11kirky
Copy link
Copy Markdown
Contributor

@k11kirky k11kirky commented May 25, 2026

Problem

Users had no visibility into their LLM usage approaching or reaching limits. Without proactive notifications, users would be surprised when hitting rate limits mid-session.

Changes

Introduced a UsageMonitorService that polls the LLM gateway every 30 seconds and emits threshold-crossed events when usage crosses 50%, 75%, 90%, or 100% for either the burst (daily) or sustained (monthly) bucket.

Key behaviors:

  • Only the highest threshold crossed in a given window is emitted per bucket — e.g. jumping straight to 95% fires a 90% event, not 50% and 75% as well
  • Deduplication is persisted to disk via electron-store so notifications don't re-fire after an app relaunch within the same billing window
  • Stale dedupe entries (past their anchor timestamp) are pruned on startup
  • Pro users are detected by the presence of billing_period_end on the usage response
  • Gateway errors are swallowed silently so polling never crashes the app

On the renderer side, initializeUsageThresholdToast subscribes to the new usageMonitor.onThresholdCrossed tRPC subscription and shows:

  • A warning toast with a "View usage" action for 50/75/90% thresholds
  • An error toast or the existing UsageLimitModal (when a session is active) at 100%

The old useUsageLimitDetection hook and its polling-based approach have been removed in favour of this event-driven model. The toast.warning utility was extended to support an action button.

How did you test this?

Unit tests cover the core deduplication and emission logic in UsageMonitorService:

  • Emits at the correct threshold and suppresses duplicate events within the same anchor window
  • Only fires the highest crossed threshold, not every threshold below it
  • Persisted dedupe state survives a simulated relaunch
  • Burst and sustained buckets are tracked independently
  • isPro is correctly derived from billing_period_end
  • Gateway errors resolve to null without throwing

Publish to changelog?

no

Copy link
Copy Markdown
Contributor Author

k11kirky commented May 25, 2026

Moves usage-limit detection out of the renderer into a main-process
`UsageMonitorService` that polls /v1/usage every 30s, detects when a
bucket newly crosses 50/75/90/100%, and emits an event through a
tRPC subscription. Dedupe state lives in a persistent electron-store
keyed by `${userId}:${product}:${bucket}:${anchor}:${threshold}` so
crossings don't re-fire after an app relaunch. Anchors are
`reset_at` rounded to the hour for burst (jitter-tolerant), and
`billing_period_end` (Pro) or the date of `reset_at` (Free) for
sustained. Stale entries are pruned on startup.

The renderer subscribes via `initializeUsageThresholdToast` (modelled
on `connectivityToast`) and shows a warning toast at 50/75/90% with
a "View usage" action that opens the Plan & Usage settings. At 100%
the existing `UsageLimitModal` opens if a session is active, else
the user gets a blocking error toast.

`useUsageLimitDetection` is deleted — the 100% path is now driven
from the same subscription. The renderer holds no detection state.
`toast.warning` is extended to forward an action button (the wiring
already exists in `ToastComponent`).

Generated-By: PostHog Code
Task-Id: bac06178-1ab1-4000-9a56-1901215bd4af

Generated-By: PostHog Code
Task-Id: bac06178-1ab1-4000-9a56-1901215bd4af
@k11kirky k11kirky force-pushed the posthog-code/usage-threshold-monitor branch from 1589fab to 9ee50f4 Compare May 26, 2026 09:11
@k11kirky k11kirky marked this pull request as ready for review May 26, 2026 09:20
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 26, 2026

Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
apps/code/src/main/services/usage-monitor/service.ts:75-81
**Stop/poll race: one extra poll can fire after `stop()` returns**

When the timeout fires, `this.pollTimeoutId` is set to `null` on line 77 before the async `pollOnce()` awaits. If `stop()` is called while that await is in-flight it sees `pollTimeoutId === null` and does nothing. When `pollOnce()` eventually resolves, line 79 calls `schedulePoll` again — setting a brand-new timeout that can never be cleared. The service will run one more full poll cycle after teardown.

### Issue 2 of 3
apps/code/src/main/services/usage-monitor/service.ts:37-41
**No immediate poll on startup — first notification delayed 30 s**

`init()` schedules the first poll with a full `POLL_INTERVAL_MS` delay. A user who launches the app while already at 90% will not receive any notification for 30 seconds. Adding a zero-delay initial poll (or calling `pollOnce()` directly in `init()`) before the recurring `schedulePoll` would close this gap.

### Issue 3 of 3
apps/code/src/main/services/usage-monitor/service.test.ts:89-182
**Missing escalation test; non-parameterised threshold cases**

No test covers the scenario where usage crosses a lower threshold (e.g. 55% → 50% fires) and then rises to a higher one (85% → 75% fires) within the same anchor window. This is the most important in-window state-machine transition and it's currently untested.

Separately, the threshold detection cases ("emits at 75%" and "only emits the highest threshold") share the same structure and are good candidates for a parameterised test per the team's convention — e.g. `it.each([[78, 75], [95, 90], [100, 100], [49, null]])("threshold at %i% fires %i", ...)`.

Reviews (1): Last reviewed commit: "feat(billing): threshold notification se..." | Re-trigger Greptile

Comment on lines +75 to +81
private schedulePoll(delayMs: number): void {
this.pollTimeoutId = setTimeout(async () => {
this.pollTimeoutId = null;
await this.pollOnce();
this.schedulePoll(POLL_INTERVAL_MS);
}, delayMs);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Stop/poll race: one extra poll can fire after stop() returns

When the timeout fires, this.pollTimeoutId is set to null on line 77 before the async pollOnce() awaits. If stop() is called while that await is in-flight it sees pollTimeoutId === null and does nothing. When pollOnce() eventually resolves, line 79 calls schedulePoll again — setting a brand-new timeout that can never be cleared. The service will run one more full poll cycle after teardown.

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/code/src/main/services/usage-monitor/service.ts
Line: 75-81

Comment:
**Stop/poll race: one extra poll can fire after `stop()` returns**

When the timeout fires, `this.pollTimeoutId` is set to `null` on line 77 before the async `pollOnce()` awaits. If `stop()` is called while that await is in-flight it sees `pollTimeoutId === null` and does nothing. When `pollOnce()` eventually resolves, line 79 calls `schedulePoll` again — setting a brand-new timeout that can never be cleared. The service will run one more full poll cycle after teardown.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +37 to +41
@postConstruct()
init(): void {
this.pruneStaleEntries();
this.schedulePoll(POLL_INTERVAL_MS);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 No immediate poll on startup — first notification delayed 30 s

init() schedules the first poll with a full POLL_INTERVAL_MS delay. A user who launches the app while already at 90% will not receive any notification for 30 seconds. Adding a zero-delay initial poll (or calling pollOnce() directly in init()) before the recurring schedulePoll would close this gap.

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/code/src/main/services/usage-monitor/service.ts
Line: 37-41

Comment:
**No immediate poll on startup — first notification delayed 30 s**

`init()` schedules the first poll with a full `POLL_INTERVAL_MS` delay. A user who launches the app while already at 90% will not receive any notification for 30 seconds. Adding a zero-delay initial poll (or calling `pollOnce()` directly in `init()`) before the recurring `schedulePoll` would close this gap.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +89 to +182
it("emits at 75% but not again on the next poll for the same anchor", async () => {
const events: unknown[] = [];
const gateway = mockGateway(makeUsage({ burstPercent: 78 }));
service = new UsageMonitorService(gateway);
service.on(UsageMonitorEvent.ThresholdCrossed, (e) => events.push(e));

await service.pollOnce();
expect(events).toHaveLength(1);
expect(events[0]).toMatchObject({
bucket: "burst",
threshold: 75,
usedPercent: 78,
});

await service.pollOnce();
expect(events).toHaveLength(1);
});

it("only emits the highest threshold a bucket has crossed", async () => {
const events: unknown[] = [];
const gateway = mockGateway(makeUsage({ burstPercent: 95 }));
service = new UsageMonitorService(gateway);
service.on(UsageMonitorEvent.ThresholdCrossed, (e) => events.push(e));

await service.pollOnce();
expect(events).toHaveLength(1);
expect(events[0]).toMatchObject({ threshold: 90 });
});

it("doesn't re-emit after a relaunch with persisted dedupe", async () => {
const events: unknown[] = [];
const gateway = mockGateway(makeUsage({ burstPercent: 55 }));
service = new UsageMonitorService(gateway);
service.on(UsageMonitorEvent.ThresholdCrossed, (e) => events.push(e));
await service.pollOnce();
expect(events).toHaveLength(1);
service.stop();

// Simulate relaunch
service = new UsageMonitorService(gateway);
service.on(UsageMonitorEvent.ThresholdCrossed, (e) => events.push(e));
await service.pollOnce();
expect(events).toHaveLength(1);
});

it("tracks burst and sustained as independent buckets", async () => {
const events: unknown[] = [];
const gateway = mockGateway(
makeUsage({
burstPercent: 55,
sustainedPercent: 80,
billingPeriodEnd: "2026-06-01T00:00:00.000Z",
}),
);
service = new UsageMonitorService(gateway);
service.on(UsageMonitorEvent.ThresholdCrossed, (e) => events.push(e));

await service.pollOnce();
expect(events).toHaveLength(2);
expect(events.map((e) => (e as { bucket: string }).bucket).sort()).toEqual([
"burst",
"sustained",
]);
});

it("marks events with isPro when billing_period_end is set", async () => {
const events: { isPro: boolean }[] = [];
const gateway = mockGateway(
makeUsage({
sustainedPercent: 60,
billingPeriodEnd: "2026-06-01T00:00:00.000Z",
}),
);
service = new UsageMonitorService(gateway);
service.on(UsageMonitorEvent.ThresholdCrossed, (e) =>
events.push(e as { isPro: boolean }),
);

await service.pollOnce();
expect(events[0]?.isPro).toBe(true);
});

it("silently skips polls when the gateway throws", async () => {
const events: unknown[] = [];
const gateway = {
fetchUsage: vi.fn().mockRejectedValue(new Error("not authenticated")),
} as unknown as LlmGatewayService;
service = new UsageMonitorService(gateway);
service.on(UsageMonitorEvent.ThresholdCrossed, (e) => events.push(e));

await expect(service.pollOnce()).resolves.toBeNull();
expect(events).toHaveLength(0);
});
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Missing escalation test; non-parameterised threshold cases

No test covers the scenario where usage crosses a lower threshold (e.g. 55% → 50% fires) and then rises to a higher one (85% → 75% fires) within the same anchor window. This is the most important in-window state-machine transition and it's currently untested.

Separately, the threshold detection cases ("emits at 75%" and "only emits the highest threshold") share the same structure and are good candidates for a parameterised test per the team's convention — e.g. it.each([[78, 75], [95, 90], [100, 100], [49, null]])("threshold at %i% fires %i", ...).

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/code/src/main/services/usage-monitor/service.test.ts
Line: 89-182

Comment:
**Missing escalation test; non-parameterised threshold cases**

No test covers the scenario where usage crosses a lower threshold (e.g. 55% → 50% fires) and then rises to a higher one (85% → 75% fires) within the same anchor window. This is the most important in-window state-machine transition and it's currently untested.

Separately, the threshold detection cases ("emits at 75%" and "only emits the highest threshold") share the same structure and are good candidates for a parameterised test per the team's convention — e.g. `it.each([[78, 75], [95, 90], [100, 100], [49, null]])("threshold at %i% fires %i", ...)`.

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant