feat(billing): threshold notification service with persisted dedupe#2351
feat(billing): threshold notification service with persisted dedupe#2351k11kirky wants to merge 1 commit into
Conversation
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
23ff881 to
d6105bb
Compare
c79d047 to
a6fca1b
Compare
a6fca1b to
211250e
Compare
d6105bb to
1589fab
Compare
Moves usage-limit detection out of the renderer into a main-process
`UsageMonitorService` that polls /v1/usage every 30s, detects when a
bucket newly crosses 50/75/90/100%, and emits an event through a
tRPC subscription. Dedupe state lives in a persistent electron-store
keyed by `${userId}:${product}:${bucket}:${anchor}:${threshold}` so
crossings don't re-fire after an app relaunch. Anchors are
`reset_at` rounded to the hour for burst (jitter-tolerant), and
`billing_period_end` (Pro) or the date of `reset_at` (Free) for
sustained. Stale entries are pruned on startup.
The renderer subscribes via `initializeUsageThresholdToast` (modelled
on `connectivityToast`) and shows a warning toast at 50/75/90% with
a "View usage" action that opens the Plan & Usage settings. At 100%
the existing `UsageLimitModal` opens if a session is active, else
the user gets a blocking error toast.
`useUsageLimitDetection` is deleted — the 100% path is now driven
from the same subscription. The renderer holds no detection state.
`toast.warning` is extended to forward an action button (the wiring
already exists in `ToastComponent`).
Generated-By: PostHog Code
Task-Id: bac06178-1ab1-4000-9a56-1901215bd4af
Generated-By: PostHog Code
Task-Id: bac06178-1ab1-4000-9a56-1901215bd4af
1589fab to
9ee50f4
Compare
Prompt To Fix All With AIFix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
apps/code/src/main/services/usage-monitor/service.ts:75-81
**Stop/poll race: one extra poll can fire after `stop()` returns**
When the timeout fires, `this.pollTimeoutId` is set to `null` on line 77 before the async `pollOnce()` awaits. If `stop()` is called while that await is in-flight it sees `pollTimeoutId === null` and does nothing. When `pollOnce()` eventually resolves, line 79 calls `schedulePoll` again — setting a brand-new timeout that can never be cleared. The service will run one more full poll cycle after teardown.
### Issue 2 of 3
apps/code/src/main/services/usage-monitor/service.ts:37-41
**No immediate poll on startup — first notification delayed 30 s**
`init()` schedules the first poll with a full `POLL_INTERVAL_MS` delay. A user who launches the app while already at 90% will not receive any notification for 30 seconds. Adding a zero-delay initial poll (or calling `pollOnce()` directly in `init()`) before the recurring `schedulePoll` would close this gap.
### Issue 3 of 3
apps/code/src/main/services/usage-monitor/service.test.ts:89-182
**Missing escalation test; non-parameterised threshold cases**
No test covers the scenario where usage crosses a lower threshold (e.g. 55% → 50% fires) and then rises to a higher one (85% → 75% fires) within the same anchor window. This is the most important in-window state-machine transition and it's currently untested.
Separately, the threshold detection cases ("emits at 75%" and "only emits the highest threshold") share the same structure and are good candidates for a parameterised test per the team's convention — e.g. `it.each([[78, 75], [95, 90], [100, 100], [49, null]])("threshold at %i% fires %i", ...)`.
Reviews (1): Last reviewed commit: "feat(billing): threshold notification se..." | Re-trigger Greptile |
| private schedulePoll(delayMs: number): void { | ||
| this.pollTimeoutId = setTimeout(async () => { | ||
| this.pollTimeoutId = null; | ||
| await this.pollOnce(); | ||
| this.schedulePoll(POLL_INTERVAL_MS); | ||
| }, delayMs); | ||
| } |
There was a problem hiding this comment.
Stop/poll race: one extra poll can fire after
stop() returns
When the timeout fires, this.pollTimeoutId is set to null on line 77 before the async pollOnce() awaits. If stop() is called while that await is in-flight it sees pollTimeoutId === null and does nothing. When pollOnce() eventually resolves, line 79 calls schedulePoll again — setting a brand-new timeout that can never be cleared. The service will run one more full poll cycle after teardown.
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/code/src/main/services/usage-monitor/service.ts
Line: 75-81
Comment:
**Stop/poll race: one extra poll can fire after `stop()` returns**
When the timeout fires, `this.pollTimeoutId` is set to `null` on line 77 before the async `pollOnce()` awaits. If `stop()` is called while that await is in-flight it sees `pollTimeoutId === null` and does nothing. When `pollOnce()` eventually resolves, line 79 calls `schedulePoll` again — setting a brand-new timeout that can never be cleared. The service will run one more full poll cycle after teardown.
How can I resolve this? If you propose a fix, please make it concise.| @postConstruct() | ||
| init(): void { | ||
| this.pruneStaleEntries(); | ||
| this.schedulePoll(POLL_INTERVAL_MS); | ||
| } |
There was a problem hiding this comment.
No immediate poll on startup — first notification delayed 30 s
init() schedules the first poll with a full POLL_INTERVAL_MS delay. A user who launches the app while already at 90% will not receive any notification for 30 seconds. Adding a zero-delay initial poll (or calling pollOnce() directly in init()) before the recurring schedulePoll would close this gap.
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/code/src/main/services/usage-monitor/service.ts
Line: 37-41
Comment:
**No immediate poll on startup — first notification delayed 30 s**
`init()` schedules the first poll with a full `POLL_INTERVAL_MS` delay. A user who launches the app while already at 90% will not receive any notification for 30 seconds. Adding a zero-delay initial poll (or calling `pollOnce()` directly in `init()`) before the recurring `schedulePoll` would close this gap.
How can I resolve this? If you propose a fix, please make it concise.| it("emits at 75% but not again on the next poll for the same anchor", async () => { | ||
| const events: unknown[] = []; | ||
| const gateway = mockGateway(makeUsage({ burstPercent: 78 })); | ||
| service = new UsageMonitorService(gateway); | ||
| service.on(UsageMonitorEvent.ThresholdCrossed, (e) => events.push(e)); | ||
|
|
||
| await service.pollOnce(); | ||
| expect(events).toHaveLength(1); | ||
| expect(events[0]).toMatchObject({ | ||
| bucket: "burst", | ||
| threshold: 75, | ||
| usedPercent: 78, | ||
| }); | ||
|
|
||
| await service.pollOnce(); | ||
| expect(events).toHaveLength(1); | ||
| }); | ||
|
|
||
| it("only emits the highest threshold a bucket has crossed", async () => { | ||
| const events: unknown[] = []; | ||
| const gateway = mockGateway(makeUsage({ burstPercent: 95 })); | ||
| service = new UsageMonitorService(gateway); | ||
| service.on(UsageMonitorEvent.ThresholdCrossed, (e) => events.push(e)); | ||
|
|
||
| await service.pollOnce(); | ||
| expect(events).toHaveLength(1); | ||
| expect(events[0]).toMatchObject({ threshold: 90 }); | ||
| }); | ||
|
|
||
| it("doesn't re-emit after a relaunch with persisted dedupe", async () => { | ||
| const events: unknown[] = []; | ||
| const gateway = mockGateway(makeUsage({ burstPercent: 55 })); | ||
| service = new UsageMonitorService(gateway); | ||
| service.on(UsageMonitorEvent.ThresholdCrossed, (e) => events.push(e)); | ||
| await service.pollOnce(); | ||
| expect(events).toHaveLength(1); | ||
| service.stop(); | ||
|
|
||
| // Simulate relaunch | ||
| service = new UsageMonitorService(gateway); | ||
| service.on(UsageMonitorEvent.ThresholdCrossed, (e) => events.push(e)); | ||
| await service.pollOnce(); | ||
| expect(events).toHaveLength(1); | ||
| }); | ||
|
|
||
| it("tracks burst and sustained as independent buckets", async () => { | ||
| const events: unknown[] = []; | ||
| const gateway = mockGateway( | ||
| makeUsage({ | ||
| burstPercent: 55, | ||
| sustainedPercent: 80, | ||
| billingPeriodEnd: "2026-06-01T00:00:00.000Z", | ||
| }), | ||
| ); | ||
| service = new UsageMonitorService(gateway); | ||
| service.on(UsageMonitorEvent.ThresholdCrossed, (e) => events.push(e)); | ||
|
|
||
| await service.pollOnce(); | ||
| expect(events).toHaveLength(2); | ||
| expect(events.map((e) => (e as { bucket: string }).bucket).sort()).toEqual([ | ||
| "burst", | ||
| "sustained", | ||
| ]); | ||
| }); | ||
|
|
||
| it("marks events with isPro when billing_period_end is set", async () => { | ||
| const events: { isPro: boolean }[] = []; | ||
| const gateway = mockGateway( | ||
| makeUsage({ | ||
| sustainedPercent: 60, | ||
| billingPeriodEnd: "2026-06-01T00:00:00.000Z", | ||
| }), | ||
| ); | ||
| service = new UsageMonitorService(gateway); | ||
| service.on(UsageMonitorEvent.ThresholdCrossed, (e) => | ||
| events.push(e as { isPro: boolean }), | ||
| ); | ||
|
|
||
| await service.pollOnce(); | ||
| expect(events[0]?.isPro).toBe(true); | ||
| }); | ||
|
|
||
| it("silently skips polls when the gateway throws", async () => { | ||
| const events: unknown[] = []; | ||
| const gateway = { | ||
| fetchUsage: vi.fn().mockRejectedValue(new Error("not authenticated")), | ||
| } as unknown as LlmGatewayService; | ||
| service = new UsageMonitorService(gateway); | ||
| service.on(UsageMonitorEvent.ThresholdCrossed, (e) => events.push(e)); | ||
|
|
||
| await expect(service.pollOnce()).resolves.toBeNull(); | ||
| expect(events).toHaveLength(0); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
Missing escalation test; non-parameterised threshold cases
No test covers the scenario where usage crosses a lower threshold (e.g. 55% → 50% fires) and then rises to a higher one (85% → 75% fires) within the same anchor window. This is the most important in-window state-machine transition and it's currently untested.
Separately, the threshold detection cases ("emits at 75%" and "only emits the highest threshold") share the same structure and are good candidates for a parameterised test per the team's convention — e.g. it.each([[78, 75], [95, 90], [100, 100], [49, null]])("threshold at %i% fires %i", ...).
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/code/src/main/services/usage-monitor/service.test.ts
Line: 89-182
Comment:
**Missing escalation test; non-parameterised threshold cases**
No test covers the scenario where usage crosses a lower threshold (e.g. 55% → 50% fires) and then rises to a higher one (85% → 75% fires) within the same anchor window. This is the most important in-window state-machine transition and it's currently untested.
Separately, the threshold detection cases ("emits at 75%" and "only emits the highest threshold") share the same structure and are good candidates for a parameterised test per the team's convention — e.g. `it.each([[78, 75], [95, 90], [100, 100], [49, null]])("threshold at %i% fires %i", ...)`.
How can I resolve this? If you propose a fix, please make it concise.
Problem
Users had no visibility into their LLM usage approaching or reaching limits. Without proactive notifications, users would be surprised when hitting rate limits mid-session.
Changes
Introduced a
UsageMonitorServicethat polls the LLM gateway every 30 seconds and emitsthreshold-crossedevents when usage crosses 50%, 75%, 90%, or 100% for either the burst (daily) or sustained (monthly) bucket.Key behaviors:
electron-storeso notifications don't re-fire after an app relaunch within the same billing windowbilling_period_endon the usage responseOn the renderer side,
initializeUsageThresholdToastsubscribes to the newusageMonitor.onThresholdCrossedtRPC subscription and shows:UsageLimitModal(when a session is active) at 100%The old
useUsageLimitDetectionhook and its polling-based approach have been removed in favour of this event-driven model. Thetoast.warningutility was extended to support anactionbutton.How did you test this?
Unit tests cover the core deduplication and emission logic in
UsageMonitorService:isProis correctly derived frombilling_period_endnullwithout throwingPublish to changelog?
no