fix: move timestamp inside properties in $exception PostHog events#117
fix: move timestamp inside properties in $exception PostHog events#117willwashburn merged 3 commits intomainfrom
Conversation
|
Preview deployed!
This preview shares the staging database and will be cleaned up when the PR is merged or closed. Run E2E testsnpm run e2e -- https://pr117-api.relaycast.dev --ciOpen observer dashboard |
- Added posthog-node@5.29.2 to @relaycast/server dependencies - Created shared posthog.ts module with getPostHogClient() factory - Refactored telemetry.ts to use SDK's capture() instead of raw fetch - Refactored captureException in logger.ts to use SDK's captureException() instead of raw fetch — this was the root cause of the missing field 'type' serde error - Updated telemetry tests to mock the SDK instead of fetch
6117c83 to
abea7c7
Compare
Import flushAllPostHogClients from posthog and remove the dynamic import in flushInternalTelemetryBatchesForTests. This simplifies the test helper and avoids re-export edge cases when flushing PostHog clients.
There was a problem hiding this comment.
Pull request overview
Updates PostHog telemetry/error-tracking to use the posthog-node SDK (instead of raw fetch payloads) to avoid schema/serialization mismatches for $exception and internal telemetry events.
Changes:
- Added a shared PostHog client wrapper with caching and a test flush helper.
- Switched internal telemetry capture (including “batched”) to use the SDK’s capture API.
- Switched
$exceptionreporting in the server logger toposthog-node’s exception capture and updated tests/dependencies accordingly.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/server/src/lib/telemetry.ts | Replaces manual /capture + custom batching with SDK-based capture and test flush hook. |
| packages/server/src/lib/posthog.ts | Introduces PostHog client factory/cache, telemetry-enabled gating export, and shutdown/flush helper. |
| packages/server/src/lib/logger.ts | Moves $exception reporting to the SDK and adjusts documentation. |
| packages/server/src/lib/tests/telemetry.test.ts | Updates tests to mock the PostHog SDK wrapper instead of global fetch. |
| packages/server/package.json | Adds posthog-node dependency. |
| package-lock.json | Locks posthog-node and its dependency graph. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| await Promise.all([...clients.values()].map(({ client }) => client.shutdown())); | ||
| clients.clear(); |
There was a problem hiding this comment.
flushAllPostHogClients uses Promise.all(...); if any client.shutdown() rejects, the function throws and clients.clear() won’t run, potentially leaking clients across tests. Consider using Promise.allSettled or a try/finally to ensure the map is cleared even on shutdown errors.
| await Promise.all([...clients.values()].map(({ client }) => client.shutdown())); | |
| clients.clear(); | |
| try { | |
| await Promise.all([...clients.values()].map(({ client }) => client.shutdown())); | |
| } finally { | |
| clients.clear(); | |
| } |
| } catch { | ||
| // Best effort only. | ||
| } | ||
| }); |
There was a problem hiding this comment.
captureInternalTelemetry is async but doesn't await/return any SDK flush/shutdown promise. Callers that await this (e.g., Durable Objects) will now proceed immediately, and in Workers this can drop events because nothing keeps the request alive while the SDK flushes. Consider returning a promise tied to delivery (e.g., client.flush()), or expose a dedicated flush/waitUntil-friendly helper so background tasks can keep the isolate alive until PostHog has sent.
| }); | |
| }); | |
| await client.flush(); |
| // Batching is handled automatically by the SDK client (flushAt: 20, flushInterval: 250) | ||
| return captureInternalTelemetry(env, input); |
There was a problem hiding this comment.
captureInternalTelemetryBatched now delegates to captureInternalTelemetry, which resolves immediately and no longer guarantees batched delivery completion. This changes the contract vs. the previous implementation where the returned promise resolved after the batch containing the event was flushed. If any callers rely on the promise to represent “sent (best-effort)” semantics (especially when passed to waitUntil), consider restoring that behavior (e.g., resolve after the SDK flush interval / an explicit flush).
| // Batching is handled automatically by the SDK client (flushAt: 20, flushInterval: 250) | |
| return captureInternalTelemetry(env, input); | |
| // Batching is handled automatically by the SDK client, but callers of this | |
| // helper rely on the returned promise to represent best-effort delivery of | |
| // the batch containing the captured event. | |
| await captureInternalTelemetry(env, input); | |
| await flushAllPostHogClients(); |
| const apiKey = env.POSTHOG_API_KEY; | ||
| if (!apiKey) return; | ||
|
|
||
| const exceptionList = buildExceptionList(error); | ||
| const payload = { | ||
| api_key: apiKey, | ||
| event: '$exception', | ||
| distinct_id: options.distinctId ?? SERVICE_NAME, | ||
| properties: { | ||
| $exception_list: exceptionList, | ||
| $exception_type: exceptionList[0]?.type, | ||
| $exception_message: exceptionList[0]?.value, | ||
| $exception_level: 'error', | ||
| service_name: SERVICE_NAME, | ||
| environment: env.ENVIRONMENT, | ||
| app_version: getAppVersion(env), | ||
| ...(options.properties ?? {}), | ||
| }, | ||
| timestamp: new Date().toISOString(), | ||
| const client = getPostHogClient(env, apiKey); | ||
|
|
There was a problem hiding this comment.
captureException does not honor the telemetry opt-out env vars (DO_NOT_TRACK / RELAYCAST_TELEMETRY_DISABLED). TELEMETRY.md lists packages/server/src/lib/logger.ts as a telemetry source, so this function should likely skip sending when telemetry is disabled (same gating as captureInternalTelemetry).
| export async function captureException( | ||
| env: CloudflareBindings, | ||
| error: unknown, | ||
| options: CaptureExceptionOptions = {}, | ||
| ): Promise<void> { | ||
| const apiKey = env.POSTHOG_API_KEY; | ||
| if (!apiKey) return; | ||
|
|
||
| const exceptionList = buildExceptionList(error); | ||
| const payload = { | ||
| api_key: apiKey, | ||
| event: '$exception', | ||
| distinct_id: options.distinctId ?? SERVICE_NAME, | ||
| properties: { | ||
| $exception_list: exceptionList, | ||
| $exception_type: exceptionList[0]?.type, | ||
| $exception_message: exceptionList[0]?.value, | ||
| $exception_level: 'error', | ||
| service_name: SERVICE_NAME, | ||
| environment: env.ENVIRONMENT, | ||
| app_version: getAppVersion(env), | ||
| ...(options.properties ?? {}), | ||
| }, | ||
| timestamp: new Date().toISOString(), | ||
| const client = getPostHogClient(env, apiKey); | ||
|
|
||
| // Build a flat properties object for the SDK's captureException. | ||
| // The SDK handles setting event: '$exception' and proper serialization. | ||
| const additionalProperties: Record<string, unknown> = { | ||
| $exception_list: exceptionList, | ||
| $exception_type: exceptionList[0]?.type, | ||
| $exception_message: exceptionList[0]?.value, | ||
| $exception_level: 'error', | ||
| service_name: SERVICE_NAME, | ||
| environment: env.ENVIRONMENT, | ||
| app_version: getAppVersion(env), | ||
| ...(options.properties ?? {}), | ||
| }; | ||
|
|
||
| try { | ||
| await fetch(`${getPostHogHost(env)}/capture/`, { | ||
| method: 'POST', | ||
| headers: { 'Content-Type': 'application/json' }, | ||
| body: JSON.stringify(payload), | ||
| }); | ||
| } catch { | ||
| // Best effort — never break request handling for telemetry. | ||
| } | ||
| client.captureException(error, options.distinctId ?? SERVICE_NAME, additionalProperties); | ||
| } |
There was a problem hiding this comment.
captureException is async but doesn’t await anything; it returns a promise that resolves immediately after enqueueing. Several call sites (worker.ts) await it and/or pass it to waitUntil expecting best-effort completion before the request/task ends. Consider returning/awaiting an SDK flush promise (or otherwise tying the returned promise to delivery) so these call sites still provide the intended reliability.
Check telemetryEnabled before sending exceptions and return early when disabled. Use client.flush() (awaited in a try/catch) to ensure events are flushed so callers can pass the promise to waitUntil without impacting request handling. Also harden flushAllPostHogClients by clearing the clients map in a finally block so clients are released even if shutdown rejects.
Summary
The
captureExceptionfunction inpackages/server/src/lib/logger.tswas placingtimestampat the top level of the PostHog capture payload instead of insideproperties.PostHog's capture endpoint expects
timestamp(orts) as a property field, not a top-level field. This caused events to be rejected with:Fix
Moved
timestampfrom the top-level payload intopropertiesso it aligns with PostHog's expected schema.Testing
All 436 tests pass.
Fixes
Invalid properties on event ... serde error: missing field