Skip to content

fix: move timestamp inside properties in $exception PostHog events#117

Merged
willwashburn merged 3 commits intomainfrom
fix/posthog-exception-missing-type
Apr 19, 2026
Merged

fix: move timestamp inside properties in $exception PostHog events#117
willwashburn merged 3 commits intomainfrom
fix/posthog-exception-missing-type

Conversation

@barryonthecape
Copy link
Copy Markdown
Contributor

Summary

The captureException function in packages/server/src/lib/logger.ts was placing timestamp at the top level of the PostHog capture payload instead of inside properties.

PostHog's capture endpoint expects timestamp (or ts) as a property field, not a top-level field. This caused events to be rejected with:

serde error: missing field `type`

Fix

Moved timestamp from the top-level payload into properties so it aligns with PostHog's expected schema.

Testing

All 436 tests pass.

Fixes Invalid properties on event ... serde error: missing field

@github-actions
Copy link
Copy Markdown

Preview deployed!

Environment URL
API https://pr117-api.relaycast.dev
Health https://pr117-api.relaycast.dev/health
Observer https://pr117-observer.relaycast.dev/observer

This preview shares the staging database and will be cleaned up when the PR is merged or closed.

Run E2E tests

npm run e2e -- https://pr117-api.relaycast.dev --ci

Open observer dashboard

https://pr117-observer.relaycast.dev/observer

- Added posthog-node@5.29.2 to @relaycast/server dependencies
- Created shared posthog.ts module with getPostHogClient() factory
- Refactored telemetry.ts to use SDK's capture() instead of raw fetch
- Refactored captureException in logger.ts to use SDK's captureException()
  instead of raw fetch — this was the root cause of the missing field
  'type' serde error
- Updated telemetry tests to mock the SDK instead of fetch
@barryonthecape barryonthecape force-pushed the fix/posthog-exception-missing-type branch from 6117c83 to abea7c7 Compare April 13, 2026 21:54
Import flushAllPostHogClients from posthog and remove the dynamic import in flushInternalTelemetryBatchesForTests. This simplifies the test helper and avoids re-export edge cases when flushing PostHog clients.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates PostHog telemetry/error-tracking to use the posthog-node SDK (instead of raw fetch payloads) to avoid schema/serialization mismatches for $exception and internal telemetry events.

Changes:

  • Added a shared PostHog client wrapper with caching and a test flush helper.
  • Switched internal telemetry capture (including “batched”) to use the SDK’s capture API.
  • Switched $exception reporting in the server logger to posthog-node’s exception capture and updated tests/dependencies accordingly.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
packages/server/src/lib/telemetry.ts Replaces manual /capture + custom batching with SDK-based capture and test flush hook.
packages/server/src/lib/posthog.ts Introduces PostHog client factory/cache, telemetry-enabled gating export, and shutdown/flush helper.
packages/server/src/lib/logger.ts Moves $exception reporting to the SDK and adjusts documentation.
packages/server/src/lib/tests/telemetry.test.ts Updates tests to mock the PostHog SDK wrapper instead of global fetch.
packages/server/package.json Adds posthog-node dependency.
package-lock.json Locks posthog-node and its dependency graph.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/server/src/lib/posthog.ts Outdated
Comment on lines +45 to +46
await Promise.all([...clients.values()].map(({ client }) => client.shutdown()));
clients.clear();
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flushAllPostHogClients uses Promise.all(...); if any client.shutdown() rejects, the function throws and clients.clear() won’t run, potentially leaking clients across tests. Consider using Promise.allSettled or a try/finally to ensure the map is cleared even on shutdown errors.

Suggested change
await Promise.all([...clients.values()].map(({ client }) => client.shutdown()));
clients.clear();
try {
await Promise.all([...clients.values()].map(({ client }) => client.shutdown()));
} finally {
clients.clear();
}

Copilot uses AI. Check for mistakes.
} catch {
// Best effort only.
}
});
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

captureInternalTelemetry is async but doesn't await/return any SDK flush/shutdown promise. Callers that await this (e.g., Durable Objects) will now proceed immediately, and in Workers this can drop events because nothing keeps the request alive while the SDK flushes. Consider returning a promise tied to delivery (e.g., client.flush()), or expose a dedicated flush/waitUntil-friendly helper so background tasks can keep the isolate alive until PostHog has sent.

Suggested change
});
});
await client.flush();

Copilot uses AI. Check for mistakes.
Comment on lines +60 to +61
// Batching is handled automatically by the SDK client (flushAt: 20, flushInterval: 250)
return captureInternalTelemetry(env, input);
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

captureInternalTelemetryBatched now delegates to captureInternalTelemetry, which resolves immediately and no longer guarantees batched delivery completion. This changes the contract vs. the previous implementation where the returned promise resolved after the batch containing the event was flushed. If any callers rely on the promise to represent “sent (best-effort)” semantics (especially when passed to waitUntil), consider restoring that behavior (e.g., resolve after the SDK flush interval / an explicit flush).

Suggested change
// Batching is handled automatically by the SDK client (flushAt: 20, flushInterval: 250)
return captureInternalTelemetry(env, input);
// Batching is handled automatically by the SDK client, but callers of this
// helper rely on the returned promise to represent best-effort delivery of
// the batch containing the captured event.
await captureInternalTelemetry(env, input);
await flushAllPostHogClients();

Copilot uses AI. Check for mistakes.
Comment on lines 335 to +340
const apiKey = env.POSTHOG_API_KEY;
if (!apiKey) return;

const exceptionList = buildExceptionList(error);
const payload = {
api_key: apiKey,
event: '$exception',
distinct_id: options.distinctId ?? SERVICE_NAME,
properties: {
$exception_list: exceptionList,
$exception_type: exceptionList[0]?.type,
$exception_message: exceptionList[0]?.value,
$exception_level: 'error',
service_name: SERVICE_NAME,
environment: env.ENVIRONMENT,
app_version: getAppVersion(env),
...(options.properties ?? {}),
},
timestamp: new Date().toISOString(),
const client = getPostHogClient(env, apiKey);

Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

captureException does not honor the telemetry opt-out env vars (DO_NOT_TRACK / RELAYCAST_TELEMETRY_DISABLED). TELEMETRY.md lists packages/server/src/lib/logger.ts as a telemetry source, so this function should likely skip sending when telemetry is disabled (same gating as captureInternalTelemetry).

Copilot uses AI. Check for mistakes.
Comment on lines 330 to 355
export async function captureException(
env: CloudflareBindings,
error: unknown,
options: CaptureExceptionOptions = {},
): Promise<void> {
const apiKey = env.POSTHOG_API_KEY;
if (!apiKey) return;

const exceptionList = buildExceptionList(error);
const payload = {
api_key: apiKey,
event: '$exception',
distinct_id: options.distinctId ?? SERVICE_NAME,
properties: {
$exception_list: exceptionList,
$exception_type: exceptionList[0]?.type,
$exception_message: exceptionList[0]?.value,
$exception_level: 'error',
service_name: SERVICE_NAME,
environment: env.ENVIRONMENT,
app_version: getAppVersion(env),
...(options.properties ?? {}),
},
timestamp: new Date().toISOString(),
const client = getPostHogClient(env, apiKey);

// Build a flat properties object for the SDK's captureException.
// The SDK handles setting event: '$exception' and proper serialization.
const additionalProperties: Record<string, unknown> = {
$exception_list: exceptionList,
$exception_type: exceptionList[0]?.type,
$exception_message: exceptionList[0]?.value,
$exception_level: 'error',
service_name: SERVICE_NAME,
environment: env.ENVIRONMENT,
app_version: getAppVersion(env),
...(options.properties ?? {}),
};

try {
await fetch(`${getPostHogHost(env)}/capture/`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(payload),
});
} catch {
// Best effort — never break request handling for telemetry.
}
client.captureException(error, options.distinctId ?? SERVICE_NAME, additionalProperties);
}
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

captureException is async but doesn’t await anything; it returns a promise that resolves immediately after enqueueing. Several call sites (worker.ts) await it and/or pass it to waitUntil expecting best-effort completion before the request/task ends. Consider returning/awaiting an SDK flush promise (or otherwise tying the returned promise to delivery) so these call sites still provide the intended reliability.

Copilot uses AI. Check for mistakes.
Comment thread packages/server/src/lib/logger.ts
Check telemetryEnabled before sending exceptions and return early when disabled. Use client.flush() (awaited in a try/catch) to ensure events are flushed so callers can pass the promise to waitUntil without impacting request handling. Also harden flushAllPostHogClients by clearing the clients map in a finally block so clients are released even if shutdown rejects.
@willwashburn willwashburn merged commit 650f850 into main Apr 19, 2026
4 checks passed
@willwashburn willwashburn deleted the fix/posthog-exception-missing-type branch April 19, 2026 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants