feat(studio): PostHog behavior + error tracking#52
feat(studio): PostHog behavior + error tracking#52soleil-colza wants to merge 9 commits intomainfrom
Conversation
Studio shipped zero observability; this wires PostHog into the SPA so the alpha cohort's funnel, page reach, and error patterns are visible. - Reuses the CLI's PostHog plumbing: identity (getIdentity), opt-out (DO_NOT_TRACK / ARKOR_TELEMETRY_DISABLED), and build-time key. A new getTelemetryConfig() helper is the single read site. - /api/credentials returns a telemetry block (distinctId, authMode, key, host, sdkVersion, debug); SPA bootstraps from there, no separate Vite define. - New lib/telemetry.ts wraps posthog-js with a dynamic import (true no-op when disabled), pre-init event queue, 200-char message truncation, and global unhandledrejection / error handlers. - Funnel events: studio_train_form_submitted, studio_train_run_subscribed, studio_train_run_completed, studio_train_run_failed (+ subprocess pair). Pageview: studio_page_viewed via useHashRoute. Errors: studio_api_error with redactPath() to bound endpoint cardinality, studio_sse_error for training and inference streams, ErrorBoundary at app root for React exceptions. Surface=studio super-property keeps Studio events separable from CLI events in dashboards. - Privacy: no prompt content, no model output, no dataset content, no raw paths; same anon distinctId as the CLI.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7f220c29ff
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| useEffect(() => { | ||
| trackPageView(route.kind); | ||
| }, [route.kind]); |
There was a problem hiding this comment.
Emit initial pageview only after telemetry init
trackPageView(route.kind) runs in a separate mount effect before fetchCredentials() resolves and before initTelemetry() flips telemetry state to enabled, so the first route view is always dropped. This affects every session that doesn't navigate after load (or before init completes), causing undercounted studio_page_viewed events and skewed funnel/page-reach metrics.
Useful? React with 👍 / 👎.
Greptile SummaryThis PR wires PostHog into the Studio SPA for alpha-cohort observability: the server exposes a Confidence Score: 5/5Safe to merge; all remaining findings are P2 and do not affect user-facing behavior. The three prior P1/P2 issues from previous review rounds are addressed. The one new finding (per-chunk exit-code regex) is a telemetry fidelity gap — exit_code may occasionally be null — but does not affect the UI or correctness of any other event. All 152 tests pass, type-checks clean, and the privacy and no-op-when-disabled guarantees are well-covered. packages/studio-app/src/components/RunTraining.tsx — exit-code accumulation across chunks
|
| Filename | Overview |
|---|---|
| packages/studio-app/src/lib/telemetry.ts | New SPA telemetry wrapper: dynamic import, pre-init queue with 100-item FIFO cap, correct Error-fidelity flush, global handlers. Previously flagged issues (initial pageview drop, queued Error fidelity, _resetForTests export) are all addressed. |
| packages/studio-app/src/lib/api.ts | Adds redactPath() (UUID/hex → :id, query-string stripped) and studio_api_error tracking in apiFetch; silentStatuses opt-out for expected 4xx flows. |
| packages/studio-app/src/App.tsx | Wires initTelemetry / disableTelemetry into credential fetch lifecycle; separate useEffect([route.kind]) for pageviews correctly queues the initial render's event. |
| packages/studio-app/src/components/RunTraining.tsx | Adds Funnel B subprocess events; per-chunk exit-code regex can silently miss the marker at chunk boundaries, leaving exit_code always null. |
| packages/arkor/src/studio/server.ts | Adds telemetry block to /api/credentials response; key is withheld when disabled, identity resolved via getIdentity(). |
| packages/studio-app/src/pages/JobDetail.tsx | Adds Funnel A SSE events with useRef start-time guard; durationMs() helper correct; artifact count and failure reason captured cleanly. |
| packages/studio-app/src/components/ErrorBoundary.tsx | Standard React error boundary; forwards to captureException with component_stack sliced to 500 chars. |
| packages/studio-app/src/main.tsx | Wraps in at the root; straightforward. |
Sequence Diagram
sequenceDiagram
participant App
participant API as /api/credentials
participant Telemetry as lib/telemetry.ts
participant PH as posthog-js (dynamic)
App->>App: mount → trackPageView() queued (pending[])
App->>API: fetchCredentials()
API-->>App: { telemetry: { enabled, posthogKey, … } }
App->>Telemetry: initTelemetry(cfg)
Telemetry->>Telemetry: state.enabled = true
Telemetry->>PH: import('posthog-js')
PH-->>Telemetry: posthog instance
Telemetry->>PH: init / identify / register
Telemetry->>Telemetry: flushQueue() → pageview + any queued events
Note over Telemetry,PH: subsequent track() / captureException() go directly to posthog
App->>API: fetchCredentials() fails
App->>Telemetry: disableTelemetry()
Telemetry->>Telemetry: pending[] cleared, no-op thereafter
Prompt To Fix All With AI
This is a comment left during a code review.
Path: packages/studio-app/src/components/RunTraining.tsx
Line: 371-376
Comment:
**Exit-code extraction silently fails at chunk boundaries**
The regex `chunk.match(/\nexit=(-?\d+)/)` is applied per-chunk, so if the streaming protocol splits `\nexit=0` across two HTTP frames (e.g., one chunk ends with `\nexit=` and the next begins with `0`), the match never fires, `exitCode` remains `null`, and every `studio_train_subprocess_finished` event reports `exit_code: null`. Since the finish event is the primary completion signal for Funnel B, the field would be analytically useless whenever chunks happen to split at that boundary.
Accumulating the stream into a running buffer and applying the regex there (same pattern used for `setLog`) avoids the split-chunk failure mode:
```ts
let accum = "";
await streamTraining((chunk) => {
accum += chunk;
if (!streamStarted) { streamStarted = true; track("studio_train_subprocess_started", {}); }
const match = accum.match(/\nexit=(-?\d+)/);
if (match) exitCode = Number(match[1]);
// ... setLog as before
});
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (6): Last reviewed commit: "Merge branch 'main' into eng-529" | Re-trigger Greptile
…etry queue
Three review fixes to packages/studio-app/src/lib/telemetry.ts.
- track() called before initTelemetry() runs is now queued instead of
dropped. The App's mount-time pageview fires before fetchCredentials
resolves, so under the old logic state.enabled was still false and the
first studio_page_viewed of every session was silently lost. The queue
now buffers pre-init events; init flushes them when enabled and drops
them when disabled. (P2 review badge.)
- Queued exceptions preserve the original Error reference so flush can
call posthog.captureException(err, props) instead of capture("$exception"),
keeping the stack trace for any error thrown during the dynamic
import("posthog-js") window (e.g. ErrorBoundary catching an early crash).
- Removed _resetForTests; vi.resetModules() in load() already gives full
per-test isolation, and shipping a test-only reset to production was
unnecessary surface area.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7d91d1ede3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (!res.ok) { | ||
| track("studio_api_error", { |
There was a problem hiding this comment.
Skip telemetry for expected non-2xx control-flow responses
apiFetch now emits studio_api_error for every non-OK response, but some callers intentionally use 4xx as normal control flow (for example fetchManifest() treats HTTP 400 as a valid { error } payload for “no manifest yet”). In those cases this code records routine behavior as failures, which will systematically inflate API error metrics and hide real regressions in dashboards.
Useful? React with 👍 / 👎.
`redactPath` previously split on `/` without accounting for a scheme, so an absolute URL like `https://host/api/jobs/<uuid>` would tokenise `https:` as a path segment. All current call-sites in apiFetch pass relative paths, so this is latent — the guard prevents future cross-origin or proxy wrappers from silently emitting mangled `endpoint` values to PostHog.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 19d19e17db
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| state.enabled = true; | ||
| state.initPromise = (async () => { | ||
| const mod = await import("posthog-js"); | ||
| const posthog = mod.default; |
There was a problem hiding this comment.
Disable telemetry state when PostHog init fails
If import("posthog-js") or posthog.init(...) throws (for example chunk-load failures, blocked storage, or runtime SDK errors), state.enabled has already been set to true and state.initPromise becomes a rejected promise. In that state, track()/captureException() keep appending to state.pending forever because isDisabledAfterInit() stays false, and init cannot recover on later calls, so long-lived sessions can accumulate an unbounded in-memory queue.
Useful? React with 👍 / 👎.
|
@copilot |
If `import("posthog-js")` or `posthog.init(...)` threw (chunk-load
failure, blocked localStorage, runtime SDK error), state.enabled
stayed true and state.client stayed null, so isDisabledAfterInit()
kept returning false. track() / captureException() then appended to
state.pending forever, growing an unbounded in-memory queue in
long-lived sessions, and the rejected initPromise meant subsequent
init calls couldn't recover either.
Wrap the IIFE body in try/catch: on failure, flip state.enabled back
to false and drop the queue. The init promise resolves cleanly so
fire-and-forget callers (App.tsx's `void initTelemetry(...)`) don't
emit unhandled rejections, and subsequent track()/captureException()
calls become no-ops via the existing isDisabledAfterInit() guard.
There was a problem hiding this comment.
Pull request overview
Adds first-party observability for Arkor Studio by reusing the existing PostHog telemetry identity/config plumbing from the CLI, enabling SPA event tracking and error reporting without introducing a second telemetry provider.
Changes:
- Server exposes a telemetry config block via
/api/credentials(distinctId, auth mode, PostHog key/host, SDK version, debug). - Studio SPA adds a
posthog-jswrapper with dynamic import, pre-init buffering, and global error/unhandledrejection capture (plus a root ReactErrorBoundary). - Adds Studio-specific events for pageviews, training funnels (SSE + subprocess), and API/SSE errors with endpoint redaction.
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| pnpm-lock.yaml | Adds posthog-js and transitive deps to the lockfile. |
| packages/studio-app/package.json | Adds posthog-js dependency for the SPA. |
| packages/studio-app/src/lib/telemetry.ts | New SPA telemetry wrapper: dynamic import, buffering, global error hooks. |
| packages/studio-app/src/lib/telemetry.test.ts | Unit tests for init, buffering, and exception capture behavior. |
| packages/studio-app/src/lib/api.ts | Adds telemetry to credentials type; adds redactPath() and emits studio_api_error on non-OK responses. |
| packages/studio-app/src/lib/api.test.ts | Tests for redactPath() and apiFetch error tracking. |
| packages/studio-app/src/App.tsx | Initializes telemetry after credentials load; tracks pageviews on hash-route changes. |
| packages/studio-app/src/main.tsx | Wraps app in a root ErrorBoundary. |
| packages/studio-app/src/components/ErrorBoundary.tsx | New boundary that forwards React exceptions to telemetry. |
| packages/studio-app/src/components/RunTraining.tsx | Tracks training subprocess funnel events + duration/exit code. |
| packages/studio-app/src/pages/JobDetail.tsx | Tracks training SSE funnel events, completion/failure metadata, and SSE errors. |
| packages/studio-app/src/pages/Playground.tsx | Tracks inference SSE errors. |
| packages/arkor/src/core/telemetry.ts | Adds getTelemetryConfig() helper (enabled/key/host/debug). |
| packages/arkor/src/core/telemetry.test.ts | Tests for getTelemetryConfig() behavior across env flags/host override. |
| packages/arkor/src/studio/server.ts | /api/credentials now includes the telemetry block. |
| packages/arkor/src/studio/server.test.ts | Tests for the /api/credentials telemetry block and env-flag behavior. |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| export async function initTelemetry(cfg: StudioTelemetryConfig): Promise<void> { | ||
| if (state.initPromise) return state.initPromise; | ||
| if (!cfg.enabled || !cfg.posthogKey) { | ||
| state.enabled = false; | ||
| state.pending = []; | ||
| state.initPromise = Promise.resolve(); | ||
| return state.initPromise; | ||
| } | ||
| state.enabled = true; | ||
| state.initPromise = (async () => { | ||
| try { | ||
| const mod = await import("posthog-js"); | ||
| const posthog = mod.default; | ||
| posthog.init(cfg.posthogKey, { | ||
| api_host: cfg.posthogHost, | ||
| autocapture: false, | ||
| capture_pageview: false, | ||
| capture_pageleave: false, | ||
| disable_session_recording: true, | ||
| persistence: "localStorage", | ||
| loaded: (ph) => { | ||
| if (cfg.debug) ph.debug(); | ||
| }, | ||
| }); | ||
| posthog.identify(cfg.distinctId, { auth_mode: cfg.authMode }); | ||
| posthog.register({ | ||
| sdk_version: cfg.sdkVersion, | ||
| auth_mode: cfg.authMode, | ||
| surface: "studio", | ||
| }); | ||
| state.client = posthog; | ||
| installGlobalHandlers(); | ||
| flushQueue(posthog); | ||
| } catch (err) { | ||
| // Chunk-load failure, blocked storage, or any other SDK init error: |
There was a problem hiding this comment.
initTelemetry can reject if the dynamic import fails (chunk load error, CSP, etc.) or if posthog.init throws. Since callers fire-and-forget (e.g. void initTelemetry(...)), this would surface as an unhandled promise rejection and leave state.enabled=true with state.client=null, causing track()/captureException() to keep queueing indefinitely. Consider wrapping the init body in a try/catch that (1) disables telemetry, (2) clears/limits the pending queue, and (3) resolves the init promise so the app can proceed without noisy rejections.
| const res = await fetch(input, { ...init, headers }); | ||
| if (!res.ok) { | ||
| track("studio_api_error", { | ||
| endpoint: redactPath(input), | ||
| method: init?.method ?? "GET", | ||
| status: res.status, | ||
| status_text: res.statusText, | ||
| }); | ||
| } | ||
| return res; |
There was a problem hiding this comment.
apiFetch currently emits studio_api_error for every non-2xx response. This will include expected/handled 4xx flows like /api/manifest returning 400 (the server explicitly uses 400 so the SPA can render a hint), which will pollute error telemetry and inflate event volume. Consider only tracking 5xx by default, or adding an opt-out/allowlist parameter so callers like fetchManifest() can suppress telemetry for expected statuses.
| import { track } from "./telemetry"; | ||
|
|
||
| export interface StudioTelemetryWire { | ||
| enabled: boolean; | ||
| distinctId: string; | ||
| authMode: "auth0" | "anon" | "none"; | ||
| posthogKey: string; | ||
| posthogHost: string; | ||
| sdkVersion: string; | ||
| debug: boolean; | ||
| } |
There was a problem hiding this comment.
StudioTelemetryWire duplicates the shape of StudioTelemetryConfig in lib/telemetry.ts. Keeping two parallel types for the same payload risks drift (especially as telemetry evolves). Consider exporting a single shared type (or deriving one from the other, e.g. type StudioTelemetryWire = StudioTelemetryConfig) and reusing it here.
| import { track } from "./telemetry"; | |
| export interface StudioTelemetryWire { | |
| enabled: boolean; | |
| distinctId: string; | |
| authMode: "auth0" | "anon" | "none"; | |
| posthogKey: string; | |
| posthogHost: string; | |
| sdkVersion: string; | |
| debug: boolean; | |
| } | |
| import { track, type StudioTelemetryConfig } from "./telemetry"; | |
| export type StudioTelemetryWire = StudioTelemetryConfig; |
apiFetch fired studio_api_error for every non-2xx, including the 400 that /api/manifest uses as a "no manifest yet" hint right after a fresh scaffold. That's an expected flow, not an error, and would dominate the error event volume. Add a `silentStatuses` option on apiFetch and use it from fetchManifest to suppress 400s; real 5xx failures from manifest still surface as errors. Drop StudioTelemetryWire from lib/api.ts and reuse StudioTelemetryConfig from lib/telemetry.ts directly. Two parallel definitions of the same shape would have drifted as the telemetry payload evolved.
|
@copilot |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 70e41d315d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (isDisabledAfterInit()) return; | ||
| state.pending.push({ kind: "event", event, props }); |
There was a problem hiding this comment.
Drop events when telemetry init cannot be reached
track() always appends to state.pending until initTelemetry() runs, but App only calls initTelemetry on a successful /api/credentials fetch (App.tsx), so any startup failure (for example a stale tab after the studio token rotates and /api/credentials returns 403) leaves telemetry permanently uninitialized. In that state, recurring failing calls like the 5-second fetchJobs() poll in JobsList keep emitting studio_api_error and grow this in-memory queue without bound for the life of the tab.
Useful? React with 👍 / 👎.
…strap failure Two reinforcing fixes for unbounded queue growth when initTelemetry() never runs. If /api/credentials fails (e.g. stale studio token after a rotation returns 403), App.tsx's success branch never fires, so initTelemetry is never called and isDisabledAfterInit() returns false forever. Every subsequent failing apiFetch (the 5s JobsList poll, manual reloads, etc.) keeps appending to state.pending for the life of the tab. - Library: cap state.pending at 100 items, FIFO-drop when full. This bounds memory regardless of why init didn't run, and keeps the most recent events (more useful than the oldest). - Library: new disableTelemetry() that tears down state explicitly without trying to load posthog-js. No-op once init has started. - App: call disableTelemetry() in the fetchCredentials catch so subsequent track() calls drop instead of queueing.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated no new comments.
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f93094f0b9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const match = chunk.match(/\nexit=(-?\d+)/); | ||
| if (match) exitCode = Number(match[1]); |
There was a problem hiding this comment.
Handle split exit markers across stream chunks
exit_code is extracted by running chunk.match(/\nexit=(-?\d+)/) on each chunk independently, but fetch stream chunk boundaries are arbitrary and can split "exit=..." across two chunks. In that case successful/failed runs are reported with exit_code: null, which skews subprocess telemetry and makes completion diagnostics unreliable. Buffering a small tail across callbacks (or parsing from the accumulated log) would make exit-code capture robust.
Useful? React with 👍 / 👎.
The /api/train server writes `\n---\nexit=${code}\n` as the final
controller.enqueue, but the body reader can split that byte sequence
at any boundary. The previous chunk.match(/\nexit=(-?\d+)/) ran the
regex on each chunk in isolation, so when the split landed inside
the marker (e.g. "...\nexit=4" + "2\n") the exit code was lost and
the studio_train_subprocess_finished event reported exit_code: null.
Hold a 64-char rolling tail across chunks and require the trailing
\n in the regex so we only commit once the full digit run has
arrived. Stop running the regex after exitCode is captured.
There was a problem hiding this comment.
soleil-colza has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 2 comments.
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const identity = await getIdentity(); | ||
| const tcfg = getTelemetryConfig(); | ||
| return c.json({ | ||
| token, | ||
| mode: creds.mode, | ||
| baseUrl, | ||
| orgSlug: state?.orgSlug ?? (creds.mode === "anon" ? creds.orgSlug : null), | ||
| projectSlug: state?.projectSlug ?? null, | ||
| telemetry: { | ||
| enabled: tcfg.enabled, | ||
| distinctId: identity.distinctId, | ||
| authMode: identity.authMode, | ||
| posthogKey: tcfg.enabled ? tcfg.posthogKey : "", | ||
| posthogHost: tcfg.posthogHost, | ||
| sdkVersion: SDK_VERSION, | ||
| debug: tcfg.debug, | ||
| }, |
There was a problem hiding this comment.
In /api/credentials, getIdentity() is called unconditionally and distinctId/authMode are returned even when telemetry is disabled (e.g. DO_NOT_TRACK=1 or no PostHog key). This can still read/write the persistent telemetry id file and may expose a stable user identifier to the SPA despite opt-out. Consider only computing/returning identity when tcfg.enabled is true, and otherwise return an empty distinctId with authMode: "none" (or omit the identity fields entirely).
| export function redactPath(input: string): string { | ||
| let path: string; | ||
| if (/^[a-z][a-z0-9+.-]*:\/\//i.test(input)) { | ||
| try { | ||
| path = new URL(input).pathname; | ||
| } catch { | ||
| path = input; | ||
| } | ||
| } else { | ||
| const qIdx = input.indexOf("?"); | ||
| path = qIdx === -1 ? input : input.slice(0, qIdx); |
There was a problem hiding this comment.
redactPath() strips query strings for relative paths, but for absolute-looking URLs that fail new URL(input) parsing, the fallback path = input keeps the query string intact. That can defeat the stated “CSRF token leak guard” behavior if an unparseable absolute URL ever includes ?…. Consider stripping the query string in the catch fallback as well (and optionally # fragments) before tokenising segments.
| export function redactPath(input: string): string { | |
| let path: string; | |
| if (/^[a-z][a-z0-9+.-]*:\/\//i.test(input)) { | |
| try { | |
| path = new URL(input).pathname; | |
| } catch { | |
| path = input; | |
| } | |
| } else { | |
| const qIdx = input.indexOf("?"); | |
| path = qIdx === -1 ? input : input.slice(0, qIdx); | |
| function stripQueryAndFragment(input: string): string { | |
| const qIdx = input.indexOf("?"); | |
| const hIdx = input.indexOf("#"); | |
| if (qIdx === -1) return hIdx === -1 ? input : input.slice(0, hIdx); | |
| if (hIdx === -1) return input.slice(0, qIdx); | |
| return input.slice(0, Math.min(qIdx, hIdx)); | |
| } | |
| export function redactPath(input: string): string { | |
| let path: string; | |
| if (/^[a-z][a-z0-9+.-]*:\/\//i.test(input)) { | |
| try { | |
| path = new URL(input).pathname; | |
| } catch { | |
| path = stripQueryAndFragment(input); | |
| } | |
| } else { | |
| path = stripQueryAndFragment(input); |
…fallback Two PR #52 review fixes: server.ts /api/credentials - Previously called getIdentity() unconditionally and returned the resulting distinctId / authMode to the SPA even when telemetry was disabled (DO_NOT_TRACK=1, missing PostHog key, etc.). That both read / wrote the persistent telemetry-id file and exposed a stable user identifier despite opt-out. - Now only call getIdentity() when tcfg.enabled is true; otherwise return inert identity fields (distinctId: "", authMode: "none"). The SPA's initTelemetry early-returns when enabled is false, so the placeholder is never read. - Update the matching server test to assert the inert shape. studio-app/lib/api.ts redactPath - The catch fallback for absolute-looking URLs that fail new URL() parsing kept the input verbatim, which could leak a query string carrying ?studioToken=... into telemetry on malformed URLs. - Strip the query string in the catch branch as well, matching the relative-path branch. - Added a regression test asserting redactPath drops the studioToken query even on unparseable URLs.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3b54de286d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| useEffect(() => { | ||
| trackPageView(route.kind); | ||
| }, [route.kind]); |
There was a problem hiding this comment.
Track pageviews on all route changes
trackPageView is only keyed on route.kind, so navigations that keep the same kind (for example #/jobs/<id1> to #/jobs/<id2> via history/hash edits) do not emit another studio_page_viewed event. This undercounts pageviews for repeated visits within the same route category and skews page-reach/funnel telemetry for job-detail navigation. The effect should depend on the full route (or hash), not only route.kind.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 1 comment.
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| es.addEventListener("training.started", (ev: MessageEvent) => { | ||
| setStatus("running"); | ||
| pushRaw(`[started] ${ev.data}`); | ||
| if (startedAtRef.current === null) { | ||
| startedAtRef.current = Date.now(); | ||
| track("studio_train_run_subscribed", { job_id: jobId }); | ||
| } |

Summary
Studio shipped zero observability. This wires PostHog into the SPA so the alpha cohort's training funnel, page reach, and error patterns become visible. Single tool by design: no Sentry, no Langfuse — the CLI already runs PostHog and we reuse its identity, opt-out, and build-time key plumbing rather than operating a second SDK.
getTelemetryConfig()helper incore/telemetry.ts;/api/credentialsreturns atelemetryblock (distinctId,authMode,posthogKey,posthogHost,sdkVersion,debug).lib/telemetry.tswrapsposthog-jswith a dynamic import (true no-op + zero bundle cost when disabled), pre-init event queue, 200-char message truncation, and globalunhandledrejection/errorhandlers.ErrorBoundaryat app root forwards React exceptions.surface=studiosuper-property keeps them separable from CLI events):studio_train_form_submitted→studio_train_run_subscribed→studio_train_run_completed/studio_train_run_failedstudio_train_subprocess_started/_finishedstudio_page_viewed(driven byuseHashRoute)studio_api_errorwithredactPath()(UUID/long-hex →:id, query string stripped to avoid CSRF token leak),studio_sse_errorfor training and inference streamsdistinctIdas the CLI.Test plan
pnpm --filter arkor test(120/120 pass; new cases forgetTelemetryConfigand/api/credentialstelemetry block)pnpm --filter @arkor/studio-app test(32/32 pass; new cases forlib/telemetry,redactPath, andapiFetcherror capture)pnpm --filter @arkor/studio-app typecheckpnpm --filter @arkor/studio-app build(posthog-js code-splits into its own chunk: 192 KB / 64 KB gzip)ARKOR_POSTHOG_KEYis wired in CI: filtersurface = studioin PostHog and confirm pageview, funnel,studio_api_error,studio_sse_error, and ErrorBoundary$exceptionall fire end-to-end. Re-run withDO_NOT_TRACK=1and confirm zero traffic toi.posthog.com.