feat(AUR-276): session-correlation telemetry — candidate-pair logging only by mishanefedov · Pull Request #30 · mishanefedov/agentwatch

mishanefedov · 2026-05-04T15:36:42Z

Summary

Implements AUR-276 — the v1.0 telemetry half of the session-correlation work split (the v1.1 UI half is AUR-277, gated on a manual FP-rate validation against data this PR generates).

The original AUR-115 plan would have shipped a stitched-sessions UI based on a rule-based correlator with no measurement of its false-positive rate. AUR-183 was cancelled 2026-04-15 for that exact reason: without user-labeled pairs to tune against, a ~10–15% FP rate is too high to trust in a flagship feature. This PR closes that gap by silently building the candidate-pair dataset Michael needs to (a) measure the FP rate and (b) decide whether AUR-277 ships as-is, with a confirmation hedge, or after a redesign.

Strictly scope-limited. No API field, no React surface, no `/api/sessions/:id` change, no `web/` change. The dataset is queryable via a CLI dump and a dev-only TUI counter; nothing else.

What ships

Schema V3 — `workspace_root` + `git_branch` columns on `sessions`; new `session_link_candidates` table (deliberately candidates, not links — these are unverified). Idempotent migration; re-applying V3 is a no-op.
`src/correlate/` (new module):
- `session-links.ts` — `RecentWritesIndex`: pure data structure, sliding 30-min window, sweep-on-touch, hard cap with 10 % oldest-eviction. Match gate: different agent + different session + same non-null root + same non-null branch + within window.
- `branch-cache.ts` — 60-second TTL around `git rev-parse --abbrev-ref HEAD`, reuses `runGit` + `gitCommonDir` from `src/git/correlate.ts`. Caches null results too (no point re-shelling-out to fail again).
`wrapSinkWithLinks` in `src/store/wire.ts` — layered after `wrapSinkWithStore` at all three sink composition sites (TUI, `serve`, daemon). Errors are warn-once + swallow, mirroring the store wrapper so the observability layer never crashes the agent runtime.
Adapter cwd plumbing — Claude and OpenClaw stamp `details.cwd` on `file_write` events. (Verified empirically: top-level `assistant` / `user` / `attachment` / `system` lines in real Claude JSONL all carry `cwd`. The OpenClaw `sessionCwd` map was already there from `session_start` lines — reused.)
`agentwatch link-candidates` subcommand: `--session <id>` to scope, `--limit <n>` to cap. JSON output, no formatting. This is the read path Michael uses to manually classify candidate pairs toward the AUR-277 validation gate.
`AGENTWATCH_DEBUG_LINKS=1` env var — surfaces the candidate-pair count in the TUI Header. Off by default, zero overhead in normal use.
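
To make the match gate concrete, here is a minimal sketch of a sliding-window index in the spirit of `RecentWritesIndex`. It is not the code in `session-links.ts`: the `WriteRecord` shape, the `touch` method name, and the cap of 1000 entries are assumptions made for illustration, and insertion order stands in for age when evicting.

```typescript
// Hypothetical record shape; the real type in session-links.ts may differ.
interface WriteRecord {
  agent: string;
  sessionId: string;
  workspaceRoot: string | null;
  gitBranch: string | null;
  ts: number; // epoch milliseconds
}

const WINDOW_MS = 30 * 60 * 1000; // sliding 30-minute window

class RecentWritesSketch {
  private writes: WriteRecord[] = [];
  private maxEntries: number;

  constructor(maxEntries: number = 1000) {
    this.maxEntries = maxEntries; // cap value is an assumption
  }

  /** Record a write and return the session ids that pass the match gate. */
  touch(w: WriteRecord): string[] {
    // Sweep-on-touch: drop records that have fallen out of the window.
    this.writes = this.writes.filter((r) => w.ts - r.ts <= WINDOW_MS);

    const peers = new Set<string>();
    // Null root or null branch never matches anything.
    if (w.workspaceRoot !== null && w.gitBranch !== null) {
      for (const r of this.writes) {
        if (
          r.agent !== w.agent &&
          r.sessionId !== w.sessionId &&
          r.workspaceRoot === w.workspaceRoot &&
          r.gitBranch === w.gitBranch
        ) {
          peers.add(r.sessionId);
        }
      }
    }

    this.writes.push(w);
    // Hard cap: once exceeded, evict the oldest 10 % of the cap.
    if (this.writes.length > this.maxEntries) {
      this.writes.splice(0, Math.ceil(this.maxEntries * 0.1));
    }
    return Array.from(peers);
  }
}

// Usage: two different agents writing to the same root and branch one minute apart.
const index = new RecentWritesSketch();
index.touch({ agent: "claude", sessionId: "s1", workspaceRoot: "/repo", gitBranch: "main", ts: 0 });
const peers = index.touch({ agent: "openclaw", sessionId: "s2", workspaceRoot: "/repo", gitBranch: "main", ts: 60 * 1000 });
// peers contains only "s1"
```

Keeping the index a pure data structure (timestamps passed in, no clock or I/O inside) is what lets every filter-out path be tested without timers.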

What does NOT ship (deferred to AUR-277, gated)

`/api/sessions/:id` link field
React "Linked sessions" sidebar
Any user-visible stitching outside the dev counter
Cursor write attribution (waits on a Cursor SQLite AI-tracking adapter)

Validation gate before AUR-277

Per AUR-183's cancellation rationale, AUR-277 is blocked on accumulating ≥10 candidate pairs in real self-use, manually classifying each, and then acting on the measured FP rate:

FP <5% → ship UI as-is
FP 5–15% → ship UI with thumbs-up/down confirmation hedge
FP >15% → redesign

This PR is the data-collection prerequisite.
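
The gate can be written down as a small decision function. This is an illustrative sketch only; `decideGate`, the `GateDecision` labels, and the `keep-collecting` outcome for fewer than 10 pairs are names invented here, not part of the PR.

```typescript
type GateDecision = "keep-collecting" | "ship-as-is" | "ship-with-confirmation" | "redesign";

const MIN_PAIRS = 10; // the gate requires at least 10 classified pairs

/** `labels[i]` is true when candidate pair i was manually classified as a false positive. */
function decideGate(labels: boolean[]): GateDecision {
  if (labels.length < MIN_PAIRS) return "keep-collecting";
  const fpRate = labels.filter((isFp) => isFp).length / labels.length;
  if (fpRate < 0.05) return "ship-as-is"; // FP < 5%
  if (fpRate <= 0.15) return "ship-with-confirmation"; // FP 5–15%
  return "redesign"; // FP > 15%
}

// Usage: 10 classified pairs, one of them a false positive (10%).
const decision = decideGate([true, false, false, false, false, false, false, false, false, false]);
// decision is "ship-with-confirmation"
```

One consequence of the arithmetic: at exactly 10 pairs a single false positive is already 10%, so the "ship as-is" outcome requires zero false positives until at least 21 pairs have been classified (1/21 ≈ 4.8%).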

Test plan

`npm run typecheck` — clean
`npm test` — 403 tests pass (25 new for AUR-276):
- `src/correlate/session-links.test.ts` (11 cases): every match-gate path (cross-agent + same root + same branch + in-window), every filter-out path (same session, same agent, different root, different branch, null branch, null root, expired window), multi-peer matching, sweep behaviour.
- `src/correlate/branch-cache.test.ts` (5 cases): null/empty cwd skip, TTL hit, TTL expiry re-shells, null branch caching.
- `src/store/sqlite.test.ts` (+9 cases): V3 schema_version, migration idempotency, `upsertSessionWorkspace` first-write-wins + null-no-op, `recordSessionLinkCandidate` insert + bump + canonical pair ordering, `listSessionLinkCandidates` filter, `countSessionLinkCandidates` + `countAllLinkCandidates`.
- `src/store/wire.test.ts` (9 cases, new file): pass-through behaviour for emit/enrich, non-write skip, null-cwd skip, workspace upsert + first-write-wins, candidate-pair recording + repeat-bump, null-branch suppression, regression test that an event going through link→store lands in the events table identically to one going through the store wrapper alone.
Smoke: `npm run build:server` clean.
Smoke: fresh-DB `agentwatch link-candidates` returns `[]`; `--session foo` returns `[]`; `--limit notanumber` exits 2 with a clear error.
Smoke: fresh-DB schema_version is 3; `session_link_candidates` table exists alongside `sessions`.
Post-merge: leave `AGENTWATCH_DEBUG_LINKS=1` set during normal multi-agent use for ≥2 weeks; collect ≥10 candidate pairs; manually classify them and decide whether to start AUR-277.
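
The store tests above mention "insert + bump + canonical pair ordering" for `recordSessionLinkCandidate`. One plausible reading, sketched below with an in-memory map instead of SQLite, is that `(a, b)` and `(b, a)` resolve to the same row and a repeat sighting increments a counter. The row fields (`sessionA`, `sessionB`, `hits`) and the class name are assumptions, not the actual V3 schema.

```typescript
// Hypothetical in-memory stand-in for the session_link_candidates table.
interface CandidateRow {
  sessionA: string;
  sessionB: string;
  hits: number;
}

/** Order the two session ids so that (a, b) and (b, a) produce the same pair. */
function canonicalPair(x: string, y: string): [string, string] {
  return x < y ? [x, y] : [y, x];
}

class CandidateTableSketch {
  private rows = new Map<string, CandidateRow>();

  /** Insert the pair on first sighting, bump its counter on repeats. */
  record(x: string, y: string): CandidateRow {
    const [sessionA, sessionB] = canonicalPair(x, y);
    const key = JSON.stringify([sessionA, sessionB]);
    const existing = this.rows.get(key);
    if (existing) {
      existing.hits += 1;
      return existing;
    }
    const row: CandidateRow = { sessionA, sessionB, hits: 1 };
    this.rows.set(key, row);
    return row;
  }

  count(): number {
    return this.rows.size;
  }
}

// Usage: the same pair seen in both orders collapses to one row with two hits.
const table = new CandidateTableSketch();
table.record("s2", "s1");
const bumped = table.record("s1", "s2");
// bumped.hits is 2 and table.count() is 1
```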

Rationale doc

`PROGRESS.md` (committed in this PR) captures the why-split decision and the carry-forward map from the original AUR-115 plan into AUR-276.

Linear

AUR-276 — this PR
AUR-277 — follow-up UI ticket (blocked on this PR + the validation gate)
AUR-115 — moved to Backlog as the umbrella; description rewritten to point at the split
AUR-183 — original cancellation, the reason this split exists

… only

Splits AUR-115 (which revives the cancelled AUR-183) into two tickets and ships the data-collection half. AUR-183 was cancelled 2026-04-15 because the rule-based stitching had ~10–15% FP without user-labeled pairs to tune against. AUR-115's PROGRESS.md plan was a stricter rule with the same gating gap. AUR-276 closes that gap by silently building the candidate-pair dataset that AUR-277 (the UI half, gated on a measured FP rate) will be validated against. Scope here is telemetry only: no API field, no React surface.

- Schema V3: workspace_root + git_branch on sessions; new session_link_candidates table (candidates, not links; they're unverified).
- New src/correlate/: RecentWritesIndex (sliding 30-min window, sweep, hard cap with eviction) + branch-cache (60s TTL around git rev-parse, reuses runGit + gitCommonDir).
- New wrapSinkWithLinks layered after wrapSinkWithStore at all three sink composition sites (TUI, serve, daemon). Errors warn-once + swallow, mirroring the store wrapper.
- Adapters: Claude + OpenClaw stamp details.cwd on file_write events.
- agentwatch link-candidates [--session id] [--limit n]: JSON dump for manual classification toward the AUR-277 validation gate.
- AGENTWATCH_DEBUG_LINKS=1: surfaces the candidate-pair count in the TUI Header. Off by default, no overhead in normal use.
- Tests: 25 new unit + integration cases (branch-cache, RecentWritesIndex, V3 migration + new store methods, linker wrapper + regression test that wrapSinkWithStore behaviour stays unchanged).

PROGRESS.md captures the why-split rationale and the carry-forward map from the original AUR-115 plan.

Three material issues from codex review of #30, all fixed:

1. [P1] wrapSinkWithLinks ran processWrite BEFORE inner.emit. Since inner is wrapSinkWithStore in production, store.insert (which fires the AFTER-INSERT trigger that creates the sessions row) ran AFTER upsertSessionWorkspace's UPDATE. Result: every session's first write silently failed to populate workspace_root + git_branch, and any single-write session stayed permanently null. The whole telemetry feature would have no-op'd on first writes, exactly the data AUR-276 needs to collect. Forward to inner FIRST, then process. The pre-fix tests pre-inserted via store.insert before calling linked.emit, masking the bug. Tests now use the production sink composition (linker over store wrapper) so a single emit() exercises the real ordering invariant.
2. [P2] branch-cache keyed by gitCommonDir(cwd), which collapses linked worktrees of the same repo. Two worktrees on different branches shared a cache entry, so a write from worktree-A on `main` poisoned the cache for a write from worktree-B on `feature` for 60s: wrong branch attribution, exactly the false-positive injection AUR-276 exists to measure. Cache now keyed by cwd (per-worktree). Branch query also moved from common-dir to cwd (HEAD lives per-worktree). workspaceRoot return value is still common-dir-resolved, so on-same-branch worktrees of the same repo still collapse for matching. Added regression test.
3. [P2] gitCommonDir was called BEFORE the cache check on every event, so the 60s cache only covered the branch lookup; the common-dir shell-out fired on every cache hit, defeating the cache's hot-path purpose. Cache lookup now precedes any subprocess; common-dir is resolved only on miss and cached alongside the branch. Added regression test asserting zero shell-outs on cache hit.

Tests: 404 pass (1 new), typecheck clean.
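
The two branch-cache fixes (per-cwd keying, and cache lookup before any subprocess) can be sketched as follows. This is an illustration under stated assumptions, not the code in `branch-cache.ts`: the real module shells out through `runGit` and `gitCommonDir`, whereas this sketch takes an injected `Resolver` and clock so it runs without git. `BranchInfo`, `Resolver`, and `BranchCacheSketch` are hypothetical names.

```typescript
interface BranchInfo {
  workspaceRoot: string | null; // common-dir-resolved root in the real module
  branch: string | null;
}

// Stands in for the git shell-outs; called only on a cache miss.
type Resolver = (cwd: string) => BranchInfo;

const TTL_MS = 60 * 1000; // 60-second TTL

class BranchCacheSketch {
  private entries = new Map<string, { info: BranchInfo; expires: number }>();
  private resolve: Resolver;
  private now: () => number;

  constructor(resolve: Resolver, now: () => number = Date.now) {
    this.resolve = resolve;
    this.now = now;
  }

  lookup(cwd: string | null): BranchInfo | null {
    if (!cwd) return null; // null or empty cwd: skip entirely
    // Key by cwd so two worktrees of one repo never share an entry.
    const hit = this.entries.get(cwd);
    if (hit && hit.expires > this.now()) return hit.info; // zero shell-outs on a hit
    // Miss: resolve root and branch together and cache both, including null results.
    const info = this.resolve(cwd);
    this.entries.set(cwd, { info, expires: this.now() + TTL_MS });
    return info;
  }
}

// Usage with a fake resolver that counts how often it is invoked.
let shellOuts = 0;
const demoCache = new BranchCacheSketch((cwd) => {
  shellOuts += 1;
  return { workspaceRoot: "/repo/.git", branch: cwd === "/repo" ? "main" : "feature" };
}, () => 0);
demoCache.lookup("/repo");
demoCache.lookup("/repo-worktree");
demoCache.lookup("/repo"); // served from cache
// shellOuts is 2: one per worktree, none for the repeated lookup
```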

… only (#30)

* feat(AUR-276): session-correlation telemetry — candidate-pair logging only
* fix(AUR-276): codex review — order, worktree key, hot-path subprocess

mishanefedov added 2 commits May 4, 2026 17:35

mishanefedov merged commit 1884a96 into main May 4, 2026
3 checks passed

mishanefedov deleted the misha/aur-276-session-correlation-telemetry branch May 4, 2026 16:37
