
feat(AUR-276): session-correlation telemetry — candidate-pair logging only#30

Merged: mishanefedov merged 2 commits into main from misha/aur-276-session-correlation-telemetry on May 4, 2026

Conversation

@mishanefedov
Owner

Summary

Implements AUR-276 — the v1.0 telemetry half of the session-correlation work split (the v1.1 UI half is AUR-277, gated on a manual FP-rate validation against data this PR generates).

The original AUR-115 plan would have shipped a stitched-sessions UI based on a rule-based correlator with no measurement of its false-positive rate. AUR-183 was cancelled on 2026-04-15 for that exact reason: without user-labeled pairs to tune against, a ~10–15% FP rate is too high to trust in a flagship feature. This PR closes that gap by silently building the candidate-pair dataset Michael needs to (a) measure the FP rate and (b) decide whether AUR-277 ships as-is, ships with a confirmation hedge, or ships only after a redesign.

Strictly scope-limited. No API field, no React surface, no `/api/sessions/:id` change, no `web/` change. The dataset is queryable via a CLI dump and a dev-only TUI counter; nothing else.

What ships

  • Schema V3 — `workspace_root` + `git_branch` columns on `sessions`; new `session_link_candidates` table (deliberately candidates, not links — these are unverified). Idempotent migration; re-applying V3 is a no-op.
  • `src/correlate/` (new module):
    • `session-links.ts` — `RecentWritesIndex`: pure data structure, sliding 30-min window, sweep-on-touch, hard cap with 10% oldest-eviction. Match gate: different agent + different session + same non-null root + same non-null branch + within window.
    • `branch-cache.ts` — 60-second TTL around `git rev-parse --abbrev-ref HEAD`, reuses `runGit` + `gitCommonDir` from `src/git/correlate.ts`. Caches null results too (no point re-shelling-out to fail again).
  • `wrapSinkWithLinks` in `src/store/wire.ts` — layered after `wrapSinkWithStore` at all three sink composition sites (TUI, `serve`, daemon). Errors are warn-once + swallow, mirroring the store wrapper so the observability layer never crashes the agent runtime.
  • Adapter cwd plumbing — Claude and OpenClaw stamp `details.cwd` on `file_write` events. (Verified empirically: top-level `assistant` / `user` / `attachment` / `system` lines in real Claude JSONL all carry `cwd`. The OpenClaw `sessionCwd` map was already there from `session_start` lines — reused.)
  • `agentwatch link-candidates` subcommand — `--session <id>` to scope, `--limit <n>` to cap. JSON output, no formatting. This is the read path Michael uses to manually classify candidate pairs toward the AUR-277 validation gate.
  • `AGENTWATCH_DEBUG_LINKS=1` env var — surfaces the candidate-pair count in the TUI Header. Off by default, zero overhead in normal use.
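
As a rough illustration of the match gate and sliding window described for `RecentWritesIndex`, here is a minimal sketch. It is not the code in `session-links.ts`: the `WriteRecord` shape, the `RecentWritesSketch` class, the `touch` method name, the default cap of 1000 entries, and the inclusive window boundary are all assumptions made for the example. It also assumes writes arrive in timestamp order, so the oldest records sit at the front of the array.

```typescript
// Hypothetical record shape; the real index may store different fields.
interface WriteRecord {
  sessionId: string;
  agent: string;
  workspaceRoot: string | null;
  gitBranch: string | null;
  tsMs: number;
}

const WINDOW_MS = 30 * 60 * 1000; // sliding 30-minute window
const EVICT_FRACTION = 0.1; // drop the oldest 10% when the cap is exceeded

class RecentWritesSketch {
  private records: WriteRecord[] = [];
  private maxEntries: number;

  constructor(maxEntries: number = 1000) {
    this.maxEntries = maxEntries; // the cap value is an assumption
  }

  /** Record a write and return the session ids of peers that pass the match gate. */
  touch(write: WriteRecord): string[] {
    // Sweep-on-touch: expired records are dropped before matching.
    this.records = this.records.filter((r) => write.tsMs - r.tsMs <= WINDOW_MS);

    const peers = new Set<string>();
    if (write.workspaceRoot !== null && write.gitBranch !== null) {
      for (const r of this.records) {
        if (
          r.agent !== write.agent &&
          r.sessionId !== write.sessionId &&
          r.workspaceRoot === write.workspaceRoot &&
          r.gitBranch === write.gitBranch
        ) {
          peers.add(r.sessionId);
        }
      }
    }

    this.records.push(write);
    if (this.records.length > this.maxEntries) {
      const drop = Math.max(1, Math.floor(this.maxEntries * EVICT_FRACTION));
      this.records.splice(0, drop); // oldest entries are at the front
    }
    return Array.from(peers);
  }
}
```

Because the incoming write's root and branch are checked for null first, the strict equality comparisons also exclude stored records whose root or branch is null.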

What does NOT ship (deferred to AUR-277, gated)

  • `/api/sessions/:id` link field
  • React "Linked sessions" sidebar
  • Any user-visible stitching outside the dev counter
  • Cursor write attribution (waits on a Cursor SQLite AI-tracking adapter)

Validation gate before AUR-277

Per AUR-183's cancellation rationale, AUR-277 is blocked on accumulating ≥10 candidate pairs in real self-use, manually classifying each, and then acting on the measured FP rate:

  • FP <5% → ship UI as-is
  • FP 5–15% → ship UI with thumbs-up/down confirmation hedge
  • FP >15% → redesign

This PR is the data-collection prerequisite.
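
The gate above reduces to a small decision function. The sketch below is only an illustration of the thresholds listed: the names `decideGate` and `GateDecision`, and the `insufficient-data` outcome for fewer than 10 classified pairs, are assumptions rather than anything shipped in this PR.

```typescript
// Hypothetical outcome labels for the AUR-277 gate.
type GateDecision =
  | "insufficient-data"
  | "ship-as-is"
  | "ship-with-confirmation"
  | "redesign";

/**
 * Map manually classified candidate pairs to a gate decision.
 * falsePositives: pairs judged to be unrelated sessions.
 * total: all classified pairs.
 */
function decideGate(
  falsePositives: number,
  total: number,
  minPairs: number = 10,
): GateDecision {
  if (total < minPairs) return "insufficient-data";
  const fpRate = falsePositives / total;
  if (fpRate < 0.05) return "ship-as-is";
  if (fpRate <= 0.15) return "ship-with-confirmation";
  return "redesign";
}
```

With exactly 10 classified pairs, a single false positive already gives a 10% rate, so the ship-as-is outcome at that sample size requires zero false positives.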

Test plan

  • `npm run typecheck` — clean
  • `npm test` — 403 tests pass (25 new for AUR-276):
    • `src/correlate/session-links.test.ts` (11 cases): every match-gate path (cross-agent + same root + same branch + in-window), every filter-out path (same session, same agent, different root, different branch, null branch, null root, expired window), multi-peer matching, sweep behaviour.
    • `src/correlate/branch-cache.test.ts` (5 cases): null/empty cwd skip, TTL hit, TTL expiry re-shells, null branch caching.
    • `src/store/sqlite.test.ts` (+9 cases): V3 schema_version, migration idempotency, `upsertSessionWorkspace` first-write-wins + null-no-op, `recordSessionLinkCandidate` insert + bump + canonical pair ordering, `listSessionLinkCandidates` filter, `countSessionLinkCandidates` + `countAllLinkCandidates`.
    • `src/store/wire.test.ts` (9 cases, new file): pass-through behaviour for emit/enrich, non-write skip, null-cwd skip, workspace upsert + first-write-wins, candidate-pair recording + repeat-bump, null-branch suppression, regression test that an event going through link→store lands in the events table identically to one going through the store wrapper alone.
  • Smoke: `npm run build:server` clean.
  • Smoke: fresh-DB `agentwatch link-candidates` returns `[]`; `--session foo` returns `[]`; `--limit notanumber` exits 2 with a clear error.
  • Smoke: fresh-DB schema_version is 3; `session_link_candidates` table exists alongside `sessions`.
  • Post-merge: leave `AGENTWATCH_DEBUG_LINKS=1` set during normal multi-agent use for ≥2 weeks; collect ≥10 candidate pairs; manually classify them and decide whether to start AUR-277.
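
The "insert + bump + canonical pair ordering" behaviour tested for `recordSessionLinkCandidate` can be pictured with an in-memory stand-in. The real method writes to the `session_link_candidates` table in SQLite; the `CandidateLogSketch` class, its method names, and the lexicographic ordering rule below are assumptions for illustration only.

```typescript
/** Order a pair so (a, b) and (b, a) map to the same key. */
function canonicalPair(a: string, b: string): [string, string] {
  return a <= b ? [a, b] : [b, a];
}

// In-memory stand-in for the candidates table; not the real store API.
class CandidateLogSketch {
  private counts = new Map<string, number>();

  /** Insert the pair on first sight, bump its count on repeats. Returns the new count. */
  record(sessionA: string, sessionB: string): number {
    const key = JSON.stringify(canonicalPair(sessionA, sessionB));
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }

  /** Number of distinct candidate pairs seen so far. */
  countAll(): number {
    return this.counts.size;
  }
}
```

Canonical ordering keeps one row per unordered pair, so the same two sessions are counted once regardless of which one wrote last.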

Rationale doc

`PROGRESS.md` (committed in this PR) captures the why-split decision and the carry-forward map from the original AUR-115 plan into AUR-276.

Linear

  • AUR-276 — this PR
  • AUR-277 — follow-up UI ticket (blocked on this PR + the validation gate)
  • AUR-115 — moved to Backlog as the umbrella; description rewritten to point at the split
  • AUR-183 — original cancellation, the reason this split exists

… only

Splits AUR-115 (which revives the cancelled AUR-183) into two tickets and
ships the data-collection half. AUR-183 was cancelled 2026-04-15 because
the rule-based stitching had ~10–15% FP without user-labeled pairs to
tune against. AUR-115's PROGRESS.md plan was a stricter rule with the
same gating gap. AUR-276 closes that gap by silently building the
candidate-pair dataset that AUR-277 (the UI half, gated on a measured
FP rate) will be validated against.

Scope here is telemetry only — no API field, no React surface.

- Schema V3: workspace_root + git_branch on sessions; new
  session_link_candidates table (candidates, not links — they're unverified).
- New src/correlate/: RecentWritesIndex (sliding 30-min window, sweep,
  hard cap with eviction) + branch-cache (60s TTL around git rev-parse,
  reuses runGit + gitCommonDir).
- New wrapSinkWithLinks layered after wrapSinkWithStore at all three sink
  composition sites (TUI, serve, daemon). Errors warn-once + swallow,
  mirroring the store wrapper.
- Adapters: Claude + OpenClaw stamp details.cwd on file_write events.
- agentwatch link-candidates [--session id] [--limit n]: JSON dump for
  manual classification toward the AUR-277 validation gate.
- AGENTWATCH_DEBUG_LINKS=1: surfaces the candidate-pair count in the TUI
  Header. Off by default, no overhead in normal use.
- Tests: 25 new unit + integration cases (branch-cache, RecentWritesIndex,
  V3 migration + new store methods, linker wrapper + regression test that
  wrapSinkWithStore behaviour stays unchanged).

PROGRESS.md captures the why-split rationale and the carry-forward map
from the original AUR-115 plan.
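
A minimal sketch of the layering and the warn-once + swallow behaviour follows. It is not the code in `src/store/wire.ts`: the `AgentEvent` and `EventSink` shapes, the `wrapSinkWithLinksSketch` name, and the injected `processWrite` and `warn` callbacks are assumptions, and only `emit` is modelled. The sketch forwards to the inner sink before doing any link work, so the store wrapper has already handled the event when link processing runs.

```typescript
// Hypothetical minimal shapes; the real sink and event types are richer.
interface AgentEvent {
  type: string;
  sessionId: string;
  details?: { cwd?: string };
}

interface EventSink {
  emit(event: AgentEvent): void;
}

function wrapSinkWithLinksSketch(
  inner: EventSink,
  processWrite: (event: AgentEvent) => void,
  warn: (message: string) => void = console.warn,
): EventSink {
  let warned = false;
  return {
    emit(event: AgentEvent): void {
      // Forward first: the inner (store) sink sees every event unchanged.
      inner.emit(event);

      // Only writes that carry a cwd are candidates for link telemetry.
      if (event.type !== "file_write" || !event.details?.cwd) return;

      try {
        processWrite(event);
      } catch (err) {
        // Warn once, then swallow, so telemetry never breaks the caller.
        if (!warned) {
          warned = true;
          warn(`link telemetry failed: ${String(err)}`);
        }
      }
    },
  };
}
```
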
Three material issues from codex review of #30, all fixed:

1. [P1] wrapSinkWithLinks ran processWrite BEFORE inner.emit. Since
   inner is wrapSinkWithStore in production, store.insert (which fires
   the AFTER-INSERT trigger that creates the sessions row) ran AFTER
   upsertSessionWorkspace's UPDATE. Result: every session's first write
   silently failed to populate workspace_root + git_branch, and any
   single-write session stayed permanently null. The whole telemetry
   feature would have no-op'd on first writes — exactly the data
   AUR-276 needs to collect. Forward to inner FIRST, then process.

   The pre-fix tests pre-inserted via store.insert before calling
   linked.emit, masking the bug. Tests now use the production sink
   composition (linker over store wrapper) so a single emit() exercises
   the real ordering invariant.

2. [P2] branch-cache keyed by gitCommonDir(cwd), which collapses linked
   worktrees of the same repo. Two worktrees on different branches
   shared a cache entry, so a write from worktree-A on `main` poisoned
   the cache for a write from worktree-B on `feature` for 60s — wrong
   branch attribution, exactly the false-positive injection AUR-276
   exists to measure. Cache now keyed by cwd (per-worktree). Branch
   query also moved from common-dir to cwd (HEAD lives per-worktree).
   workspaceRoot return value is still common-dir-resolved, so on-same-
   branch worktrees of the same repo still collapse for matching.
   Added regression test.

3. [P2] gitCommonDir was called BEFORE the cache check on every event,
   so the 60s cache only covered the branch lookup; the common-dir
   shell-out fired on every cache hit, defeating the cache's hot-path
   purpose. Cache lookup now precedes any subprocess; common-dir is
   resolved only on miss and cached alongside the branch. Added
   regression test asserting zero shell-outs on cache hit.
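
The cache shape after fixes 2 and 3 can be sketched as below. This is an illustration under stated assumptions, not the code in `branch-cache.ts`: the `BranchCacheSketch` class, the `BranchInfo` shape, and the injected `resolve` and `now` functions are hypothetical. The injected resolver stands in for the git shell-outs so the example runs without a repository.

```typescript
interface BranchInfo {
  workspaceRoot: string | null; // common-dir-resolved root
  gitBranch: string | null; // branch of the worktree at cwd
}

// Stand-in for the git subprocess calls; assumed to be expensive.
type Resolver = (cwd: string) => BranchInfo;

const TTL_MS = 60 * 1000;

class BranchCacheSketch {
  private entries = new Map<string, { info: BranchInfo; expiresAt: number }>();
  private resolve: Resolver;
  private now: () => number;

  constructor(resolve: Resolver, now: () => number = Date.now) {
    this.resolve = resolve;
    this.now = now;
  }

  lookup(cwd: string | null | undefined): BranchInfo | null {
    if (!cwd) return null; // null or empty cwd: skip entirely

    // Cache check comes before any subprocess work, keyed per worktree (cwd).
    const t = this.now();
    const hit = this.entries.get(cwd);
    if (hit && hit.expiresAt > t) return hit.info;

    // Miss: resolve root and branch together and cache both, including nulls.
    const info = this.resolve(cwd);
    this.entries.set(cwd, { info, expiresAt: t + TTL_MS });
    return info;
  }
}
```

Keying by `cwd` means two worktrees of one repository get separate entries, while each entry can still carry the same common-dir-resolved root for matching.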

Tests: 404 pass (1 new), typecheck clean.
@mishanefedov mishanefedov merged commit 1884a96 into main May 4, 2026
3 checks passed
@mishanefedov mishanefedov deleted the misha/aur-276-session-correlation-telemetry branch May 4, 2026 16:37
mishanefedov added a commit that referenced this pull request May 25, 2026
… only (#30)

* feat(AUR-276): session-correlation telemetry — candidate-pair logging only

* fix(AUR-276): codex review — order, worktree key, hot-path subprocess
