feat(AUR-276): session-correlation telemetry — candidate-pair logging only #30
Merged
… only

Splits AUR-115 (which revives the cancelled AUR-183) into two tickets and ships the data-collection half. AUR-183 was cancelled 2026-04-15 because the rule-based stitching had a ~10–15% false-positive (FP) rate and no user-labeled pairs to tune against. AUR-115's PROGRESS.md plan was a stricter rule with the same gating gap. AUR-276 closes that gap by silently building the candidate-pair dataset that AUR-277 (the UI half, gated on a measured FP rate) will be validated against. Scope here is telemetry only: no API field, no React surface.

- Schema V3: `workspace_root` + `git_branch` on `sessions`; new `session_link_candidates` table (candidates, not links, because they are unverified).
- New `src/correlate/`: `RecentWritesIndex` (sliding 30-min window, sweep, hard cap with eviction) + branch-cache (60s TTL around `git rev-parse`, reuses `runGit` + `gitCommonDir`).
- New `wrapSinkWithLinks` layered after `wrapSinkWithStore` at all three sink composition sites (TUI, serve, daemon). Errors are logged once (warn-once) and swallowed, mirroring the store wrapper.
- Adapters: Claude + OpenClaw stamp `details.cwd` on `file_write` events.
- `agentwatch link-candidates [--session id] [--limit n]`: JSON dump for manual classification toward the AUR-277 validation gate.
- `AGENTWATCH_DEBUG_LINKS=1`: surfaces the candidate-pair count in the TUI Header. Off by default, no overhead in normal use.
- Tests: 25 new unit + integration cases (branch-cache, `RecentWritesIndex`, V3 migration + new store methods, linker wrapper + a regression test that `wrapSinkWithStore` behaviour stays unchanged).

PROGRESS.md captures the why-split rationale and the carry-forward map from the original AUR-115 plan.
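A minimal sketch of what a sliding-window write index of this kind could look like, assuming writes arrive with non-decreasing timestamps. The method names (`record`, `sweep`, `size`), the `WriteEvent`/`CandidatePair` shapes, the cap value, and the matching rule (two different sessions writing the same absolute path within the window) are all hypothetical; the PR does not show the real `RecentWritesIndex` API or the exact candidate rule.

```typescript
// Hypothetical shapes: the PR names RecentWritesIndex but does not show its API.
interface WriteEvent {
  sessionId: string;
  path: string; // absolute path of the written file
  ts: number; // epoch ms; assumed non-decreasing across record() calls
}

interface CandidatePair {
  earlierSessionId: string;
  laterSessionId: string;
  path: string;
  gapMs: number;
}

class RecentWritesIndex {
  private readonly byPath = new Map<string, WriteEvent[]>();
  private readonly order: WriteEvent[] = []; // global insertion order (= time order)
  private readonly windowMs: number;
  private readonly maxEntries: number;

  constructor(windowMs = 30 * 60_000, maxEntries = 10_000) {
    this.windowMs = windowMs; // sliding 30-min window
    this.maxEntries = maxEntries; // hard cap; the real value is not stated in the PR
  }

  get size(): number {
    return this.order.length;
  }

  /** Records a write; returns one candidate per other session that wrote the same path in-window. */
  record(write: WriteEvent): CandidatePair[] {
    this.sweep(write.ts);
    const pairs: CandidatePair[] = [];
    const seen = new Set<string>();
    const prior = this.byPath.get(write.path) ?? [];
    // Walk newest-first so each other session is paired via its most recent write.
    for (const p of [...prior].reverse()) {
      if (p.sessionId === write.sessionId || seen.has(p.sessionId)) continue;
      seen.add(p.sessionId);
      pairs.push({
        earlierSessionId: p.sessionId,
        laterSessionId: write.sessionId,
        path: write.path,
        gapMs: write.ts - p.ts,
      });
    }
    const list = this.byPath.get(write.path);
    if (list) list.push(write);
    else this.byPath.set(write.path, [write]);
    this.order.push(write);
    // Hard cap: evict the globally oldest entries until back under the limit.
    while (this.order.length > this.maxEntries) this.evictOldest();
    return pairs;
  }

  /** Evicts every entry older than the window, measured back from `now`. */
  sweep(now: number): void {
    const cutoff = now - this.windowMs;
    let head = this.order[0];
    while (head !== undefined && head.ts < cutoff) {
      this.evictOldest();
      head = this.order[0];
    }
  }

  private evictOldest(): void {
    const oldest = this.order.shift();
    if (oldest === undefined) return;
    const list = this.byPath.get(oldest.path);
    list?.shift(); // per-path lists follow the global order, so their head is `oldest`
    if (list !== undefined && list.length === 0) this.byPath.delete(oldest.path);
  }
}
```

Keeping a single time-ordered queue means the window sweep and the hard-cap eviction share one code path, and both stay cheap as long as timestamps really are monotonic.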
Three material issues from codex review of #30, all fixed:

1. [P1] `wrapSinkWithLinks` ran `processWrite` BEFORE `inner.emit`. Since `inner` is `wrapSinkWithStore` in production, `store.insert` (which fires the AFTER-INSERT trigger that creates the `sessions` row) ran AFTER `upsertSessionWorkspace`'s UPDATE. Result: every session's first write silently failed to populate `workspace_root` + `git_branch`, and any single-write session stayed permanently null. The whole telemetry feature would have no-op'd on first writes, which is exactly the data AUR-276 needs to collect. Fix: forward to `inner` FIRST, then process. The pre-fix tests pre-inserted via `store.insert` before calling `linked.emit`, masking the bug. Tests now use the production sink composition (linker over store wrapper), so a single `emit()` exercises the real ordering invariant.
2. [P2] branch-cache was keyed by `gitCommonDir(cwd)`, which collapses linked worktrees of the same repo. Two worktrees on different branches shared a cache entry, so a write from worktree-A on `main` poisoned the cache for a write from worktree-B on `feature` for 60s: wrong branch attribution, exactly the false-positive injection AUR-276 exists to measure. The cache is now keyed by `cwd` (per worktree), and the branch query also moved from the common dir to `cwd` (HEAD lives per worktree). The `workspaceRoot` return value is still common-dir-resolved, so same-branch worktrees of the same repo still collapse for matching. Added regression test.
3. [P2] `gitCommonDir` was called BEFORE the cache check on every event, so the 60s cache only covered the branch lookup; the common-dir shell-out fired on every cache hit, defeating the cache's hot-path purpose. Cache lookup now precedes any subprocess; the common dir is resolved only on a miss and cached alongside the branch. Added regression test asserting zero shell-outs on a cache hit.

Tests: 404 pass (1 new), typecheck clean.
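The three fixes can be sketched together as below. This is an illustration under stated assumptions, not the merged code: the `AgentEvent`/`Sink` shapes, the `GitRunner` injection point, the `BranchCache` class and its `lookup` method are hypothetical stand-ins for agentwatch's real types and its `runGit`/`gitCommonDir` helpers. The linker forwards to the inner (store) sink before doing any link work (fix 1), and the cache is keyed per worktree `cwd` and consulted before any `git` subprocess runs (fixes 2 and 3).

```typescript
import { resolve } from "node:path";

// Hypothetical event and sink shapes; agentwatch's real types are not shown in this PR.
interface AgentEvent {
  type: string;
  sessionId: string;
  details: { cwd?: string; path?: string };
}

interface Sink {
  emit(event: AgentEvent): void;
}

// Fix 1: forward to the inner (store) sink FIRST, then run link processing.
function wrapSinkWithLinks(inner: Sink, processWrite: (e: AgentEvent) => void): Sink {
  let warned = false;
  return {
    emit(event) {
      inner.emit(event); // store.insert runs here, so the sessions row now exists
      if (event.type !== "file_write") return;
      try {
        processWrite(event); // e.g. UPDATE sessions SET workspace_root, git_branch
      } catch (err) {
        if (!warned) {
          warned = true; // warn once, then swallow silently
          console.warn("link telemetry error (further errors suppressed):", err);
        }
      }
    },
  };
}

// Hypothetical injectable runner standing in for runGit; returns trimmed stdout or null.
type GitRunner = (cwd: string, args: string[]) => string | null;

interface BranchInfo {
  workspaceRoot: string | null; // common-dir-resolved, so same-repo worktrees collapse
  branch: string | null; // per-worktree HEAD
}

// Fixes 2 and 3: key by cwd (per worktree) and check the cache before any subprocess.
class BranchCache {
  private readonly entries = new Map<string, { info: BranchInfo; expiresAt: number }>();
  private readonly runGit: GitRunner;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(runGit: GitRunner, ttlMs = 60_000, now: () => number = Date.now) {
    this.runGit = runGit;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  lookup(cwd: string): BranchInfo {
    const hit = this.entries.get(cwd);
    if (hit && hit.expiresAt > this.now()) return hit.info; // zero shell-outs on a hit
    const commonDir = this.runGit(cwd, ["rev-parse", "--git-common-dir"]);
    const info: BranchInfo = {
      // --git-common-dir may print a relative path, so resolve it against cwd.
      workspaceRoot: commonDir === null ? null : resolve(cwd, commonDir),
      branch: this.runGit(cwd, ["rev-parse", "--abbrev-ref", "HEAD"]), // asked of this worktree
    };
    this.entries.set(cwd, { info, expiresAt: this.now() + this.ttlMs });
    return info;
  }
}
```

Injecting the runner and clock is one way to get the "zero shell-outs on cache hit" regression test the PR describes without spawning real `git` processes.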
mishanefedov added a commit that referenced this pull request on May 25, 2026:
… only (#30)

* feat(AUR-276): session-correlation telemetry — candidate-pair logging only
* fix(AUR-276): codex review — order, worktree key, hot-path subprocess
Summary
Implements AUR-276 — the v1.0 telemetry half of the session-correlation work split (the v1.1 UI half is AUR-277, gated on a manual FP-rate validation against data this PR generates).
The original AUR-115 plan would have shipped a stitched-sessions UI based on a rule-based correlator with no measurement of its false-positive rate. AUR-183 was cancelled 2026-04-15 for that exact reason: without user-labeled pairs to tune against, a ~10–15% FP rate is too high to trust in a flagship feature. This PR closes that gap by silently building the candidate-pair dataset Michael needs to (a) measure the FP rate and (b) decide whether AUR-277 ships as-is, ships with a confirmation hedge, or ships after a redesign.
Strictly scope-limited. No API field, no React surface, no `/api/sessions/:id` change, no `web/` change. The dataset is queryable via a CLI dump and a dev-only TUI counter; nothing else.
What ships
What does NOT ship (deferred to AUR-277, gated)
Validation gate before AUR-277
Per AUR-183's cancellation rationale, AUR-277 is blocked on accumulating ≥10 candidate pairs in real self-use, manually classifying each, and either:
This PR is the data-collection prerequisite.
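The gate arithmetic is simple enough to sketch. Assuming each pair from the `agentwatch link-candidates` dump is hand-labeled as a true link or a false positive (the `Label` values and `ClassifiedPair` shape below are hypothetical; the PR does not define a label format), the "FP rate" here is read as the share of candidate pairs that turn out to be wrong. That is 1 − precision; a confusion-matrix false-positive rate would also need true negatives, which candidate-only logging does not record. The decision thresholds for AUR-277 are not stated in this PR, so the sketch only reports the numbers.

```typescript
// Hypothetical label format: the PR only says pairs are dumped as JSON and classified by hand.
type Label = "true_link" | "false_positive";

interface ClassifiedPair {
  earlierSessionId: string;
  laterSessionId: string;
  label: Label;
}

interface GateResult {
  ready: boolean; // enough classified pairs to evaluate the gate
  classified: number;
  falsePositives: number;
  fpRate: number | null; // share of candidates that are wrong; null when nothing is classified
}

const MIN_PAIRS = 10; // "≥10 candidate pairs" from the validation gate

function evaluateGate(pairs: ClassifiedPair[]): GateResult {
  const classified = pairs.length;
  const falsePositives = pairs.filter((p) => p.label === "false_positive").length;
  return {
    ready: classified >= MIN_PAIRS,
    classified,
    falsePositives,
    fpRate: classified === 0 ? null : falsePositives / classified,
  };
}
```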
Test plan
Rationale doc
`PROGRESS.md` (committed in this PR) captures the why-split decision and the carry-forward map from the original AUR-115 plan into AUR-276.
Linear