feat(AUR-218): custom Docker sandbox image for agentwatch-daily #29
Closed
mishanefedov wants to merge 120 commits into
Four related fixes from real dogfooding:
1. CLAUDE WAS SILENTLY BROKEN. The chokidar v4 watch pattern
`${dir}/**/*.jsonl` never fired because v4 dropped glob support
(we already fixed this in openclaw.ts and cursor.ts, forgot
claude). Replaced with recursive watch + regex filter. Smoke test
on dev machine: claude adapter now surfaces thousands of events
across 9+ projects where before it was 0.
2. PROJECT PREFIX on every event. Claude session paths encode the
project (`~/.claude/projects/-Users-foo-IdeaProjects-auraqu/...`)
so we extract the last segment as a `[auraqu]` prefix on every
summary. OpenClaw tracks cwd per session from `session_start`
events and tags subsequent events in that session. Cursor uses
path heuristics. Huge visibility win — you can finally see WHICH
project each agent is working on.
3. CURSOR STARTUP NOISE REMOVED. Before: Cursor emitted 3 'detected
at startup' events into the timeline on every launch, which
looked like agent activity even when you hadn't touched Cursor
in months. Now: the startup snapshot populates a CursorStatus
object returned alongside the stop fn; only real file-change
events (mcp edit, permission edit, .cursorrules edit, new
recently-viewed file) hit the timeline.
4. TIMELINE TRUNCATION via Ink's wrap='truncate' so long summaries
don't break rows into multi-line walls of text.
13 tests still passing. Typecheck clean.
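Fixes 1 and 2 above can be sketched as two pure helpers: a regex filter applied to every path a recursive watcher reports, and a prefix extractor for the encoded project directory. This is a minimal sketch; `SESSION_FILE_RE`, `isSessionFile`, and `projectPrefix` are hypothetical names, and the exact regex in the adapter is not shown in this PR.

```typescript
// Hypothetical helpers; the names and the regex are assumptions for illustration.

// Accept only .jsonl files that live under a projects/<encoded-project>/ directory.
const SESSION_FILE_RE = /\/projects\/([^/]+)\/.*\.jsonl$/;

function isSessionFile(filePath: string): boolean {
  return SESSION_FILE_RE.test(filePath);
}

// ".../projects/-Users-foo-IdeaProjects-auraqu/x.jsonl" yields "[auraqu]".
function projectPrefix(filePath: string): string | undefined {
  const m = SESSION_FILE_RE.exec(filePath);
  if (m === null) return undefined;
  const segments = m[1].split("-").filter((s) => s.length > 0);
  if (segments.length === 0) return undefined;
  return `[${segments[segments.length - 1]}]`;
}
```

Taking the last dash-separated segment mirrors the description above, but it would shorten a project directory whose own name contains a hyphen, so treat the split rule as a heuristic.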
Previously events rendered in arrival order — backfill from session files could arrive out-of-order so an event from 02:00 would appear above a live event from 09:24. Now each incoming event is binary-inserted at its correct position based on the event's ts field so the timeline stays strictly reverse-chronological.
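A minimal sketch of the binary insert described above, assuming the buffer is kept newest-first and that `ts` is an ISO-8601 string. `TimedEvent` and `insertByTs` are illustrative names, not the reducer's actual code.

```typescript
// Assumption: events carry an ISO-8601 `ts`; the buffer is ordered newest-first.
interface TimedEvent {
  id: string;
  ts: string;
}

// Inserts `ev` at its reverse-chronological position and returns the index used.
function insertByTs<T extends TimedEvent>(buf: T[], ev: T): number {
  const t = Date.parse(ev.ts);
  let lo = 0;
  let hi = buf.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    // Entries at least as new as `ev` stay above it in the list.
    if (Date.parse(buf[mid].ts) >= t) lo = mid + 1;
    else hi = mid;
  }
  buf.splice(lo, 0, ev);
  return lo;
}
```

With this ordering a backfilled 02:00 event lands below a live 09:24 event regardless of arrival order.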
Two UX fixes from real dogfood:
1. Add a bold/dim header row (TIME / AGENT / TYPE / EVENT) at the top of the timeline so it's obvious what each column is.
2. Enter the terminal's alt screen buffer (\x1b[?1049h) on startup and leave it (\x1b[?1049l) on exit. Standard TUI behaviour: lazygit/k9s/htop do this. While agentwatch runs it takes over the viewport; on quit the shell's scrollback is restored as though nothing happened. Registered handlers for exit, SIGINT, SIGTERM, SIGHUP so Ctrl-C and kill signals still restore the terminal cleanly.
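The alt-screen handling can be sketched as follows. The two escape sequences come from the message above; `OutLike`, `ProcLike`, and `enterAltScreen` are hypothetical names, and the output stream and process are passed in as minimal interfaces so the sketch stays testable.

```typescript
const ENTER_ALT_SCREEN = "\x1b[?1049h";
const LEAVE_ALT_SCREEN = "\x1b[?1049l";

// Minimal shapes (assumptions) standing in for process.stdout and process.
interface OutLike {
  write(chunk: string): unknown;
}
interface ProcLike {
  on(eventName: string, handler: () => void): unknown;
}

// Enters the alt screen and registers restore handlers; returns the restore fn.
function enterAltScreen(out: OutLike, proc: ProcLike): () => void {
  let restored = false;
  const restore = (): void => {
    if (restored) return; // several handlers may fire; leave the alt screen once
    restored = true;
    out.write(LEAVE_ALT_SCREEN);
  };
  out.write(ENTER_ALT_SCREEN);
  for (const sig of ["exit", "SIGINT", "SIGTERM", "SIGHUP"]) {
    proc.on(sig, restore);
  }
  return restore;
}
```

In Node, installing a SIGINT handler replaces the default termination behaviour, so a real signal handler would also need to end the process after restoring the terminal.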
chokidar's watcher.close() awaits pending fs handles which made q feel laggy (2-3s) on exit. We don't care — OS reaps fds. Call exit() for the Ink unmount then schedule process.exit(0) on the next tick.
Previously many Claude events rendered as '[auraqu] tool_call' / '[auraqu] response' with no payload because extractFields only looked at top-level o.input, but the real tool_use data lives in o.message.content[i].input. Now we walk message.content properly:
- Bash tool_use → 'Bash: <command>' + cmd field populated + classified as shell_exec (risk 9 for destructive cmds)
- Read/Write/Edit/MultiEdit → '<Tool>: <path>' + path field + correct event type (file_read / file_write)
- Grep/Glob → '<Tool>: <pattern>'
- Task → 'Task: <description>'
- WebFetch → 'WebFetch: <url>'
- Fallback for unknown tools: '<name>: <first string arg>'
Also suppresses noise:
- Assistant messages with empty content (compaction stubs)
- User turns that are tool_result blocks only (no human text)
- worktree-state / compact / summary entries
Smoke test on dev data: 0 empty-looking events across 2147 events. Every entry now has readable content or is filtered out. Updated one test (Bash now classifies as shell_exec) + added a regression test for empty-message suppression. 14 tests passing.
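The mapping above can be sketched as a single switch over the tool name. This is an illustration, not the adapter's code: `summarizeToolUse` and its types are hypothetical, and the input field names other than `command` and `file_path` (`pattern`, `description`, `url`) are assumptions about the tool payloads.

```typescript
type ToolEventType = "shell_exec" | "file_read" | "file_write" | "tool_call";

interface ToolUseBlock {
  type: "tool_use";
  name: string;
  input: Record<string, unknown>;
}

interface ToolSummary {
  type: ToolEventType;
  summary: string;
  cmd?: string;
  path?: string;
}

const asText = (v: unknown): string => (typeof v === "string" ? v : "");

function summarizeToolUse(block: ToolUseBlock): ToolSummary {
  const input = block.input;
  switch (block.name) {
    case "Bash": {
      const cmd = asText(input["command"]);
      return { type: "shell_exec", summary: `Bash: ${cmd}`, cmd };
    }
    case "Read": {
      const path = asText(input["file_path"]);
      return { type: "file_read", summary: `Read: ${path}`, path };
    }
    case "Write":
    case "Edit":
    case "MultiEdit": {
      const path = asText(input["file_path"]);
      return { type: "file_write", summary: `${block.name}: ${path}`, path };
    }
    case "Grep":
    case "Glob":
      return { type: "tool_call", summary: `${block.name}: ${asText(input["pattern"])}` };
    case "Task":
      return { type: "tool_call", summary: `Task: ${asText(input["description"])}` };
    case "WebFetch":
      return { type: "tool_call", summary: `WebFetch: ${asText(input["url"])}` };
    default: {
      // Fallback: first string argument, whatever its key is called.
      const first = Object.values(input).find((v) => typeof v === "string");
      return { type: "tool_call", summary: `${block.name}: ${asText(first)}` };
    }
  }
}
```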
0.0.1 was published but unusable (claude adapter silently returned 0 events, EMFILE crash on workspace scan, empty timeline rows). 0.0.2 bundles all fixes discovered during same-day dogfood. Contents documented in CHANGELOG.
Press 'p' now shows three sections: - Claude Code: as before (allow/deny/defaultMode + risk flags) - Cursor: approval mode, sandbox state, allow/deny counts, MCP server list, discovered .cursorrules paths. Data was already collected via CursorStatus — just wasn't rendered. - OpenClaw: default workspace + per-sub-agent breakdown (name, emoji, id, model, workspace). OpenClaw has no allow/deny — scope is controlled by the workspace path, so we render that instead. Gemini CLI exposes only auth (no permission model), so we document that fact in the footer rather than show an empty section.
Claude Code stores subagent runs at: ~/.claude/projects/<proj>/<session>/subagents/agent-<id>.jsonl Previously invisible — our path regex required the jsonl to be directly in the project folder and chokidar depth was 3. Every subagent's inner tool calls (often dozens: Bash, WebFetch, Grep…) never made it to the timeline; only the parent 'Agent: <task>' entry showed up. Now: - regex matches both main-session and subagents/ files - depth bumped 3 → 5 - subAgentId extracted from filename, threaded into the translator - events prefixed with [project / sub:<id8>] so you can see which subagent each call came from Smoke test on dev data: 2217 main-session events + 8513 subagent events now surfaced (previously 0 from subagents). Closes a huge observability gap that was invisible until today. All 14 tests still passing.
AUR-99. First M4 ticket. claude-devtools parity win. - Added EventDetails type to schema (fullText, thinking, toolInput, toolUseId) so translators stash the real payload alongside the truncated summary - Claude adapter populates details for prompts, responses, and tool_use events. Extracts thinking blocks separately. - OpenClaw adapter same - New EventDetail.tsx renders the full content: wraps long text to terminal width, shows JSON-pretty tool input, highlights extended thinking, scrollable with up/down / j/k - Reducer gains selectedIdx + detailOpen + detailScroll state - ↑↓ or j/k moves selection in the timeline (inverse-highlighted row) - Enter / l opens detail; esc / q closes - If selection drifts off-screen, Timeline centers the window on it - When new events arrive above a selected row, the cursor shifts to stay on the same event 14 tests still passing. Biggest UX upgrade since the initial scaffold — every event now has real inspectable content.
AUR-101. Second M4 ticket. - Press / in the TUI to open a search input; every keystroke narrows the timeline to events whose summary, path, cmd, tool, agent, fullText, or thinking contain the query (case-insensitive substring). Backspace to edit, esc to clear. - Match count shown below the timeline while searching. - Query is a sticky filter on top of the existing agent filter and the buffer sort order. Cross-session disk search will come later (AUR-111, M6). This one covers the in-memory buffer which is instant up to ~10k events.
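A minimal sketch of the in-memory match, assuming the searchable fields sit on the event as shown. `SearchableEvent` and `matchesQuery` are hypothetical names; the field list follows the description above.

```typescript
// Assumed event shape; only the fields named in the commit message are included.
interface SearchableEvent {
  summary: string;
  agent: string;
  path?: string;
  cmd?: string;
  tool?: string;
  details?: { fullText?: string; thinking?: string };
}

// Case-insensitive substring match across every searchable field.
function matchesQuery(ev: SearchableEvent, query: string): boolean {
  const q = query.toLowerCase();
  if (q === "") return true; // an empty query does not narrow the timeline
  const haystack: Array<string | undefined> = [
    ev.summary,
    ev.path,
    ev.cmd,
    ev.tool,
    ev.agent,
    ev.details?.fullText,
    ev.details?.thinking,
  ];
  return haystack.some((field) => field !== undefined && field.toLowerCase().includes(q));
}
```

Applied as `buffer.filter((ev) => matchesQuery(ev, query))` after the agent filter, this keeps the existing sort order intact.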
AUR-118. Third M4 ticket.
- New src/util/cost.ts with per-model pricing tables (opus-4-6,
sonnet-4-6, haiku-4-5) + default fallback. Parses the usage block
(input_tokens / cache_creation_input_tokens / cache_read_input_tokens /
output_tokens) and returns USD cost.
- Cache accounting is critical: cache_read is ~10% of base rate, so a
naive sum that bills every token at the base rate is 3-10x wrong on
Claude. Get this right.
- Claude adapter extracts model + usage from each assistant turn and
stashes them in event.details.{usage, cost, model}.
- Agent side panel shows per-agent cost total (yellow).
- Event detail pane shows token breakdown + cost for every
assistant-turn event.
Smoke test on dev data: 10,735 events, 9,707 with cost, total
$861.21 across accumulated backfill. Real pricing math.
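To illustrate why cache accounting matters, here is a sketch of the per-turn cost calculation next to the naive version. The rates are placeholder numbers chosen only so that cache reads cost 10% of the input rate; they are not real provider pricing, and `TokenUsage`, `Rates`, `costUsd`, and `naiveCostUsd` are hypothetical names.

```typescript
interface TokenUsage {
  input_tokens?: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
  output_tokens?: number;
}

// USD per million tokens.
interface Rates {
  input: number;
  cacheCreate: number;
  cacheRead: number;
  output: number;
}

// PLACEHOLDER rates for illustration only, not actual model pricing.
const EXAMPLE_RATES: Rates = { input: 10, cacheCreate: 12.5, cacheRead: 1, output: 50 };

const perMillion = (tokens: number | undefined, rate: number): number =>
  ((tokens ?? 0) * rate) / 1_000_000;

// Each token class is billed at its own rate.
function costUsd(u: TokenUsage, r: Rates): number {
  return (
    perMillion(u.input_tokens, r.input) +
    perMillion(u.cache_creation_input_tokens, r.cacheCreate) +
    perMillion(u.cache_read_input_tokens, r.cacheRead) +
    perMillion(u.output_tokens, r.output)
  );
}

// The mistake: every input-side token billed at the base input rate.
function naiveCostUsd(u: TokenUsage, r: Rates): number {
  const allInput =
    (u.input_tokens ?? 0) +
    (u.cache_creation_input_tokens ?? 0) +
    (u.cache_read_input_tokens ?? 0);
  return perMillion(allInput, r.input) + perMillion(u.output_tokens, r.output);
}
```

For a turn dominated by cache reads, the naive figure comes out several times higher than the cache-aware one, which is the gap the commit message warns about.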
…flag
AUR-123. Fourth M4 ticket. Unblocks AUR-100 subagent drilldown and
AUR-122 inline expansion.
- New EventSink type in schema: adapters receive { emit, enrich }
instead of just an emit callback. emit is unchanged; enrich
patches an already-emitted event's details in place.
- App reducer: new 'enrich' action walks the buffer and merges the
patch onto the matching event (O(n) scan, fine for <500 events).
- Claude adapter:
- pendingToolUses map tracks every tool_use's eventId + ts
- handleToolResults scans user turns for tool_result blocks,
pairs by tool_use_id, enriches with:
* toolResult (stdout / file body / search matches)
* durationMs (ts delta)
* toolError (is_error flag)
- orphanResults cache handles backfill out-of-order (bounded to
1000 entries, drops oldest on overflow)
- tool_use emitters check the orphan cache and enrich immediately
if the result arrived first
- Detail pane now shows duration + tool result content + error flag
alongside tokens/cost/model
- Openclaw / cursor / fs-watcher: accept either an Emit fn or an
EventSink — backward compatible
Smoke test on dev data: 10,729 Claude events surfaced, 6,005 got a
durationMs, 5,919 got full toolResult content, 241 errors
flagged. Real pairing across backfill + live.
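The pairing logic above, including the bounded orphan cache, can be sketched like this. The map names follow the commit message; `createToolPairing`, the record shapes, and numeric millisecond timestamps are assumptions made for the sketch.

```typescript
interface ToolResultInfo {
  toolUseId: string;
  ts: number; // epoch milliseconds (assumption)
  content: string;
  isError: boolean;
}

interface ToolResultPatch {
  toolResult: string;
  toolError: boolean;
  durationMs: number;
}

type EnrichFn = (eventId: string, patch: ToolResultPatch) => void;

function createToolPairing(enrich: EnrichFn, maxOrphans: number = 1000) {
  const pendingToolUses = new Map<string, { eventId: string; ts: number }>();
  const orphanResults = new Map<string, ToolResultInfo>();

  const toPatch = (r: ToolResultInfo, useTs: number): ToolResultPatch => ({
    toolResult: r.content,
    toolError: r.isError,
    durationMs: r.ts - useTs,
  });

  return {
    onToolUse(toolUseId: string, eventId: string, ts: number): void {
      const early = orphanResults.get(toolUseId);
      if (early !== undefined) {
        // The result arrived first (out-of-order backfill): enrich immediately.
        orphanResults.delete(toolUseId);
        enrich(eventId, toPatch(early, ts));
        return;
      }
      pendingToolUses.set(toolUseId, { eventId, ts });
    },
    onToolResult(r: ToolResultInfo): void {
      const pending = pendingToolUses.get(r.toolUseId);
      if (pending !== undefined) {
        pendingToolUses.delete(r.toolUseId);
        enrich(pending.eventId, toPatch(r, pending.ts));
        return;
      }
      // No matching tool_use yet: park the result, dropping the oldest on overflow.
      if (orphanResults.size >= maxOrphans) {
        const oldest = orphanResults.keys().next().value;
        if (oldest !== undefined) orphanResults.delete(oldest);
      }
      orphanResults.set(r.toolUseId, r);
    },
  };
}
```

A `Map` iterates in insertion order, so taking its first key is a cheap way to evict the oldest orphan.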
AUR-100. Fifth M4 ticket. Uses the tool_use/tool_result pairing
from AUR-123.
When a Claude Agent tool_use's tool_result arrives, the result text
contains the spawned subagent's agentId (e.g. 'agentId:
ab3c99fca44a218cb'). We regex-extract it in handleToolResults and
stash on details.subAgentId.
UI:
- Timeline rows with a known subAgentId show a yellow suffix:
[auraqu] Agent: Multi-agent dev pain research ▸ 52 child events
where the child count aggregates all events whose sessionId ==
'agent-<subAgentId>' (the subagent jsonl filename).
- New hotkey 'x' on a focused Agent event scopes the timeline to
that subagent only — shows every Bash, WebFetch, Grep it made.
- 'X' (shift-x) unscopes back to full timeline.
- Scope banner above the footer makes the filter visible.
- Works with existing search / agent-filter / pause stack.
Smoke test on dev data: 15 parent Agent events tagged with their
subAgentId — matches the count of Task spawns today. Claude-devtools
parity for multi-agent drilldown.
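A sketch of the extraction and child-count steps, under the assumption that the id is an alphanumeric token following `agentId:` as in the example above. `extractSubAgentId` and `childCount` are hypothetical names.

```typescript
// Assumption: the id is a run of letters/digits right after "agentId:".
const AGENT_ID_RE = /agentId:\s*([A-Za-z0-9]+)/;

function extractSubAgentId(resultText: string): string | undefined {
  const m = AGENT_ID_RE.exec(resultText);
  return m === null ? undefined : m[1];
}

// Counts events recorded under the subagent's session id ("agent-<subAgentId>").
function childCount(sessionIds: string[], subAgentId: string): number {
  const wanted = `agent-${subAgentId}`;
  return sessionIds.filter((sid) => sid === wanted).length;
}
```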
… list AUR-122. Sixth M4 ticket. Matches claude-devtools' signature UX. Events now show a ▸ marker in the summary row when they have expandable content (toolResult, toolInput, fullText, thinking). Hotkeys: - right-arrow or 'o' on focused row: toggle inline expansion - left-arrow: collapse (if expanded) - Enter still opens the full-screen detail pane for long content Expanded content rendered indented under the parent row: - tool_input (JSON-pretty) when there's no tool_result yet - tool_result (stdout / file body / matches), red if error - fullText for prompts/responses when no tool context - capped at 10 lines with '… N more lines (press Enter for full view)' so a long Read doesn't blow up the timeline Summary rows also got duration + ERR flag inline, matching claude-devtools' compressed tool row format: ▸ Bash: git log · 15ms ▸ Edit: src/app.tsx · 7ms · ERR Timeline component extended with expandedIds prop; state.expandedIds lives in App.tsx as a Set<string>.
AUR-119. Seventh M4 ticket. First of the navigation hierarchy that matches claude-devtools' left sidebar. - New src/util/project-index.ts builds a ProjectRow[] from the event buffer, aggregating: event count, per-agent count, session ids, cost, last activity. Sorted by last ts descending. - New src/ui/ProjectsView.tsx renders the list with selected row highlight, per-project agent counts (claude:484, openclaw:34, …), cost, last-active (5m ago / 2d ago). - Hotkey P opens the view; ↑↓ / j/k navigates; Enter picks a project and applies it as a filter to the main timeline. - Hotkey A clears the project filter (back to all). - Scope banner above the footer makes the active filter visible alongside subagent scope + search query. Smoke test on dev data: 23 projects surfaced across all Claude sessions (auraqu / collector / landing / research / …). Real cross-project aggregation even before AUR-120 sessions view.
…ll-in Per user feedback: inline ▸/▾ expansion clutters the timeline and isn't as useful as the full-screen modal. Reverting that piece of AUR-122 while keeping the good parts (duration + ERR inline in the summary row, child count marker). - Removed expandedIds state + toggle-expand action + right/left arrow hotkeys - Removed ExpansionBlock + hasExpandableContent + buildExpansionLines - Timeline rows no longer carry a ▸/▾ prefix - Enter opens the full-screen detail pane, esc closes — as before Net effect: cleaner timeline, single drill-in UX.
Two dogfood bugs reported by user. 1. q wasn't quitting. Root cause: search input's Enter handler did nothing, leaving the user stuck in search capture mode where every keystroke (including q) went into the query buffer. Added a confirm-search action: Enter now exits input mode while keeping the query as a sticky filter. Outside input mode, q always quits. 2. Permissions screen overflowed the terminal with no way to see clipped content. Rewrote PermissionView to build a flat row array (h1 / h2 / kv / item / text / blank) and render only a slice based on scroll offset. New state.permissionsScroll moves with ↑↓ / j/k. Header shows '1-20 of 47' while scrolled. esc or p closes. q still quits. All 14 tests green.
AUR-120 + AUR-121. Two hierarchy levels landed together since they share state. Flow: P opens projects → Enter picks a project → sessions list for that project appears, grouped by date (Today / Yesterday / Last 7 days / Older) → Enter on a session scopes the main timeline to only that session's events. - src/util/project-index.ts: new buildSessionRows + dateBucket + SessionRow type. First user prompt per session is cached. - src/ui/SessionsView.tsx: bucketed list with scroll window, selection highlight, per-row agent tag (claude-code / openclaw:content / cursor), event count + relative time + cost + ERR flag. - App.tsx: new reducer actions for open/close-sessions, sessions-move, sessions-scroll, sessions-open-selected. New sessionFilter state applies to the timeline filter chain. - Footer hint reflects the currently-active view. - Banner shows 'session <short-id> (A to clear)' when scoped. Keyboard: ↑↓ / j/k navigates; enter opens; esc back to projects list; q still quits from anywhere.
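The date grouping can be sketched as below. The bucket labels come from the message above; the exact boundaries (local-time day starts, "Last 7 days" meaning the seven calendar days before today) are assumptions, and the signature of `dateBucket` is a guess.

```typescript
type DateBucket = "Today" | "Yesterday" | "Last 7 days" | "Older";

// Buckets a timestamp relative to `now`, using local-time day boundaries.
function dateBucket(ts: Date, now: Date): DateBucket {
  // Local midnight N days before `now`; the Date constructor normalizes
  // negative day numbers, so month and DST boundaries are handled for us.
  const startOfDay = (daysAgo: number): number =>
    new Date(now.getFullYear(), now.getMonth(), now.getDate() - daysAgo).getTime();
  const t = ts.getTime();
  if (t >= startOfDay(0)) return "Today";
  if (t >= startOfDay(1)) return "Yesterday";
  if (t >= startOfDay(7)) return "Last 7 days";
  return "Older";
}
```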
AUR-125. The live TUI was showing 'unknown file_change' events that duplicated Claude's own file_write events from the jsonl. Every Claude Edit / Write / MultiEdit was counted twice — once with full diff + project tag, once as bare unattributed noise. - New src/util/recent-writes.ts module with a cross-adapter cache: markAgentWrite(path, ts) + wasRecentlyWrittenByAgent(path). - 5s dedupe window, 30s TTL. Entries auto-sweep when the module is touched (no timers, no background work). - Claude adapter calls markAgentWrite after emitting any file_write or file_read with a path. - fs-watcher skips emission when the path matches a recent agent write. Keeps fs-watcher as a safety net for truly manual edits (Cursor with its broken activity log, terminal edits, non-instrumented agents) while silencing the Claude double-count.
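A minimal sketch of such a cache, with the 5s window and 30s TTL from the message above. The function names match the ones listed; the optional `now` parameter is an addition made here so the behaviour can be exercised without real clocks.

```typescript
const DEDUPE_WINDOW_MS = 5_000;
const TTL_MS = 30_000;

// path -> timestamp (ms) of the most recent agent-attributed write
const recentAgentWrites = new Map<string, number>();

// Sweep expired entries whenever the module is touched; no timers involved.
function sweepExpired(now: number): void {
  recentAgentWrites.forEach((ts, p) => {
    if (now - ts > TTL_MS) recentAgentWrites.delete(p);
  });
}

function markAgentWrite(filePath: string, ts: number): void {
  sweepExpired(ts);
  recentAgentWrites.set(filePath, ts);
}

// `now` defaults to the wall clock; passing it explicitly is an assumption
// added for testability.
function wasRecentlyWrittenByAgent(filePath: string, now: number = Date.now()): boolean {
  sweepExpired(now);
  const ts = recentAgentWrites.get(filePath);
  return ts !== undefined && Math.abs(now - ts) <= DEDUPE_WINDOW_MS;
}
```

The fs-watcher side would call `wasRecentlyWrittenByAgent(path)` before emitting and skip the event when it returns true.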
AUR-124. Parity with claude-devtools' per-tool copy button.
- New src/util/clipboard.ts — zero-dep, platform-native: pbcopy on
macOS; wl-copy/xclip/xsel on Linux (first match wins); clip on
Windows. Returns structured {ok,reason} so the caller can
surface 'install xclip' instead of crashing.
- eventToYankText picks the most useful payload: tool result >
fullText > cmd > path > summary.
- App.tsx: 'y' on a focused event yanks; dispatches a transient
flash message (green '✓ copied N chars' or red '✗ reason') that
auto-clears after 2s.
- Footer hint updated: 'y yank' added.
Verified on dev machine: pbcopy works end-to-end.
AUR-102. Final M4 ticket; M4 complete. Fires desktop notifications for:
- .env read/write from any agent
- ~/.ssh, ~/.aws, ~/.gnupg paths touched
- rm -rf / sudo / curl | sh in shell_exec
- tool_result is_error
Rate-limited: one alert per rule-key per 60s (keyed on path or cmd prefix) so a looping agent doesn't spam notifications.
Platform dispatch:
- macOS: osascript 'display notification' (no deps)
- Linux: notify-send
- Windows: PowerShell MessageBox fallback
Only fires for events whose ts is AFTER TUI launch time; backfill from historical sessions is silent. Notifier self-disables on the first platform error so a missing notify-send doesn't spam stderr.
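The rate limit and the backfill rule can be combined into one small gate. This is a sketch under assumptions: `createAlertGate` is a hypothetical name, timestamps are epoch milliseconds, and the event's own `ts` is used as the rate-limit clock.

```typescript
// Returns a predicate deciding whether an alert for `ruleKey` should fire.
function createAlertGate(launchedAt: number, windowMs: number = 60_000) {
  const lastFired = new Map<string, number>();
  return function shouldNotify(ruleKey: string, eventTs: number): boolean {
    // Backfill from before the TUI launched stays silent.
    if (eventTs <= launchedAt) return false;
    const prev = lastFired.get(ruleKey);
    // One alert per rule key per window.
    if (prev !== undefined && eventTs - prev < windowMs) return false;
    lastFired.set(ruleKey, eventTs);
    return true;
  };
}
```

A rule key might look like `env:<path>` or `shell:<cmd prefix>`; the key format is not specified in the message, so that part is a guess.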
Ink's raw-mode stdin breaks default stdio inheritance on spawnSync, surfacing as 'EBADF' when pbcopy (or notify-send / osascript) is invoked from inside the running TUI. Fix: explicit stdio for every spawnSync — pipe stdin where we're supplying input, ignore child stdout/stderr to stay out of the TUI's renderer. Clipboard now works from inside the TUI instead of only from scripts.
AUR-129. Ship before publish so hotkeys are discoverable. Press ? from anywhere to open a grouped keybindings reference: Navigate / Filter & scope / Actions / Info views / Detail pane / Help. Press ? or esc to close. Footer hint now leads with [?] help so first-time users notice immediately. Also trimmed the footer hint down by removing redundant hotkeys already covered in the help screen.
…DUCT + templates + UX polish
Launch-grade README (250+ lines):
- Centered hero with badges (npm, CI, license, Node)
- Table of contents
- Why / Install / First-60-seconds walkthroughs
- Per-feature sections with real examples: timeline, detail pane,
subagent drilldown, project/session nav, search, permissions,
cost-with-cache-accounting, notifications, clipboard yank
- Full keyboard reference as tables
- What agentwatch reads (paths table)
- Configuration (env vars)
- How it compares (claude-devtools / Unfucked / Langfuse / Phoenix)
- Limitations (honest: Cursor config-only, Gemini + Codex not
instrumented, macOS+Linux only, 40-row timeline window)
- Non-goals (hard scope boundaries)
- Roadmap (v0.4 / v0.5 / v1.0 highlights)
- Architecture diagram with layer mental model
- Development / Security / License
Revived + written:
- CONTRIBUTING.md — dev workflow, PR checklist, what's in-scope
- SECURITY.md — responsible disclosure, scope, full path list,
'what it does NOT do' invariants
- CODE_OF_CONDUCT.md — Contributor Covenant v2.1 pointer
- .github/ISSUE_TEMPLATE/{bug_report,adapter_request,feature_request}.md
- .github/PULL_REQUEST_TEMPLATE.md
UX polish (AUR-126):
- Breadcrumb component surfaces active view + every active filter/scope
(project, session, sub-agent, agent, search)
- '0' = home — reset all filters/scopes, close modals
- 'Z' = clear filters (replaces confusing 'A' case-variant)
- 'esc' = go back one level consistently (sessions→projects→timeline)
- Removed the per-banner scope hints (breadcrumb covers them)
- Footer hint tightened + reordered
Added LAUNCH_POSTS.md to .gitignore (local-only draft doc).
All 14 tests green. tsup build clean.
Drafts live in the private Linear ticket / chat, not the repo.
AUR-76. Third agent live (after Claude + OpenClaw). v0 surfaces 517 events across 4 projects on the dev machine. - Watches ~/.gemini/tmp recursively (depth 4) for session JSON files matching chats/session-*.json - Each session is a single JSON document (not JSONL); re-read + diff-against-emitted-ids on every change - Translates messages with type 'user'→prompt, 'gemini'→response, 'error'→response (risk 6), skips 'info' - Honors session kind='subagent' in the project prefix + tool tag - Project extracted from tmp/<project>/chats/ path segment, with fallback for unusual layouts Gemini's text doesn't contain structured tool_use like Claude's — the model narrates its actions in prose. We surface them as response events verbatim rather than applying brittle regex.
… status
Added detection for 6 more agents: Codex, Aider, Cline (VS Code extension), Continue.dev, Windsurf (via .codeium), Goose.
- AgentName union expanded in schema.ts
- detect.ts grew per-agent detection paths with OS awareness (Cline has different locations on macOS vs Linux)
- New 'instrumented' boolean on DetectedAgent marks whether we actually parse events or just recognize the install
- AgentPanel renders a yellow dot + 'detected (events TBD)' label for detected-but-not-instrumented agents
- doctor output suggests opening an issue with a redacted session file so we can ship the adapter
Discovered Windsurf is installed on the dev machine (~/.codeium). Flagged as not-yet-instrumented so users see honest status rather than silent 'works with X' marketing claims.
Design choice: rather than stub four unverified parsers (Aider, Codex, Cline, Windsurf) from documented specs without local data to test against, we detect + tell the truth + ask for sample sessions. Shipping a broken adapter is worse than not shipping one.
chore: executable bin + dynamic version read
docs(directives): stop backlog growth; ambiguity is not a blocker
Adds the persist-everywhere rule to §11: every repo touched in a run must be committed and pushed before session-end, including partial work that ends in a [BLOCKED] exit. A [BLOCKED] ping with a dirty working tree is now itself defined as a failure — the agent must clean the tree (commit+push, or stash) before pinging Telegram. Also: include the KB commit SHA in the Telegram summary when the run wrote to the knowledge base. Motivation: today the agentwatch-daily cron sent "[BLOCKED] dirty main" because a previous run had left the directives file uncommitted. The fix is to make leaving a dirty tree a session-end failure, not a session-start abort.
docs(directives): require commit+push before [BLOCKED] exit
The readline-based read loop in claude-code, codex, and openclaw advanced the cursor by Buffer.byteLength(line) + 1 for every emitted line — including the trailing line of a chunk that had not yet been newline-terminated by the producing agent. JSON.parse failed on the partial line, the catch block swallowed it, and the rest of the line, when later flushed, was read as a fresh line and also failed. Result: silent permanent loss of otherwise-valid events. Replaced with a small synchronous helper readNewlineTerminatedLines that returns only complete (\n-terminated) lines and a consumed count that points at the last \n. Adapters now advance their cursor by exactly that count, so any unterminated tail stays unread until the next pass. Added jsonl-stream.test.ts plus claude-code.test.ts cases covering the split-chunk write scenario.
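A minimal sketch of the helper's contract. The name `readNewlineTerminatedLines` comes from the message above, but the signature is an assumption, and this version counts `consumed` in string characters for simplicity; an adapter tracking a file offset would need the UTF-8 byte count instead.

```typescript
interface LineRead {
  lines: string[];
  consumed: number; // characters up to and including the last "\n"
}

// Returns only complete lines; any unterminated tail is left for the next pass.
function readNewlineTerminatedLines(chunk: string): LineRead {
  const lastNl = chunk.lastIndexOf("\n");
  if (lastNl === -1) return { lines: [], consumed: 0 };
  const lines = chunk
    .slice(0, lastNl)
    .split("\n")
    .filter((line) => line.length > 0);
  return { lines, consumed: lastNl + 1 };
}
```

The caller advances its cursor by `consumed` only, so a half-written JSON line is re-read in full once the producing agent finishes it.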
…-241)
Run 981bbbf1-e4bc-442b-9d87-39b65e169039 burned 17 minutes on 2026-04-21 before the cron killed it. Per-run logs only show the "timeout" terminator, with no command-level trace, so the offending command is unrecoverable. Most likely culprits: a wedged `openclaw status`, a hung `gh` call (auth flow / paginated GraphQL), or an unbounded `curl` to Linear/Telegram.
Mitigation, since the cron backstop is too late:
1. New section in .agentwatch-bot/prompt.md, "Timeouts on hang-prone commands", defines a portable `wt <secs> <cmd>` helper using perl's `alarm` (no coreutils dependency, works on stock macOS + Linux). Returns 124 on timeout to match GNU `timeout`.
2. Table of hang-prone commands and suggested timeouts: openclaw status (30s), gh (60s), curl (30s), git fetch (60s), git push (120s), npm test (300s).
3. STEP 0's `openclaw status` invocation already updated to use the helper inline.
4. New AGENT_DIRECTIVES.md §7 hard red line: do not run hang-prone commands without an explicit timeout.
This is preventive, not retroactive: we can't tell which command hung on April 21 without per-command logs. AUR-241 closes here; if the next timeout repeats, file a follow-up that adds command-level trace logging to the cron runner.
fix(directives): require timeout wrappers on hang-prone commands (AUR-241)
fix(adapters): preserve partial JSONL lines across reads (AUR-227)
The claude-code, codex, and openclaw adapters used to swallow JSON.parse failures with an empty catch block. If an agent silently changed its session-file format or a line was corrupted, operators would lose events and never know — the TUI just showed a shorter timeline. Added a per-session ParseErrorTracker that emits one synthetic parse_error event the first time a session fails to parse a newline-terminated line, then enriches the same event with the running count + a truncated sample of the latest offending line. The event shows up in the timeline at low risk (1) with summary "⚠ unparseable line — context loss possible", so the operator sees a clear marker that they're missing context. New schema EventType "parse_error" plus details.parseErrorCount / parseErrorSample. Wired into all three JSONL adapters.
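The emit-once-then-enrich behaviour can be sketched as follows. `createParseErrorTracker`, the sink shape, the synthetic id format, and the 120-character sample limit are all assumptions; only the event type, the risk of 1, and the two details fields come from the message above.

```typescript
interface ParseErrorDetails {
  parseErrorCount: number;
  parseErrorSample: string;
}

interface ParseErrorEvent {
  id: string;
  type: "parse_error";
  sessionId: string;
  risk: number;
  details: ParseErrorDetails;
}

interface ParseErrorSink {
  emit(ev: ParseErrorEvent): void;
  enrich(id: string, details: ParseErrorDetails): void;
}

function createParseErrorTracker(sink: ParseErrorSink, sampleLimit: number = 120) {
  const bySession = new Map<string, { id: string; count: number }>();
  return function recordParseError(sessionId: string, badLine: string): void {
    const sample = badLine.slice(0, sampleLimit); // keep the sample short
    const seen = bySession.get(sessionId);
    if (seen === undefined) {
      // First failure in this session: emit one synthetic event.
      const id = `parse-error:${sessionId}`;
      bySession.set(sessionId, { id, count: 1 });
      sink.emit({
        id,
        type: "parse_error",
        sessionId,
        risk: 1,
        details: { parseErrorCount: 1, parseErrorSample: sample },
      });
      return;
    }
    // Later failures patch the same event instead of adding timeline rows.
    seen.count += 1;
    sink.enrich(seen.id, { parseErrorCount: seen.count, parseErrorSample: sample });
  };
}
```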
…16) (#11) Provider rates change between releases of the agentwatch CLI. Until this commit the Claude / Gemini / GPT rates were hardcoded in src/util/cost.ts, so any provider price change silently produced wrong cost math for every operator until a new CLI version shipped. loadRates() now merges the baked-in DEFAULT_RATES with any entries the user wrote to ~/.agentwatch/pricing.json (overridable via the AGENTWATCH_PRICING_PATH env var). Each entry must carry all four fields (input / cacheCreate / cacheRead / output) as non-negative numbers — partial entries are dropped so we never silently mix a stale field with a fresh one. Validation rejections are logged when AGENTWATCH_PRICING_DEBUG=1. Schema documented in docs/features/cost-accounting.md, including the normalized-model-name rule and the "all-or-nothing" override semantics.
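The all-or-nothing override rule can be sketched as a pure merge step. This assumes the pricing file parses to an object keyed by model name; `mergeRates`, `isValidRates`, and the sample model key are hypothetical, and file reading plus debug logging are left out.

```typescript
interface Rates {
  input: number;
  cacheCreate: number;
  cacheRead: number;
  output: number;
}

type RateTable = Record<string, Rates>;

const RATE_FIELDS = ["input", "cacheCreate", "cacheRead", "output"] as const;

// An entry is valid only if all four fields are non-negative finite numbers.
function isValidRates(v: unknown): v is Rates {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  return RATE_FIELDS.every((f) => {
    const n = o[f];
    return typeof n === "number" && Number.isFinite(n) && n >= 0;
  });
}

// Valid user entries replace defaults wholesale; partial entries are dropped
// so a stale field is never mixed with a fresh one.
function mergeRates(defaults: RateTable, overrides: unknown): RateTable {
  const merged: RateTable = { ...defaults };
  if (typeof overrides !== "object" || overrides === null) return merged;
  for (const [model, entry] of Object.entries(overrides as Record<string, unknown>)) {
    if (isValidRates(entry)) merged[model] = entry;
  }
  return merged;
}
```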
…AUR-217) (#10) The Claude, Codex, and Gemini adapters all enrich their tool_use events with the matching toolResult / exit code / duration from the next turn. The OpenClaw adapter was missing this entirely — every tool_use event landed in the timeline without stdout, error state, or runtime, making the OpenClaw view nearly useless for postmortem. Two changes here: 1. extractToolUse now matches OpenClaw's native shape (type:"toolCall", arguments:{...}) in addition to the Anthropic-style (type:"tool_use", input:{...}) it previously assumed. Real ~/.openclaw sessions use the former, so most tool_use events were silently being dropped at the translateSession stage too. Captures the toolCall id as details.toolUseId. 2. New handleOpenClawToolResult harvester recognizes message.role:"toolResult" turns, looks up the pending tool_use by toolCallId, and enriches with toolResult / toolError / durationMs. Mirrors the Claude adapter's pendingToolUses + orphanResults pattern, including bounded maps to survive crashes mid-turn. Also handles file/cmd field synonyms (file, cmd) because OpenClaw's toolCalls aren't strictly aligned with the file_path / command names.
…age.txt (AUR-242) (#12) The TRIAGE mode in AGENT_DIRECTIVES.md §5 referenced ~/.agentwatch-bot/last-triage.txt with the parenthetical "create it on first run with `now`" — too vague. On a fresh dev machine, after a manual edit, or if the file gets wiped between runs, the `gh search "created:>$LAST_TRIAGE"` query silently breaks (empty $LAST_TRIAGE → query parses but returns nothing useful, OR `cat` errors and the agent never recovers). Replaced the vague hint with an explicit defensive-init bash block that: - mkdir -p the dotdir - writes a now-minus-24h ISO timestamp if the file is missing - validates the file content matches an ISO-8601 UTC pattern, rewrites the default if not - splits the trailing Z for the gh search variant that needs it Also added a session-start checklist note in .agentwatch-bot/prompt.md pointing the bot at this block before the first gh search of any TRIAGE run.
…-214) (#13) Acceptance criteria for AUR-214 was: research whether Gemini CLI or OpenClaw persist any compaction marker; if not, document the structural limitation in the relevant feature contracts. Result of the research: neither does. Gemini chat JSON (~/.gemini/tmp/<proj>/chats/session-*.json) carries only user/gemini/error/info message types. The CLI's /compress command rewrites context in-place but writes nothing distinguishable into the file. Survey of every active session on this dev machine — no compaction-shaped record. OpenClaw session JSONL records session, message, model_change, thinking_level_change, custom, custom_message. The custom subtypes in the wild are model-snapshot, openclaw:bootstrap-context:full, openclaw:prompt-error, plus openclaw.sessions_yield as a parent→child handoff. None of these are context resets. Added docs/features/compaction-visualizer.md with the per-agent support matrix and a note on what shape a future marker would need to take to be wired through. Explicitly warned against synthesizing compaction from indirect signals (cacheRead drop, model swap) — false positives are worse than honest blanks here. No source changes; the visualizer already supports compaction events when an adapter emits them.
Cuts the version + CHANGELOG for the eight fixes that have been sitting on main since v0.0.4 (AUR-214, AUR-216, AUR-217, AUR-227, AUR-228, AUR-241, AUR-242, plus version-from-package.json + chmod on bin). No new features.
* feat(store): SQLite event store with FTS5 (AUR-263)

Adds src/store/sqlite.ts as the persistent source of truth for every event the adapters emit. Replaces the 4 MB rolling backfill with an indexed, queryable, FTS-searchable store at ~/.agentwatch/events.db.

Three tables: events (canonical AgentEvent), sessions (auto-aggregated via insert trigger: cost, ts range, count, project), tool_calls (tool, duration, error). FTS5 virtual table over prompt/response/thinking/tool_result/summary with porter+unicode61 tokenization. Versioned migrations (schema_version). WAL + synchronous=NORMAL.

Wires the store into both TUI and serve-mode EventSinks via a write-through wrapper (src/store/wire.ts); failures are logged once and never propagated. Adds 'agentwatch prune --older-than-days N' CLI subcommand and a new 'history' mode on POST /api/search backed by FTS5.

Out of scope (filed as follow-ups): TUI reducer becoming a thin cache over the store, full /sessions /projects route migration, schema migration tooling for end users.

Bench: ingests 10k events in ~430ms on M1 Air. 279/279 tests pass.

* feat(daemon): background capture daemon with launchd + systemd (AUR-262) (#17)

Closes the largest stated limitation: agentwatch was a viewer, not a daemon. `agentwatch daemon start | stop | status | logs` now installs a user-level service that runs the adapter pipeline 24/7 and writes every event into the SQLite store at ~/.agentwatch/events.db.

src/daemon/install.ts renders the launchd plist (macOS) or systemd user unit (Linux); the unit invokes `agentwatch daemon run`, the internal foreground subcommand that:

1. Acquires a PID lock at ~/.agentwatch/daemon.pid (stale-pid aware via a process-alive probe; auto-rotates the lock if the holder died without cleaning up).
2. Opens the SQLite store and starts every adapter wired through wrapSinkWithStore so events persist on disk.
3. Writes its start time at ~/.agentwatch/daemon.started_at so `status` can compute uptime.
4. Drains cleanly on SIGTERM / SIGINT / SIGHUP: closes adapters, closes the store, releases the lock.

src/daemon/log-rotate.ts is an append-only writer with a single rotation slot at 10 MB (the file rolls to .log.1; older history is out of scope on purpose).

src/daemon/index.ts is the controller that dispatches start / stop / status / logs / run. `status` reports running yes/no, PID, uptime, events captured, last event ts, and DB size by querying the same SQLite store the daemon writes to.

The TUI and `agentwatch serve` keep working as clients of the same store, so events captured overnight are visible the moment you open them.

Stacked on PR #16 (AUR-263 SQLite store); the daemon depends on `src/store/`. 10 new tests cover log rotation, plist + systemd unit rendering, and the process-liveness probe; full suite 289/289.

* feat(classify): per-event activity classifier (AUR-264) (#18)

Adds a 12-category activity classifier that answers the CodeBurn-viral question 'where is my spend going?' Categories: coding, debugging, exploration, planning, refactor, testing, docs, chat, config, review, devops, research.

src/classify/activity.ts is a heuristic ladder with no ML dep. Rules combine file-extension signals (.test.ts → testing, .md → docs, package.json → config), tool-name signals (Grep → exploration, WebFetch → research, kubectl/docker/terraform → devops), shell-command signals (npm test, pytest, git diff, eslint), and prompt/response keyword signals (refactor, error, audit, plan). Each rule contributes a weighted score; argmax wins. Empty input falls through to chat.

src/classify/sink.ts wraps an EventSink so events land with details.category attached before they reach the store or the TUI reducer. Idempotent: it won't overwrite an already-set category. Wired into both TUI and serve mode in front of the store wrapper: adapters → withClassifier → wrapSinkWithStore → innerSink.

Schema v2 migration: ALTER TABLE events ADD COLUMN category TEXT, new index idx_events_category, store has activityBySession() and activityByProject() that GROUP BY category. New API: GET /api/sessions/:id/activity, GET /api/projects/:name/activity.

36 classifier tests (per-rule cases + sink wrapper + 75% top-1 agreement on the synthetic dataset) + 3 store activity-rollup tests. Full suite 315/315 pass.

Out of scope (defer to follow-up): React activity views in the web UI, TUI EventDetail surface for category, the full 200-turn hand-labelled validation harness. The synthetic dataset substitutes for v0.1 and we re-evaluate on real-data accuracy after dogfood.

Stacked on PR #16 (AUR-263 SQLite store); uses the v2 migration.
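The "weighted score, argmax wins" ladder from the AUR-264 notes could look roughly like the sketch below. This is an illustration only, not the contents of src/classify/activity.ts: the `ClassifyInput` shape, the `RULES` table, the specific weights, and the reduced category list are all assumptions made for the example.

```typescript
// Hypothetical sketch of a weighted-rule classifier; names and weights are assumed.
type Category =
  | "testing" | "docs" | "config" | "exploration"
  | "research" | "devops" | "debugging" | "chat";

interface ClassifyInput {
  files?: string[];   // paths touched by the event
  tool?: string;      // tool name, e.g. "Grep"
  command?: string;   // shell command, if any
  text?: string;      // prompt or response text
}

interface Rule {
  category: Category;
  weight: number;
  matches: (input: ClassifyInput) => boolean;
}

const hasFile = (i: ClassifyInput, suffix: string): boolean =>
  (i.files ?? []).some((f) => f.endsWith(suffix));

const RULES: Rule[] = [
  { category: "testing", weight: 3, matches: (i) => hasFile(i, ".test.ts") },
  { category: "docs", weight: 3, matches: (i) => hasFile(i, ".md") },
  { category: "config", weight: 3, matches: (i) => hasFile(i, "package.json") },
  { category: "exploration", weight: 2, matches: (i) => i.tool === "Grep" },
  { category: "research", weight: 2, matches: (i) => i.tool === "WebFetch" },
  { category: "devops", weight: 2, matches: (i) => /^(kubectl|docker|terraform)\b/.test(i.command ?? "") },
  { category: "testing", weight: 2, matches: (i) => /\b(npm test|pytest)\b/.test(i.command ?? "") },
  { category: "debugging", weight: 1, matches: (i) => /\berror\b/i.test(i.text ?? "") },
];

function classify(input: ClassifyInput): Category {
  // Sum the weights of every matching rule per category.
  const scores = new Map<Category, number>();
  for (const rule of RULES) {
    if (rule.matches(input)) {
      scores.set(rule.category, (scores.get(rule.category) ?? 0) + rule.weight);
    }
  }
  // Argmax over the accumulated scores; no match at all falls through to "chat".
  let best: Category = "chat";
  let bestScore = 0;
  scores.forEach((score, category) => {
    if (score > bestScore) {
      best = category;
      bestScore = score;
    }
  });
  return best;
}
```

Keeping the rules in a flat table means a new signal is one more row, and ties resolve to whichever category was scored first, which is a choice of this sketch rather than something the notes specify.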
…get) (#21)

* feat(yield): git-correlation $/commit + $/line views (AUR-265)

src/git/correlate.ts walks each project's git log via spawnSync and pairs commits with sessions whose [first_ts, last_ts + 30min] window contains the commit's author date. Read-only: git verbs are allow-listed (log / rev-parse / worktree / show / diff / blame / status / config / branch / remote); any other verb throws.

Per-session yield: cost-per-commit, cost-per-line-changed, total insertions/deletions/files. Per-project yield: weekly cost-per-commit trend + a sorted 'spend without commit' list of sessions that burned dollars but produced no commits in window.

Worktree de-dup via gitCommonDir() so two checkouts of the same repo share a backing repo. Project-name → git-root resolution by walking WORKSPACE_ROOT one level looking for a basename match with a .git entry.

New routes: GET /api/sessions/:id/yield, GET /api/projects/:name/yield. Both return ok:false with a reason when there's no store, no project tag, or no git repo under WORKSPACE_ROOT.

14 vitest tests using real git repos via execSync (gitInit + commit helpers). Full suite 293/293 pass.

Out of scope (filed as follow-up): React /sessions/:id/yield and /projects/:name/yield views in the web UI; the API ships now and the visualization is its own ticket.

Stacked on PR #16 (AUR-263 SQLite store); sessions/projects come from the store.
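The session-to-commit pairing described in the AUR-265 notes (a commit counts toward a session when its author date falls in [first_ts, last_ts + 30min]) can be sketched as below. The `Session` and `Commit` shapes, the millisecond timestamps, the inclusive window bounds, and the `null` result for "no commits" are assumptions for illustration; the actual src/git/correlate.ts may differ.

```typescript
// Hypothetical sketch of the yield calculation; field names are assumed.
interface Session {
  id: string;
  firstTs: number;  // epoch ms of the first event
  lastTs: number;   // epoch ms of the last event
  costUsd: number;
}

interface Commit {
  hash: string;
  authorTs: number; // epoch ms of the author date
  insertions: number;
  deletions: number;
}

interface SessionYield {
  commits: number;
  linesChanged: number;
  costPerCommit: number | null; // null when no commit landed in the window
  costPerLine: number | null;   // null when no lines changed
}

const GRACE_MS = 30 * 60 * 1000; // the 30-minute tail after the last event

function commitsInWindow(session: Session, commits: Commit[]): Commit[] {
  const end = session.lastTs + GRACE_MS;
  return commits.filter((c) => c.authorTs >= session.firstTs && c.authorTs <= end);
}

function sessionYield(session: Session, commits: Commit[]): SessionYield {
  const matched = commitsInWindow(session, commits);
  const linesChanged = matched.reduce((sum, c) => sum + c.insertions + c.deletions, 0);
  return {
    commits: matched.length,
    linesChanged,
    costPerCommit: matched.length > 0 ? session.costUsd / matched.length : null,
    costPerLine: linesChanged > 0 ? session.costUsd / linesChanged : null,
  };
}
```

A session whose result has `commits === 0` is what the notes call "spend without commit"; returning `null` instead of dividing by zero keeps that case explicit for callers.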
* feat(adapters): Claude Code native hooks adapter (AUR-266)

Anthropic's hooks API delivers events about 1–2 seconds faster than the JSONL transcript and never misses a sub-event. agentwatch can now register itself as a hook and have Claude POST every event into our pipeline in real time.

src/adapters/claude-hooks.ts registers POST /api/hooks/:event on the fastify app. translateHook() maps the 10 known Claude event types (SessionStart, SessionEnd, UserPromptSubmit, PreToolUse, PostToolUse, Stop, SubagentStop, PreCompact, PostCompact, Notification) into our canonical AgentEvent shape. Unknown future events fall through to a generic tool_call so a Claude release adding new hook types doesn't silently drop data.

src/adapters/claude-hooks-install.ts owns ~/.claude/settings.json round-trips. Stanzas are tagged with the marker comment '[agentwatch-managed]' so uninstall only touches our stanzas; any user-configured hooks are preserved.

src/adapters/hooks-dedup.ts is a 5-second-window registry shared between the hooks adapter (writes) and the JSONL adapter sink wrapper (reads). withClaudeHookDedup() drops claude-code events that arrive without details.source === "hooks" when their (sessionId, toolUseId) signature was marked in the last 5s. Hook events bypass the check because they're stamped 'hooks' as the source.

Wiring:
adapters → withClaudeHookDedup → wrapSinkWithStore → innerSink
hook route emit → withClaudeHookDedup (bypass) → ... → innerSink

ServerHandle gains setHookSink(sink) so the route looks up the sink lazily; that lets the TUI start the server first and wire the sink once adapters are up.

CLI: agentwatch hooks {install | uninstall | status}. Doctor reports 'claude code hooks: installed | not-installed | partial'.

19 vitest tests cover the dedup registry, the dedup sink wrapper, hook payload translation for every known event type, settings.json install/uninstall round-trips with user-hook preservation, and status reporting. Full suite 298/298 pass.

docs/features/claude-hooks.md documents the install flow + dedup semantics + why both hook and JSONL paths run together.

Out of scope: hook-based blocking (PreToolUse decision: block), which is the v0.3 control-plane bet. Hooks for non-Claude agents (none of them ship a hooks API).

Stacked on PR #16 (AUR-263 SQLite store).
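The 5-second dedup window from the AUR-266 notes can be illustrated with the sketch below. It is a simplified stand-in, not the code in src/adapters/hooks-dedup.ts: the `AgentEventLike` shape, the `HookDedupRegistry` class, and the injectable clock are assumptions. The sketch also marks the signature inside the sink wrapper when a hook event passes through, whereas the notes say the hooks adapter itself writes to the registry.

```typescript
// Hypothetical sketch of a time-windowed dedup registry and sink wrapper.
interface AgentEventLike {
  agent: string;
  sessionId: string;
  toolUseId?: string;
  details?: { source?: string };
}

type Sink = (event: AgentEventLike) => void;

class HookDedupRegistry {
  private seen = new Map<string, number>();
  private windowMs: number;
  private now: () => number;

  constructor(windowMs: number = 5000, now: () => number = () => Date.now()) {
    this.windowMs = windowMs;
    this.now = now;
  }

  // Record that a hook delivered this (sessionId, toolUseId) pair just now.
  mark(sessionId: string, toolUseId: string): void {
    this.seen.set(`${sessionId}:${toolUseId}`, this.now());
  }

  // True only if the pair was marked within the window; stale entries are evicted.
  wasMarked(sessionId: string, toolUseId: string): boolean {
    const key = `${sessionId}:${toolUseId}`;
    const ts = this.seen.get(key);
    if (ts === undefined) return false;
    if (this.now() - ts > this.windowMs) {
      this.seen.delete(key);
      return false;
    }
    return true;
  }
}

function withHookDedup(registry: HookDedupRegistry, inner: Sink): Sink {
  return (event) => {
    const id = event.toolUseId;
    if (event.details?.source === "hooks") {
      // Hook events bypass the check and leave a mark for the JSONL path.
      if (id !== undefined) registry.mark(event.sessionId, id);
      inner(event);
      return;
    }
    // Drop the slower JSONL copy of an event a hook already delivered.
    if (event.agent === "claude-code" && id !== undefined && registry.wasMarked(event.sessionId, id)) {
      return;
    }
    inner(event);
  };
}
```

Passing the clock in as a function keeps the window testable without real timers; a production registry would likely also sweep old entries periodically so the map cannot grow without bound.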
Adds the minimal glama.json metadata file ($schema + maintainers) that Glama's profile completion check requires, and embeds the score + card badges in the README so the registry status is visible from the repo front page. Triggered by https://github.com/punkpeye/awesome-mcp-servers#5665 — the awesome-mcp-servers PR is gated on Glama having a quality score for the server, which can only happen once Glama is able to find and parse glama.json. Pairs with the v0.0.5 GitHub release I just cut (Glama also requires at least one tagged release to count the project as 'maintained').
…#23) Routes now read from the SQLite store if passed via StartServerOptions, falling back to the in-memory ring buffer if not. This enables the UI to query events that have fallen out of the live tail window.
#26) Budget rollups, anomaly histories, and sub-agent child counts now query the SQLite store via the new EventStore.listRecentEvents() instead of iterating the 500-event React buffer. The live tail still drives the timeline view; only the derived passes change source. App.tsx also seeds the live tail at launch from store.listRecentEvents({ limit: 500 }) so the timeline isn't empty until adapter JSONL backfill re-emits. The wrapSinkWithStore dedup guards against id collisions when adapters do re-emit. Adds 4 vitest cases covering desc/asc order, sinceTs filter, and limit clamp. Full suite 368/368 (was 364).
Visualizes the per-event activity classifier (AUR-264) data:

- /sessions/:id/activity: stacked bar of events × category in 1-min buckets across the session's lifetime, plus a per-category cost + count + percentage table. Bucketed client-side from the session's events so we don't need a new time-series API.
- /projects/:name/activity: pie of events-by-category + pie of cost-by-category + per-category breakdown table including the new sessionsTouched column. Empty-state for unclassified projects.

Both routes are lazy-loaded behind the existing Suspense fallback, matching SessionTokens/Trends/SessionGraph. Nav links surface from Session.tsx and ProjectDetail.tsx.

Extends activityByProject SQL with COUNT(DISTINCT session_id) and adds sessionsTouched to ActivityBucket. EventDetails type in the web client now exposes details.category. +1 vitest case for the new column. Full suite 369/369 (was 368).
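The client-side 1-minute bucketing mentioned for /sessions/:id/activity might be done along these lines. This is a sketch under assumptions: the `EventLike` and `Bucket` shapes, millisecond timestamps, and the "unclassified" fallback label are invented for the example and are not taken from the web client.

```typescript
// Hypothetical sketch: group events into 1-minute buckets with per-category counts.
interface EventLike {
  ts: number;        // epoch ms
  category?: string; // set by the classifier when available
}

interface Bucket {
  startMs: number;                 // inclusive start of the 1-minute bucket
  counts: Record<string, number>;  // events per category in this bucket
}

const BUCKET_MS = 60 * 1000;

function bucketByMinute(events: EventLike[]): Bucket[] {
  const byStart = new Map<number, Record<string, number>>();
  events.forEach((e) => {
    // Floor the timestamp to the start of its minute.
    const startMs = Math.floor(e.ts / BUCKET_MS) * BUCKET_MS;
    const counts = byStart.get(startMs) ?? {};
    const key = e.category ?? "unclassified"; // assumed fallback label
    counts[key] = (counts[key] ?? 0) + 1;
    byStart.set(startMs, counts);
  });
  // Emit buckets in chronological order, ready to feed a stacked bar chart.
  return Array.from(byStart.entries())
    .sort((a, b) => a[0] - b[0])
    .map(([startMs, counts]) => ({ startMs, counts }));
}
```

Minutes with no events produce no bucket here; a chart that needs a continuous axis would have to fill those gaps with empty buckets.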
Visualizes the git-correlation $/commit + $/line endpoints (AUR-265).

- /sessions/:id/yield: commits-in-window table (hash, author, subject, files, +/-) with totals row, plus $/commit + $/line + cost + lines-changed callouts. Empty state when no commits landed during the session window.
- /projects/:name/yield: weekly composed chart (cost bars + commit bars + $/commit line overlay), plus a sortable 'spend without commit' session list (cost / lines / files). Sortable column buttons.

Both views handle the API's ok:false reasons (no store, no project tag, not a git repo under WORKSPACE_ROOT) with a helpful explainer. Lazy-loaded behind Suspense in main.tsx; nav links from Session.tsx and ProjectDetail.tsx.

Full suite 369/369; typecheck clean (one pre-existing Logs.tsx unused var unrelated to this PR).
Adds Dockerfile with Node 22 + gh CLI over Debian bookworm-slim to support running the autonomous agent fully containerized via OpenClaw sandbox.mode. Includes runbook for human setup and updates.
Closing: this PR only added the internal autonomous-bot sandbox (.agentwatch-bot/), which has been removed from the repo. Not part of agentwatch itself.
Implements Linear issue AUR-218.
What changed and why

Added a `Dockerfile` and a `sandbox-runbook.md` into `.agentwatch-bot/` to allow OpenClaw to run the daily agent fully sandboxed while still providing Node.js 22 and the `gh` CLI. Without this custom image, `npm test` and `gh pr create` would fail in the default OpenClaw sandbox.

What you considered and rejected

Considered extending `openclaw-sandbox:bookworm-slim`, but extending the standard `node:22-bookworm-slim` is simpler and provides the exact LTS matrix version we need.

Test evidence

`docker build -t agentwatch-sandbox -f .agentwatch-bot/Dockerfile .` runs successfully.