feat(cowork): observe Claude Cowork tool calls via per-session settings injection #283
…gs injection
Adds Cowork observation to the managed-observe daemon, reusing the existing Claude Code pipeline (classify -> store -> managedstream -> ledger). No backend ingest changes needed; Cowork rows are tagged agent:cowork.
- internal/agent/cowork: registers the "cowork" agent, reusing the Claude hook decoder (Cowork runs the same bundled Claude Code CLI).
- internal/coworkobserve: injector writes settings.json with a PreToolUse command hook into each per-session .claude dir (the host-mounted guest $HOME/.claude, which survives the per-boot VM rootfs rebuild); the hook spools each event to a host file; the collector replays events into the daemon's localruntime socket as agent "cowork". File-drop transport, no in-VM network.
- Gated by KONTEXT_COWORK_OBSERVE; wired into RunDaemon; agent blank-imported.
Verified end to end locally: real Cowork bash + Chrome MCP tool calls land in the hosted ledger tagged agent:cowork.
Follow-ups (dev-grade today): FSEvents instead of 250ms poll; persist collector offsets (in-memory now); enforce mode (hook<->daemon decision round-trip); unit tests for injector/collector.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… lines The collector previously read the whole spool, advanced its offset to EOF, then parsed lines — so a trailing partial line (hook mid-append) was consumed and dropped forever, and a failed socket replay also lost its event because the offset had already moved. Drain now seeks to the saved offset, reads incrementally (no full-file re-read every tick), only advances past complete newline-terminated lines, halts and retries on transient socket errors (at-least-once delivery), skips permanently-malformed lines, and resets when the spool shrinks (file recreated). Covered by new collector tests against a fake daemon socket. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
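The drain rule described in this commit can be sketched as a pure function over the bytes read from the saved offset. This is a minimal illustration, not the collector's actual code: `drain`, `errTransient`, and the `replay` callback are hypothetical names, and "permanently malformed" is approximated here as "not valid JSON".

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// errTransient stands in for a failed write to the daemon socket (hypothetical).
var errTransient = errors.New("transient socket error")

// drain replays each complete line in buf and returns how many bytes the
// saved offset may advance. It stops at the first replay error so the failed
// line is retried on the next tick (at-least-once delivery), and it never
// advances past a trailing partial line.
func drain(buf []byte, replay func(line []byte) error) int {
	advanced := 0
	for {
		i := bytes.IndexByte(buf[advanced:], '\n')
		if i < 0 {
			return advanced // only a partial line (or nothing) remains
		}
		line := bytes.TrimSpace(buf[advanced : advanced+i])
		if len(line) > 0 && json.Valid(line) {
			if err := replay(line); err != nil {
				return advanced // transient failure: keep the offset before this line
			}
		}
		// Blank and malformed lines are skipped rather than retried forever.
		advanced += i + 1
	}
}

func main() {
	// One good event, one malformed line, and one line the hook is still appending.
	spool := []byte("{\"tool\":\"Bash\"}\nnot json\n{\"tool\":\"Re")

	n := drain(spool, func(line []byte) error {
		fmt.Printf("replayed %s\n", line)
		return nil
	})
	fmt.Println("offset advances by", n)

	n = drain(spool, func([]byte) error { return errTransient })
	fmt.Println("on socket error offset advances by", n)
}
```

Returning a byte count rather than mutating the offset keeps the function easy to test against a fake socket, which is the style of test the commit mentions.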
…e ledger rows Offsets were in-memory only, so every daemon restart (update, reboot, idle cycle) re-replayed every existing spool from byte zero. SaveDecision inserts fresh action IDs per event — two per PreToolUse — so each restart double-ingested the full history of every live session into the ledger. The collector now loads/saves its offset map at a state file next to the guard DB (cowork-spool-offsets.json, written via temp+rename after ticks that changed it), mirroring the stream-state.json convention. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
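A sketch of the temp+rename persistence this commit describes, assuming the state file is a flat JSON object mapping spool path to byte offset (the on-disk shape is not shown in the PR). `saveOffsets`, `loadOffsets`, and `roundTrip` are illustrative names.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// saveOffsets writes the offset map to a temp file in the same directory and
// renames it over path, so a reader never observes a half-written file.
func saveOffsets(path string, offsets map[string]int64) error {
	data, err := json.Marshal(offsets)
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless once the rename has succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// loadOffsets returns an empty map when no state file exists yet.
func loadOffsets(path string) (map[string]int64, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return map[string]int64{}, nil
	}
	if err != nil {
		return nil, err
	}
	offsets := map[string]int64{}
	if err := json.Unmarshal(data, &offsets); err != nil {
		return nil, err
	}
	return offsets, nil
}

// roundTrip saves one offset into a scratch directory and loads it back.
func roundTrip() (int64, error) {
	dir, err := os.MkdirTemp("", "offsets-example-")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "cowork-spool-offsets.json")
	spool := "/sessions/example/kontext-cowork-events.jsonl" // hypothetical spool path
	if err := saveOffsets(path, map[string]int64{spool: 25}); err != nil {
		return 0, err
	}
	loaded, err := loadOffsets(path)
	if err != nil {
		return 0, err
	}
	return loaded[spool], nil
}

func main() {
	got, err := roundTrip()
	fmt.Println(got, err)
}
```

Creating the temp file in the destination directory matters: a rename is only atomic within one filesystem.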
…atomically The injector overwrote any settings.json that lacked its marker, which would destroy settings Cowork or the user place in the per-session dir (and re-clobber them within one poll tick whenever the in-VM CLI rewrote the file). It also wrote with a plain truncate-then-write, so the CLI could race a read of a half-written file at exactly session startup. inject now parses the existing file, appends our PreToolUse matcher group alongside whatever is already there, and writes via temp file + rename. Unparseable existing content is still replaced — the CLI could not have read it either. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
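The merge step might look roughly like the following. It assumes the settings file uses Claude Code's `hooks.PreToolUse` array of matcher groups and that our entry is recognizable by a marker substring in its command; the `marker` value, the `"*"` matcher, and the function names are assumptions for illustration, not the PR's code. The atomic write would follow the same temp file + rename pattern as any other state file.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// marker is a hypothetical tag embedded in our hook command so later passes
// can recognize and replace their own entry.
const marker = "kontext-cowork"

// mergeHook returns settings JSON with our PreToolUse matcher group appended
// alongside whatever existing already holds. Unparseable input is replaced.
func mergeHook(existing []byte, command string) ([]byte, error) {
	settings := map[string]any{}
	if err := json.Unmarshal(existing, &settings); err != nil || settings == nil {
		settings = map[string]any{} // the CLI could not have read it either
	}
	hooks, _ := settings["hooks"].(map[string]any)
	if hooks == nil {
		hooks = map[string]any{}
	}
	groups, _ := hooks["PreToolUse"].([]any)
	kept := make([]any, 0, len(groups)+1)
	for _, g := range groups {
		raw, _ := json.Marshal(g)
		if !strings.Contains(string(raw), marker) { // drop stale variants of our entry
			kept = append(kept, g)
		}
	}
	kept = append(kept, map[string]any{
		"matcher": "*",
		"hooks":   []any{map[string]any{"type": "command", "command": command}},
	})
	hooks["PreToolUse"] = kept
	settings["hooks"] = hooks
	return json.MarshalIndent(settings, "", "  ")
}

// commands lists every PreToolUse hook command in a settings payload.
func commands(settings []byte) []string {
	var parsed struct {
		Hooks struct {
			PreToolUse []struct {
				Hooks []struct {
					Command string `json:"command"`
				} `json:"hooks"`
			} `json:"PreToolUse"`
		} `json:"hooks"`
	}
	_ = json.Unmarshal(settings, &parsed)
	var out []string
	for _, g := range parsed.Hooks.PreToolUse {
		for _, h := range g.Hooks {
			out = append(out, h.Command)
		}
	}
	return out
}

func main() {
	existing := []byte(`{"model":"x","hooks":{"PreToolUse":[{"matcher":"Bash","hooks":[{"type":"command","command":"echo hi"}]}]}}`)
	merged, err := mergeHook(existing, "echo kontext-cowork placeholder") // placeholder command
	if err != nil {
		panic(err)
	}
	fmt.Println(commands(merged))
}
```

Because our own entry is filtered out before being re-appended, running the merge twice leaves a single copy, which is what makes re-merging on every tick safe.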
The hook wrote the spool to ../ relative to whatever cwd it inherited, which silently misses the host mount if Cowork ever starts the CLI somewhere other than one level below the session dir. The session dir is the guest $HOME (its .claude subdir is where the CLI loads the injected settings from), so address the spool absolutely. Also drop the hook timeout from 12s to 5s: the append is local-disk fast, and the timeout bounds how much latency a hung host mount can add to every Cowork tool call. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The collector hand-rolled a parallel decode struct that missed the toolUseId/toolUseID camelCase fallbacks and dropped permission_mode. Replay now goes through hookruntime.DecodeClaudeEvent (the same decoder the registered cowork agent adapter wraps) plus localruntime.EvaluateRequestFromEvent, so there is one decode path and no field drift between the hook path and the spool path. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Observation depends on undocumented Cowork internals (session dir layout, host mount, settings tier), so a Cowork update could break it with no error anywhere — "no activity" and "observation broken" looked identical. Track sessions seen vs sessions carrying the hook vs sessions producing a spool, plus events replayed and malformed lines dropped, and log a 5-minute heartbeat. Warn explicitly when a session never received the hook (injection raced CLI startup or the daemon started late) and when hooks are injected but no spool ever appears (layout/mount changed). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ndow Spool files hold raw, unredacted tool inputs (the normal pipeline only persists redacted parameters) and were never rotated or deleted, so they accumulated plaintext on customer disks for as long as the session dir lived — and inflated what any offset-state loss could re-replay. The collector now removes a spool once it is fully drained and idle for an hour, drops its offset entry, and also prunes offset entries whose session dir Cowork has deleted. A session that wakes up again simply recreates the spool and drain starts over via the shrink reset. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…field The only switch was the KONTEXT_COWORK_OBSERVE env var, which is awkward to plumb through launchd plists on MDM-deployed installs. managed.json now carries an optional cowork_observe boolean (default false) and the daemon honors either it or the env var, which stays as a dev override. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…oyment caveats Spell out in the package doc that spool events are self-reported in-VM telemetry (forgeable, observe-only, never enforcement), that delivery is at-least-once, that the mechanism rides on undocumented Cowork internals watched by the health heartbeat, and that the daemon must run in the session user's context so injected files are not root-owned. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…cowork_enabled managed.json already carries the deployment-level mode knob but validation pinned it to "observe". Allow "enforce" so the same single knob the rest of the runtime is built around (guardhookruntime.Mode, per-edge result transforms) can switch managed installs to blocking. cowork_observe becomes cowork_enabled: it gates whether the Cowork loops run at all, while the posture now follows mode. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
RunDaemon hardcoded ModeObserve; it now parses managed.json's mode and passes it to runtimehost (whose existing per-edge result transform already returns real denies when not observing) and to the Cowork observer, which will pick its injected hook variant by the same mode. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
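The two knobs described in the last few commits (`mode` and `cowork_enabled` in managed.json, plus the `KONTEXT_COWORK_OBSERVE` dev override) could be parsed along these lines. This is a sketch under assumptions: the struct, the function name, the default of "observe" when `mode` is absent, and treating the env var as a Go-style boolean are all guesses, since the PR does not show the real parser.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strconv"
)

// managedConfig models only the two managed.json fields discussed here.
type managedConfig struct {
	Mode          string `json:"mode"`
	CoworkEnabled bool   `json:"cowork_enabled"`
}

// parseManaged validates mode and applies the env-var override.
// envOverride is the raw value of KONTEXT_COWORK_OBSERVE.
func parseManaged(data []byte, envOverride string) (managedConfig, error) {
	cfg := managedConfig{}
	if err := json.Unmarshal(data, &cfg); err != nil {
		return managedConfig{}, err
	}
	if cfg.Mode == "" {
		cfg.Mode = "observe" // assumption: observe is the default posture
	}
	if cfg.Mode != "observe" && cfg.Mode != "enforce" {
		return managedConfig{}, fmt.Errorf("managed.json: unsupported mode %q", cfg.Mode)
	}
	// Assumption: the override counts when it parses as a true boolean.
	if on, err := strconv.ParseBool(envOverride); err == nil && on {
		cfg.CoworkEnabled = true
	}
	return cfg, nil
}

func main() {
	cfg, err := parseManaged(
		[]byte(`{"mode":"enforce","cowork_enabled":true}`),
		os.Getenv("KONTEXT_COWORK_OBSERVE"),
	)
	fmt.Printf("%+v %v\n", cfg, err)
}
```

Note that the override can only switch the Cowork loops on; the posture still comes from `mode`, matching the split described in the commit.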
…d-trip In enforce mode the injected hook becomes synchronous: it appends the event wrapped in an envelope carrying a hook-generated request id, then polls the host-mounted kontext-cowork-decisions/<rid>.json for up to 10s inside claudemanaged's conventional 20s hook timeout. The collector evaluates the event through the daemon socket (whose per-edge transform already returns real denies when the managed mode is enforce), renders the verdict with the standard Claude encoder, and parks it for the hook to emit verbatim — the CLI honors the permissionDecision. No decision in time means the hook emits deny "Kontext daemon unavailable" itself: fail-closed, mirroring the sidecar's enforce behavior. Request ids originate inside the VM, so they are validated against a strict charset before naming any file the daemon writes. mergeSettings now replaces stale variants of our entry on mode switches, orphaned decision files are TTL-cleaned, the scan tightens to 100ms under enforce, and the heartbeat reports denies. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
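Since the request id is chosen inside the VM and ends up in a host-side filename, the validation step is worth illustrating. The sketch below assumes an allow-list of ASCII letters, digits, `_` and `-` with a 64-character cap; the PR only says "a strict charset", so the exact pattern and the `decisionPath` helper are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
)

// ridPattern is an assumed allow-list. In Go's regexp, $ without the m flag
// matches only at the very end of the text, so a trailing newline is rejected.
var ridPattern = regexp.MustCompile(`^[A-Za-z0-9_-]{1,64}$`)

// decisionPath maps a guest-supplied request id to the decision file the
// daemon will write, refusing anything that could escape the decisions dir.
func decisionPath(sessionDir, rid string) (string, error) {
	if !ridPattern.MatchString(rid) {
		return "", fmt.Errorf("invalid request id %q", rid)
	}
	return filepath.Join(sessionDir, "kontext-cowork-decisions", rid+".json"), nil
}

func main() {
	for _, rid := range []string{"a1B2-c3_d4", "../../etc/passwd", ""} {
		p, err := decisionPath("session", rid)
		fmt.Printf("%q -> %q, err=%v\n", rid, p, err)
	}
}
```

An allow-list is preferable to stripping `..` or `/` after the fact, because it leaves no encoding tricks to reason about.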
Replace the observe-only caveat with the mode-driven behavior: enforce gates agent-via-CLI actions through the decision round-trip, fail-closed on daemon unavailability, with the honest limits spelled out (injection race window, CLI-timeout kill reads as allow, in-VM bypass out of hook reach). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Greptile Summary
This PR adds Claude Cowork observation and enforcement to the managed observe daemon. The main changes are:
Confidence Score: 5/5
This looks safe to merge.
Important Files Changed
Sequence Diagram
sequenceDiagram
participant CoworkCLI as Cowork Claude CLI
participant Settings as Injected settings.json hook
participant Spool as Session spool file
participant Collector as coworkobserve collector
participant Daemon as localruntime socket
participant Decision as Decision file
CoworkCLI->>Settings: PreToolUse event
Settings->>Spool: Append event or envelope
Collector->>Spool: Read complete JSONL lines
Collector->>Daemon: Evaluate as agent cowork
Daemon-->>Collector: Allow or deny result
alt enforce mode
Collector->>Decision: Write rendered Claude decision
Settings->>Decision: Poll and emit verdict
end
hasandemirkiran
left a comment
Looks good overall, left a few comments also:
Since enforce mode depends on shell behavior ($HOME, RID generation, append, polling, stdout shape, fail-closed timeout), I’d add a small integration-style test that runs the generated observe/enforce command under sh with a temp HOME and verifies append, decision emission, and fail-closed behavior.
Minor nit: comments still mention cowork_observe even though the field is now cowork_enabled:
internal/coworkobserve/coworkobserve.go:98-100
internal/managedobserve/daemon.go:103-106
func inject(opts Options, h *health) {
	claudeDirs, _ := filepath.Glob(filepath.Join(opts.SessionsRoot, "*", "*", "local_*", ".claude"))
	cutoff := time.Now().Add(-3 * time.Minute)
	entry := hookEntry(opts.Mode)
	for _, dir := range claudeDirs {
		info, err := os.Stat(dir)
		if err != nil || info.ModTime().Before(cutoff) {
			continue
inject skips .claude dirs older than 3 minutes before recording them in health state or checking whether the installed hook matches the current mode. If the daemon starts late, or if config changes from observe to enforce while a Cowork session is already running, that session can remain unhooked or keep the observe hook without the heartbeat surfacing it. I’d either track stale discovered sessions before the cutoff and warn on them, or re-check/re-merge existing settings when mode changes
Went with your second suggestion (re-apply settings when the mode changes) in 14267a1.
The thing that made it click: the watch/block setting is only read once, when the daemon starts — it can't change while the daemon is running. So both of your cases are really the same moment: the daemon coming up while sessions are already alive (either it restarted, or someone changed the setting and restarted it to apply the change). By then the session folder hasn't changed in a while, so the normal every-few-seconds check always skips it and can never catch up to those sessions on its own.
Fix: when the daemon starts, it now does one pass over every recent session and re-applies the correct hook (watch or block), ignoring the "looks old, skip it" rule. After that, the normal check only has to handle brand-new sessions, which always look fresh. It also records each session before installing the hook, so if installing ever fails, the health log surfaces it instead of staying silent.
func (c *collector) cleanup(opts Options, spool string) {
	info, err := os.Stat(spool)
	if err != nil {
		return
	}
	if time.Since(info.ModTime()) < spoolRetention {
		return
	}
	if c.offsets[spool] != info.Size() {
		return // not fully drained yet
	}
	if err := os.Remove(spool); err != nil {
		opts.Diagnostic.Printf("cowork observe: remove drained spool %s: %v\n", spool, err)
		return
	}
	delete(c.offsets, spool)
	c.dirty = true
The spool cleanup can race with the hook append path. The hook appends to $HOME/kontext-cowork-events.jsonl, while cleanup stats the file and then removes it if it looks drained and old. If Cowork appends between the stat and remove, that new event can be written to a file that gets unlinked. I’d avoid deleting spools for still-live session dirs, or add a stronger coordination mechanism such as locking/per-event files/re-stat-and-backoff
Agreed, real TOCTOU. Fixed in 411792a: cleanup re-stats the spool (size + modtime) immediately before os.Remove and bails if either changed since the drained/idle check — so an append landing in that window is left for the next tick to drain instead of being unlinked.
Kept it a narrow guard rather than a redesign because the window only opens after a full hour of spool idleness, and under enforce the call still fails closed. TestCleanupSkipsSpoolAppendedInRemoveWindow drives the race deterministically via a test-only seam.
…guest $HOME The guest $HOME is not the per-session dir: Cowork points the bundled CLI at the per-session .claude via a config-dir override, not via $HOME, so a $HOME-relative spool lands on the ephemeral VM filesystem and never reaches the host collector. The hook's cwd is the session's outputs/ mount, so write the spool and poll the decision file at ../ (the session dir, where .claude lives and where the collector globs) instead. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The managed.json field was named cowork_enabled, but two comments still named it cowork_observe (which now only exists as the KONTEXT_COWORK_OBSERVE env-var override, deliberately a different name). No behavior change. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…artbeat The injector keyed liveness on the .claude dir modtime, but its own settings.json write freezes that modtime, so a session looked stale ~3 minutes after the first injection even while it kept running. A mode switch (observe -> enforce) never re-reached such a session — it kept the stale hook — and because the cutoff skipped the dir before recording it, the heartbeat's "never received the hook" warning could not surface it either. Add a spool-modtime fallback: a session is also live if its spool was written recently, the signal the in-VM CLI keeps fresh while driving tool calls. That re-reaches running sessions for a mode-switch re-merge and records them in sessionsSeen so the heartbeat can warn. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
cleanup stat'd a spool, confirmed it drained and idle, then removed it. A hook appending in the window between the check and the remove would write an event into a file about to be unlinked, losing it (the append opens the spool fresh, so it does not keep the inode alive). The window is narrow (it needs an append after a full hour of spool idleness) but the guard is cheap: re-stat right before the remove and skip it if the size or modtime changed. A nil-in-production test seam drives the window deterministically. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
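The guard can be sketched as a small helper that re-stats just before unlinking. The names `removeIfUnchanged` and `beforeRemove` are illustrative; the seam mirrors the "nil in production" hook the commit mentions but is not the PR's actual signature.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeIfUnchanged deletes path only if its size and modtime still match the
// FileInfo captured by the earlier drained/idle check. beforeRemove is a
// test-only seam (nil in production) that runs inside the race window.
func removeIfUnchanged(path string, checked os.FileInfo, beforeRemove func()) (bool, error) {
	if beforeRemove != nil {
		beforeRemove()
	}
	now, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	if now.Size() != checked.Size() || !now.ModTime().Equal(checked.ModTime()) {
		return false, nil // an append landed; leave it for the next tick to drain
	}
	if err := os.Remove(path); err != nil {
		return false, err
	}
	return true, nil
}

// runExample exercises both outcomes against a scratch spool file.
func runExample() (idleRemoved, appendedRemoved bool, err error) {
	dir, err := os.MkdirTemp("", "spool-example-")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	spool := filepath.Join(dir, "kontext-cowork-events.jsonl")

	write := func() (os.FileInfo, error) {
		if err := os.WriteFile(spool, []byte("{}\n"), 0o600); err != nil {
			return nil, err
		}
		return os.Stat(spool)
	}

	checked, err := write()
	if err != nil {
		return false, false, err
	}
	if idleRemoved, err = removeIfUnchanged(spool, checked, nil); err != nil {
		return false, false, err
	}

	if checked, err = write(); err != nil {
		return idleRemoved, false, err
	}
	appendedRemoved, err = removeIfUnchanged(spool, checked, func() {
		// Simulate the hook appending between the idle check and the remove.
		if f, err := os.OpenFile(spool, os.O_APPEND|os.O_WRONLY, 0o600); err == nil {
			f.WriteString("{}\n")
			f.Close()
		}
	})
	return idleRemoved, appendedRemoved, err
}

func main() {
	fmt.Println(runExample())
}
```

This narrows the window to the gap between the second stat and the unlink rather than closing it, which is consistent with the commit's framing of it as a cheap guard and not a redesign.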
Run the actual observe/enforce command strings under /bin/sh with cwd set to the session's outputs/ mount, covering the behavior the Go-level tests can't: ../-relative spool append and exit 0 (observe), empty-stdin immediate deny, the enforce decision round-trip (discovering the shell-generated rid from the spooled envelope, then parking the decision the hook polls for), and the fail-closed deny when no decision arrives (loop count substituted 100->3 to keep the test fast, with a guard asserting the real constant is unchanged). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Thanks for the review @hasandemirkiran — addressed both points from the top-level comment:
…pool modtime Replaces the per-tick spool-modtime liveness heuristic with the reviewer's suggested approach: re-check/re-merge existing settings when the mode changes. The mode is read once from managed.json at daemon start, so it cannot change without a restart — which means "mode switched observe->enforce while a session runs" and "daemon started after a session" are the same event: the daemon coming up while sessions already exist. So a single forced pass at startup (reinjectExisting) re-merges the configured-mode hook into every recent session, regardless of the frozen .claude dir modtime, before they next act. This closes the gap the spool heuristic left — an idle-but-alive session whose first call after a switch still ran under the old hook — and is simpler: steady-state inject only has to catch newly-created sessions. Bounded to sessions touched within the last day so the pass does not write into abandoned session dirs. inject/reinjectExisting share mergeInto, which records the session as seen before writing so a failed hook still surfaces in the heartbeat. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ks work
Claude Code's settings watcher only watches dirs that already had a settings file when the session started. Cowork does not pre-create one, so writing the first settings.json into a session whose CLI is already running never takes effect, yet health marked it hooked. That is the silent miss relocated into the health log. Split the signal:
- written: wrote (or found current) our hook AND have reason to believe it loads, i.e. a new session we seeded before its CLI started, or a re-merge of a hook already present (that dir was watched at session start, so the change hot-reloads).
- unverified: wrote a first-time hook onto a possibly-already-running session (the startup pass). The watcher may never load it, so it is best-effort and must not read as working. The heartbeat warns about these explicitly.
- confirmed: a spool arrived, which is ground truth that the hook fires. A spool promotes a session from unverified to written.
reinjectExisting now trusts only sessions that already carried our hook (the safe mode-switch re-merge); a first-time hook on a pre-existing session is written best-effort but recorded as unverified. Steady-state inject still trusts its writes (it wins the race before the CLI starts).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
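One way to read the three signals is as a small state model per session. The sketch below is an interpretation of the commit message only; the type, field, and function names are hypothetical, and the real health tracker may well use counters or sets instead.

```go
package main

import "fmt"

// hookState is what the injector believes about a session's hook.
type hookState int

const (
	stateUnverified hookState = iota // first-time hook on a possibly running session
	stateWritten                     // hook written and expected to load
)

// session carries the injector's belief plus the ground-truth signal.
type session struct {
	state     hookState
	confirmed bool // a spool arrived, so the hook demonstrably fires
}

// classify decides how much to trust a write. Only a first-time hook placed
// by the startup pass is doubtful: steady-state injection seeds the session
// before its CLI starts, and a re-merge touches a dir the CLI already watches.
func classify(startupPass, hadOurHook bool) hookState {
	if startupPass && !hadOurHook {
		return stateUnverified
	}
	return stateWritten
}

// spoolSeen records ground truth and promotes an unverified session.
func (s *session) spoolSeen() {
	s.confirmed = true
	if s.state == stateUnverified {
		s.state = stateWritten
	}
}

func main() {
	s := session{state: classify(true, false)}
	fmt.Println("after startup pass, unverified:", s.state == stateUnverified)
	s.spoolSeen()
	fmt.Println("after spool, written:", s.state == stateWritten, "confirmed:", s.confirmed)
}
```

Keeping `confirmed` separate from the written/unverified state lets a heartbeat warn on sessions that are unverified and never confirmed, without treating every quiet session as broken.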

Summary
Adds Claude Cowork observation and enforcement to the managed-observe daemon, reusing the existing Claude Code pipeline end to end (classify → store → stream → ledger). Cowork activity is recorded with agent: "cowork", so it appears alongside Claude Code with no downstream or schema changes. The posture follows the deployment-level managed.json mode: observe records would-be decisions, enforce returns real deny/allow verdicts to the in-VM CLI.
Background
Claude Cowork runs the bundled Claude Code CLI inside a per-session VM whose root filesystem is rebuilt on each boot, so hooks can't be persisted in the image and the host daemon's unix socket is unreachable from the guest. What does survive — and cross the boundary — is the per-session CLAUDE config dir, which Cowork mounts from the host. This PR uses that mount as both transport legs: settings injection inbound, an event spool plus decision files outbound/return.
What this adds
- internal/agent/cowork: registers the cowork agent, reusing the Claude decoder/encoder (Cowork ships the same CLI, same hook formats).
- internal/coworkobserve: two host-side loops started by the daemon:
  - … PreToolUse command hook into each new per-session settings.json (preserving existing settings/hooks, atomic write, stale variants replaced on mode switches).
  - … cowork. At-least-once delivery: offsets only advance past complete, successfully replayed lines and persist across restarts (cowork-spool-offsets.json), so restarts never duplicate ledger rows.
- … managed.json mode: "enforce", now accepted by config validation and plumbed through RunDaemon → runtimehost, so it drives every hook edge consistently):
  - … kontext-cowork-decisions/<rid>.json for up to 10s inside the conventional 20s hook timeout;
  - … permissionDecision;
  - … "Kontext daemon unavailable" itself: fail-closed, mirroring the sidecar's enforce behavior;
- … cowork_enabled in managed.json (off by default); KONTEXT_COWORK_OBSERVE stays as a dev override.
Not changed
No changes to the ledger batch builder, the ingest contract, the store schema, or the stream. Cowork events flow through the existing path, tagged agent: cowork.
Caveats (also in the package doc)
Verified
Observe path locally end to end: real Cowork tool calls (bash + Chrome MCP tools) recorded and streamed, tagged agent: cowork. Enforce path covered by collector round-trip tests against a daemon-socket fake; needs one manual end-to-end pass in a real Cowork session before enabling for any customer (deny rendering in the Cowork UI, decision latency under real VM mount). gofmt / go vet / go build / go test -race ./... clean.
Follow-ups