feat(cowork): observe Claude Cowork tool calls via per-session settings injection #283

Open
tumberger wants to merge 21 commits into main from 06-12-feat_cowork_observe_claude_cowork_tool_calls_via_per-session_settings_injection

@tumberger tumberger commented Jun 12, 2026

Summary

Adds Claude Cowork observation and enforcement to the managed-observe daemon, reusing the existing Claude Code pipeline end to end (classify → store → stream → ledger). Cowork activity is recorded with agent: "cowork", so it appears alongside Claude Code with no downstream or schema changes. The posture follows the deployment-level managed.json mode — observe records would-decisions, enforce returns real deny/allow verdicts to the in-VM CLI.

Background

Claude Cowork runs the bundled Claude Code CLI inside a per-session VM whose root filesystem is rebuilt on each boot, so hooks can't be persisted in the image and the host daemon's unix socket is unreachable from the guest. What does survive — and cross the boundary — is the per-session CLAUDE config dir, which Cowork mounts from the host. This PR uses that mount as both transport legs: settings injection inbound, an event spool plus decision files outbound/return.

What this adds

  • internal/agent/cowork — registers the cowork agent, reusing the Claude decoder/encoder (Cowork ships the same CLI, same hook formats).
  • internal/coworkobserve — two host-side loops started by the daemon:
    • injector: merges a PreToolUse command hook into each new per-session settings.json (preserving existing settings/hooks, atomic write, stale variants replaced on mode switches).
    • collector: tails the spool files and replays each event into the daemon's existing localruntime socket as agent cowork. At-least-once delivery: offsets only advance past complete, successfully replayed lines and persist across restarts (cowork-spool-offsets.json), so restarts never duplicate ledger rows.
  • Observe mode: the hook is fire-and-forget (append, exit 0); decisions are recorded as would-allow/deny, nothing blocks.
  • Enforce mode (managed.json mode: "enforce", now accepted by config validation and plumbed through RunDaemon → runtimehost, so it drives every hook edge consistently):
    • the hook wraps each event in an envelope with a hook-generated request id, appends it to the spool, and polls kontext-cowork-decisions/<rid>.json for up to 10s inside the conventional 20s hook timeout;
    • the collector evaluates through the same socket (whose per-edge transform returns the real verdict when not observing), renders it with the standard Claude encoder, and writes the decision file atomically — the CLI honors the permissionDecision;
    • no verdict in time → the hook emits deny "Kontext daemon unavailable" itself: fail-closed, mirroring the sidecar's enforce behavior;
    • request ids originate in the VM and are validated against a strict charset before naming any file the daemon writes (path-traversal guard); orphaned decision files are TTL-cleaned; the scan tightens to 100ms under enforce.
  • Health heartbeat: sessions seen vs hooked vs spooling, replay/deny/drop counters, logged every 5 minutes with explicit warnings — so a Cowork-internals change that breaks observation (or enforcement) is visible instead of silent.
  • Drained spools (raw tool inputs) are deleted after an hour idle. Gated by cowork_enabled in managed.json (off by default); KONTEXT_COWORK_OBSERVE stays as a dev override.
  • Unit tests: injector merge/mode-switch, collector partial lines, transport-failure retry, malformed lines, restart persistence, spool cleanup, camelCase decoding, enforce deny round-trip, unsafe-rid rejection.

Not changed

No changes to the ledger batch builder, the ingest contract, the store schema, or the stream. Cowork events flow through the existing path, tagged agent: cowork.

Caveats (also in the package doc)

  • Spool events are self-reported in-VM telemetry — forgeable from inside the VM; not attested records.
  • Enforcement gates agent-via-CLI actions only. A tool call landing before settings injection runs unguarded (heartbeat surfaces such sessions; FSEvents follow-up shrinks the window). A hook killed at the CLI's own timeout reads as allow, so fail-closed is best-effort within the timeout budget.
  • Delivery is at-least-once; a replay retried after a partial send can very occasionally duplicate a ledger event (never silently drop one).
  • The mechanism rides on undocumented Cowork internals (session dir layout, host mount, settings tier); the health heartbeat exists to make breakage visible.
  • Run the daemon in the session user's context (LaunchAgent, not root LaunchDaemon) so injected files aren't root-owned.

Verified

Observe path locally end to end: real Cowork tool calls (bash + Chrome MCP tools) recorded and streamed, tagged agent: cowork. Enforce path covered by collector round-trip tests against a daemon-socket fake; needs one manual end-to-end pass in a real Cowork session before enabling for any customer (deny rendering in the Cowork UI, decision latency under real VM mount). gofmt/go vet/go build/go test -race ./... clean.

Follow-ups

  • Injector uses a short poll loop; switch to FSEvents to shrink the injection window (which under enforce is a bypass window).
  • Optional: a static linux/arm64 helper binary dropped into the session mount to replace the shell handshake (fsync, tighter deadlines) — depends on the mount allowing exec.

…gs injection

Adds Cowork observation to the managed-observe daemon, reusing the existing
Claude Code pipeline (classify -> store -> managedstream -> ledger). No backend
ingest changes needed; Cowork rows are tagged agent:cowork.

- internal/agent/cowork: registers the "cowork" agent, reusing the Claude hook
  decoder (Cowork runs the same bundled Claude Code CLI).
- internal/coworkobserve: injector writes settings.json with a PreToolUse
  command hook into each per-session .claude dir (the host-mounted guest
  $HOME/.claude, which survives the per-boot VM rootfs rebuild); the hook spools
  each event to a host file; the collector replays events into the daemon's
  localruntime socket as agent "cowork". File-drop transport, no in-VM network.
- Gated by KONTEXT_COWORK_OBSERVE; wired into RunDaemon; agent blank-imported.

Verified end to end locally: real Cowork bash + Chrome MCP tool calls land in
the hosted ledger tagged agent:cowork.

Follow-ups (dev-grade today): FSEvents instead of 250ms poll; persist collector
offsets (in-memory now); enforce mode (hook<->daemon decision round-trip); unit
tests for injector/collector.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

This stack of pull requests is managed by Graphite. Learn more about stacking.

tumberger and others added 13 commits June 12, 2026 15:53
… lines

The collector previously read the whole spool, advanced its offset to EOF,
then parsed lines — so a trailing partial line (hook mid-append) was
consumed and dropped forever, and a failed socket replay also lost its
event because the offset had already moved.

Drain now seeks to the saved offset, reads incrementally (no full-file
re-read every tick), only advances past complete newline-terminated lines,
halts and retries on transient socket errors (at-least-once delivery),
skips permanently-malformed lines, and resets when the spool shrinks
(file recreated). Covered by new collector tests against a fake daemon
socket.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
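
The drain rule in the commit above (advance only past complete, successfully replayed lines) can be sketched as follows. This is a minimal illustration, not the PR's code: `splitComplete`, `drain`, and `errTransient` are hypothetical names, and the sketch omits the malformed-line skip and the shrink reset.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// errTransient stands in for a failed socket replay (hypothetical name).
var errTransient = errors.New("socket unavailable")

// splitComplete returns every newline-terminated line in buf and the number
// of bytes those lines occupy. A trailing partial line (a hook caught
// mid-append) is left unconsumed so a later tick can read it once complete.
func splitComplete(buf []byte) (lines [][]byte, consumed int) {
	for {
		i := bytes.IndexByte(buf[consumed:], '\n')
		if i < 0 {
			return lines, consumed
		}
		lines = append(lines, buf[consumed:consumed+i])
		consumed += i + 1
	}
}

// drain replays the complete lines in buf (the bytes read starting at offset)
// and returns the new offset. It stops at the first failed replay, so that
// line is retried on the next tick instead of being skipped.
func drain(buf []byte, offset int64, replay func(line []byte) error) int64 {
	lines, _ := splitComplete(buf)
	for _, line := range lines {
		if err := replay(line); err != nil {
			return offset
		}
		offset += int64(len(line)) + 1 // +1 for the newline
	}
	return offset
}

func main() {
	buf := []byte("{\"a\":1}\n{\"b\":2}\n{\"c\":")

	// All replays succeed: the offset stops before the partial third line.
	ok := drain(buf, 0, func(line []byte) error { return nil })
	fmt.Println("offset after clean drain:", ok)

	// The second replay fails: the offset stays at the start of that line.
	calls := 0
	failed := drain(buf, 0, func(line []byte) error {
		calls++
		if calls == 2 {
			return errTransient
		}
		return nil
	})
	fmt.Println("offset after failed replay:", failed)
}
```

Because the offset never moves past a line whose replay failed, a retry can resend an event the daemon partly processed, which is the at-least-once trade-off the PR description calls out.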
…e ledger rows

Offsets were in-memory only, so every daemon restart (update, reboot,
idle cycle) re-replayed every existing spool from byte zero. SaveDecision
inserts fresh action IDs per event — two per PreToolUse — so each restart
double-ingested the full history of every live session into the ledger.

The collector now loads/saves its offset map at a state file next to the
guard DB (cowork-spool-offsets.json, written via temp+rename after ticks
that changed it), mirroring the stream-state.json convention.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
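
A minimal sketch of the temp+rename persistence described in the commit above. Only the file name `cowork-spool-offsets.json` comes from the PR; `saveOffsets`, `loadOffsets`, `demo`, and the JSON shape (a flat map from spool path to byte offset) are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// saveOffsets writes the offset map through a temp file in the same directory
// followed by a rename, so a reader never observes a half-written state file.
func saveOffsets(path string, offsets map[string]int64) error {
	data, err := json.Marshal(offsets)
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), ".offsets-*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless once the rename has succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// loadOffsets returns an empty map when the state file does not exist yet.
func loadOffsets(path string) (map[string]int64, error) {
	data, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return map[string]int64{}, nil
	}
	if err != nil {
		return nil, err
	}
	offsets := map[string]int64{}
	if err := json.Unmarshal(data, &offsets); err != nil {
		return nil, err
	}
	return offsets, nil
}

// demo simulates a daemon restart: save the offsets, then load them fresh.
func demo() (map[string]int64, error) {
	dir, err := os.MkdirTemp("", "cowork-offsets")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "cowork-spool-offsets.json")
	saved := map[string]int64{"/sessions/a/spool.jsonl": 128}
	if err := saveOffsets(path, saved); err != nil {
		return nil, err
	}
	return loadOffsets(path)
}

func main() {
	offsets, err := demo()
	fmt.Println(offsets, err)
}
```

The temp file is created next to the target so the rename stays on one filesystem, which is what makes the replacement atomic on POSIX systems.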
…atomically

The injector overwrote any settings.json that lacked its marker, which
would destroy settings Cowork or the user place in the per-session dir
(and re-clobber them within one poll tick whenever the in-VM CLI rewrote
the file). It also wrote with a plain truncate-then-write, so the CLI
could race a read of a half-written file at exactly session startup.

inject now parses the existing file, appends our PreToolUse matcher group
alongside whatever is already there, and writes via temp file + rename.
Unparseable existing content is still replaced — the CLI could not have
read it either.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The hook wrote the spool to ../ relative to whatever cwd it inherited,
which silently misses the host mount if Cowork ever starts the CLI
somewhere other than one level below the session dir. The session dir is
the guest $HOME (its .claude subdir is where the CLI loads the injected
settings from), so address the spool absolutely.

Also drop the hook timeout from 12s to 5s: the append is local-disk fast,
and the timeout bounds how much latency a hung host mount can add to
every Cowork tool call.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The collector hand-rolled a parallel decode struct that missed the
toolUseId/toolUseID camelCase fallbacks and dropped permission_mode.
Replay now goes through hookruntime.DecodeClaudeEvent (the same decoder
the registered cowork agent adapter wraps) plus
localruntime.EvaluateRequestFromEvent, so there is one decode path and
no field drift between the hook path and the spool path.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Observation depends on undocumented Cowork internals (session dir layout,
host mount, settings tier), so a Cowork update could break it with no
error anywhere — "no activity" and "observation broken" looked identical.

Track sessions seen vs sessions carrying the hook vs sessions producing a
spool, plus events replayed and malformed lines dropped, and log a
5-minute heartbeat. Warn explicitly when a session never received the
hook (injection raced CLI startup or the daemon started late) and when
hooks are injected but no spool ever appears (layout/mount changed).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ndow

Spool files hold raw, unredacted tool inputs (the normal pipeline only
persists redacted parameters) and were never rotated or deleted, so they
accumulated plaintext on customer disks for as long as the session dir
lived — and inflated what any offset-state loss could re-replay.

The collector now removes a spool once it is fully drained and idle for
an hour, drops its offset entry, and also prunes offset entries whose
session dir Cowork has deleted. A session that wakes up again simply
recreates the spool and drain starts over via the shrink reset.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…field

The only switch was the KONTEXT_COWORK_OBSERVE env var, which is awkward
to plumb through launchd plists on MDM-deployed installs. managed.json
now carries an optional cowork_observe boolean (default false) and the
daemon honors either it or the env var, which stays as a dev override.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…oyment caveats

Spell out in the package doc that spool events are self-reported in-VM
telemetry (forgeable, observe-only, never enforcement), that delivery is
at-least-once, that the mechanism rides on undocumented Cowork internals
watched by the health heartbeat, and that the daemon must run in the
session user's context so injected files are not root-owned.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…cowork_enabled

managed.json already carries the deployment-level mode knob but
validation pinned it to "observe". Allow "enforce" so the same single
knob the rest of the runtime is built around (guardhookruntime.Mode,
per-edge result transforms) can switch managed installs to blocking.

cowork_observe becomes cowork_enabled: it gates whether the Cowork loops
run at all, while the posture now follows mode.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
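
As a rough illustration of the two knobs after this commit, the sketch below parses and validates a managed.json fragment. Only the field names `mode` and `cowork_enabled` and the values `observe` and `enforce` come from the PR; the struct, the function names, rejecting an absent mode, and treating any non-empty `KONTEXT_COWORK_OBSERVE` value as "on" are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// managedConfig mirrors only the two managed.json fields named in this PR.
// The struct name and the absence of other fields are assumptions.
type managedConfig struct {
	Mode          string `json:"mode"`
	CoworkEnabled bool   `json:"cowork_enabled"`
}

// parseManaged accepts "observe" and "enforce". Whether a missing mode should
// default to observe is not stated in the PR, so this sketch rejects it.
func parseManaged(data []byte) (managedConfig, error) {
	var cfg managedConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return cfg, err
	}
	switch cfg.Mode {
	case "observe", "enforce":
		return cfg, nil
	default:
		return cfg, fmt.Errorf("managed.json: unsupported mode %q", cfg.Mode)
	}
}

// coworkActive reports whether the Cowork loops should run: either the config
// enables them or the dev override env var is set (value semantics assumed).
func coworkActive(cfg managedConfig, envOverride string) bool {
	return cfg.CoworkEnabled || envOverride != ""
}

func main() {
	cfg, err := parseManaged([]byte(`{"mode":"enforce","cowork_enabled":true}`))
	fmt.Println(cfg, err, coworkActive(cfg, ""))

	_, err = parseManaged([]byte(`{"mode":"block"}`))
	fmt.Println(err)
}
```

Keeping the gate (`cowork_enabled`) separate from the posture (`mode`) means one deployment-level knob decides blocking for every hook edge, which is the rationale the commit gives.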
RunDaemon hardcoded ModeObserve; it now parses managed.json's mode and
passes it to runtimehost (whose existing per-edge result transform
already returns real denies when not observing) and to the Cowork
observer, which will pick its injected hook variant by the same mode.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…d-trip

In enforce mode the injected hook becomes synchronous: it appends the
event wrapped in an envelope carrying a hook-generated request id, then
polls the host-mounted kontext-cowork-decisions/<rid>.json for up to 10s
inside claudemanaged's conventional 20s hook timeout. The collector
evaluates the event through the daemon socket (whose per-edge transform
already returns real denies when the managed mode is enforce), renders
the verdict with the standard Claude encoder, and parks it for the hook
to emit verbatim — the CLI honors the permissionDecision.

No decision in time means the hook emits deny "Kontext daemon
unavailable" itself: fail-closed, mirroring the sidecar's enforce
behavior. Request ids originate inside the VM, so they are validated
against a strict charset before naming any file the daemon writes.
mergeSettings now replaces stale variants of our entry on mode switches,
orphaned decision files are TTL-cleaned, the scan tightens to 100ms under
enforce, and the heartbeat reports denies.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
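
Two pieces of the round-trip above lend themselves to a short sketch: the request-id guard and the fail-closed wait. The real hook is a shell command, so the Go `awaitDecision` below is a stand-in for its polling loop rather than the PR's code. The charset in `ridPattern` is an assumption (the PR says only "a strict charset"), and the deny JSON follows the Claude Code PreToolUse hook output shape as I understand it; the PR renders verdicts with its own encoder.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
	"time"
)

// ridPattern is an assumed charset and length bound for request ids.
var ridPattern = regexp.MustCompile(`^[A-Za-z0-9_-]{1,64}$`)

// denyUnavailable is what the hook emits itself when no verdict arrives.
const denyUnavailable = `{"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"deny","permissionDecisionReason":"Kontext daemon unavailable"}}`

// decisionPath maps a VM-supplied request id to a file under the decisions
// directory, refusing any id that could escape it (path-traversal guard).
func decisionPath(sessionDir, rid string) (string, error) {
	if !ridPattern.MatchString(rid) {
		return "", fmt.Errorf("unsafe request id %q", rid)
	}
	return filepath.Join(sessionDir, "kontext-cowork-decisions", rid+".json"), nil
}

// awaitDecision polls for the decision file and fails closed on timeout.
func awaitDecision(path string, attempts int, interval time.Duration) string {
	for i := 0; i < attempts; i++ {
		if data, err := os.ReadFile(path); err == nil {
			return string(data)
		}
		time.Sleep(interval)
	}
	return denyUnavailable
}

// demo parks one decision and lets a second request time out.
func demo() (string, string, error) {
	dir, err := os.MkdirTemp("", "cowork-decisions")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)
	if err := os.MkdirAll(filepath.Join(dir, "kontext-cowork-decisions"), 0o700); err != nil {
		return "", "", err
	}
	path, err := decisionPath(dir, "rid-123")
	if err != nil {
		return "", "", err
	}
	if err := os.WriteFile(path, []byte(`{"verdict":"placeholder"}`), 0o600); err != nil {
		return "", "", err
	}
	missing, err := decisionPath(dir, "rid-456")
	if err != nil {
		return "", "", err
	}
	parked := awaitDecision(path, 3, time.Millisecond)
	timedOut := awaitDecision(missing, 3, time.Millisecond)
	return parked, timedOut, nil
}

func main() {
	parked, timedOut, err := demo()
	fmt.Println(parked, timedOut, err)
	_, err = decisionPath("/sessions/a", "../../etc/passwd")
	fmt.Println(err)
}
```

Reading the file in one call is safe only because the collector writes decisions atomically; a plain truncate-then-write on the host side would let the hook emit a partial verdict.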
Replace the observe-only caveat with the mode-driven behavior: enforce
gates agent-via-CLI actions through the decision round-trip, fail-closed
on daemon unavailability, with the honest limits spelled out (injection
race window, CLI-timeout kill reads as allow, in-VM bypass out of hook
reach).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@tumberger tumberger marked this pull request as ready for review June 12, 2026 17:29

greptile-apps Bot commented Jun 12, 2026

Greptile Summary

This PR adds Claude Cowork observation and enforcement to the managed observe daemon. The main changes are:

  • Registers a new cowork agent adapter using the Claude hook format.
  • Adds Cowork settings injection, event spooling, replay, decision files, cleanup, and health logging.
  • Allows managed config mode to be either observe or enforce.
  • Starts the Cowork observer from the managed daemon when enabled by config or environment.
  • Adds unit tests for settings merge, spool replay, offset persistence, cleanup, and enforce decisions.

Confidence Score: 5/5

This looks safe to merge.

  • No blocking issues found in the changed code.

Important Files Changed

| Filename | Overview |
| --- | --- |
| internal/coworkobserve/coworkobserve.go | Adds the Cowork injector, collector, replay, decision, cleanup, and heartbeat loops. |
| internal/managedobserve/daemon.go | Plumbs managed mode into runtimehost and starts Cowork observation when enabled. |
| internal/managedconfig/config.go | Adds enforce mode validation and the cowork_enabled managed config field. |

Sequence Diagram

sequenceDiagram
  participant CoworkCLI as Cowork Claude CLI
  participant Settings as Injected settings.json hook
  participant Spool as Session spool file
  participant Collector as coworkobserve collector
  participant Daemon as localruntime socket
  participant Decision as Decision file

  CoworkCLI->>Settings: PreToolUse event
  Settings->>Spool: Append event or envelope
  Collector->>Spool: Read complete JSONL lines
  Collector->>Daemon: Evaluate as agent cowork
  Daemon-->>Collector: Allow or deny result
  alt enforce mode
    Collector->>Decision: Write rendered Claude decision
    Settings->>Decision: Poll and emit verdict
  end

Reviews (1): Last reviewed commit: "docs(coworkobserve): document mode seman..."

@hasandemirkiran hasandemirkiran left a comment

Looks good overall; I left a few comments as well:

Since enforce mode depends on shell behavior ($HOME, RID generation, append, polling, stdout shape, fail-closed timeout), I’d add a small integration-style test that runs the generated observe/enforce command under sh with a temp HOME and verifies append, decision emission, and fail-closed behavior.

Minor nit: comments still mention cowork_observe even though the field is now cowork_enabled:

  • internal/coworkobserve/coworkobserve.go:98-100
  • internal/managedobserve/daemon.go:103-106

Comment on lines +353 to +360
func inject(opts Options, h *health) {
	claudeDirs, _ := filepath.Glob(filepath.Join(opts.SessionsRoot, "*", "*", "local_*", ".claude"))
	cutoff := time.Now().Add(-3 * time.Minute)
	entry := hookEntry(opts.Mode)
	for _, dir := range claudeDirs {
		info, err := os.Stat(dir)
		if err != nil || info.ModTime().Before(cutoff) {
			continue

inject skips .claude dirs older than 3 minutes before recording them in health state or checking whether the installed hook matches the current mode. If the daemon starts late, or if config changes from observe to enforce while a Cowork session is already running, that session can remain unhooked or keep the observe hook without the heartbeat surfacing it. I’d either track stale discovered sessions before the cutoff and warn on them, or re-check/re-merge existing settings when the mode changes.

@tumberger tumberger Jun 13, 2026

Went with your second suggestion (re-apply settings when the mode changes) in 14267a1.

The thing that made it click: the watch/block setting is only read once, when the daemon starts — it can't change while the daemon is running. So both of your cases are really the same moment: the daemon coming up while sessions are already alive (either it restarted, or someone changed the setting and restarted it to apply the change). By then the session folder hasn't changed in a while, so the normal every-few-seconds check always skips it and can never catch up to those sessions on its own.

Fix: when the daemon starts, it now does one pass over every recent session and re-applies the correct hook (watch or block), ignoring the "looks old, skip it" rule. After that, the normal check only has to handle brand-new sessions, which always look fresh. It also records each session before installing the hook, so if installing ever fails, the health log surfaces it instead of staying silent.

Comment on lines +464 to +480
func (c *collector) cleanup(opts Options, spool string) {
	info, err := os.Stat(spool)
	if err != nil {
		return
	}
	if time.Since(info.ModTime()) < spoolRetention {
		return
	}
	if c.offsets[spool] != info.Size() {
		return // not fully drained yet
	}
	if err := os.Remove(spool); err != nil {
		opts.Diagnostic.Printf("cowork observe: remove drained spool %s: %v\n", spool, err)
		return
	}
	delete(c.offsets, spool)
	c.dirty = true

The spool cleanup can race with the hook append path. The hook appends to $HOME/kontext-cowork-events.jsonl, while cleanup stats the file and then removes it if it looks drained and old. If Cowork appends between the stat and the remove, that new event can be written to a file that gets unlinked. I’d avoid deleting spools for still-live session dirs, or add a stronger coordination mechanism such as locking, per-event files, or re-stat-and-backoff.

Agreed, real TOCTOU. Fixed in 411792a: cleanup re-stats the spool (size + modtime) immediately before os.Remove and bails if either changed since the drained/idle check — so an append landing in that window is left for the next tick to drain instead of being unlinked.

Kept it a narrow guard rather than a redesign because the window only opens after a full hour of spool idleness, and under enforce the call still fails closed. TestCleanupSkipsSpoolAppendedInRemoveWindow drives the race deterministically via a test-only seam.

tumberger and others added 5 commits June 13, 2026 14:37
…guest $HOME

The guest $HOME is not the per-session dir: Cowork points the bundled CLI at
the per-session .claude via a config-dir override, not via $HOME, so a
$HOME-relative spool lands on the ephemeral VM filesystem and never reaches
the host collector. The hook's cwd is the session's outputs/ mount, so write
the spool and poll the decision file at ../ (the session dir, where .claude
lives and where the collector globs) instead.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The managed.json field was named cowork_enabled, but two comments still
named it cowork_observe (which now only exists as the KONTEXT_COWORK_OBSERVE
env-var override, deliberately a different name). No behavior change.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…artbeat

The injector keyed liveness on the .claude dir modtime, but its own
settings.json write freezes that modtime, so a session looked stale ~3
minutes after the first injection even while it kept running. A mode switch
(observe -> enforce) never re-reached such a session — it kept the stale
hook — and because the cutoff skipped the dir before recording it, the
heartbeat's "never received the hook" warning could not surface it either.

Add a spool-modtime fallback: a session is also live if its spool was
written recently, the signal the in-VM CLI keeps fresh while driving tool
calls. That re-reaches running sessions for a mode-switch re-merge and
records them in sessionsSeen so the heartbeat can warn.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
cleanup stat'd a spool, confirmed it drained and idle, then removed it. A
hook appending in the window between the check and the remove would write an
event into a file about to be unlinked, losing it (the append opens the spool
fresh, so it does not keep the inode alive). The window is narrow — it needs
an append after a full hour of spool idleness — but the guard is cheap: re-
stat right before the remove and skip it if the size or modtime changed.

A nil-in-production test seam drives the window deterministically.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
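
The re-stat guard from the commit above, sketched with a test seam. `removeIfUnchanged`, `beforeRemove`, and `demo` are hypothetical names; the PR's own seam and the test `TestCleanupSkipsSpoolAppendedInRemoveWindow` are not reproduced here.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeIfUnchanged deletes path only if its size and modtime still match the
// values seen by the earlier drained-and-idle check. beforeRemove is a test
// seam (nil in production) that runs inside the race window.
func removeIfUnchanged(path string, seen os.FileInfo, beforeRemove func()) (bool, error) {
	if beforeRemove != nil {
		beforeRemove()
	}
	now, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	if now.Size() != seen.Size() || !now.ModTime().Equal(seen.ModTime()) {
		return false, nil // an append landed; leave it for the next drain tick
	}
	if err := os.Remove(path); err != nil {
		return false, err
	}
	return true, nil
}

// demo reports (kept after a racing append, removed when truly idle).
func demo() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "cowork-cleanup")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	spool := filepath.Join(dir, "kontext-cowork-events.jsonl")
	if err := os.WriteFile(spool, []byte("{}\n"), 0o600); err != nil {
		return false, false, err
	}
	seen, err := os.Stat(spool)
	if err != nil {
		return false, false, err
	}
	// Simulate a hook appending between the idle check and the remove.
	removed, err := removeIfUnchanged(spool, seen, func() {
		if f, err := os.OpenFile(spool, os.O_APPEND|os.O_WRONLY, 0o600); err == nil {
			f.WriteString("{}\n")
			f.Close()
		}
	})
	if err != nil {
		return false, false, err
	}
	kept := !removed
	// No append this time: the spool is unchanged and can be removed.
	seen, err = os.Stat(spool)
	if err != nil {
		return false, false, err
	}
	removed, err = removeIfUnchanged(spool, seen, nil)
	return kept, removed, err
}

func main() {
	fmt.Println(demo())
}
```

A small window remains between the second stat and the remove, so this narrows the race rather than closing it, consistent with the "narrow guard rather than a redesign" framing in the review reply.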
Run the actual observe/enforce command strings under /bin/sh with cwd set to
the session's outputs/ mount, covering the behavior the Go-level tests can't:
../-relative spool append and exit 0 (observe), empty-stdin immediate deny,
the enforce decision round-trip (discovering the shell-generated rid from the
spooled envelope, then parking the decision the hook polls for), and the
fail-closed deny when no decision arrives (loop count substituted 100->3 to
keep the test fast, with a guard asserting the real constant is unchanged).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

tumberger commented Jun 13, 2026

Thanks for the review @hasandemirkiran — addressed both points from the top-level comment:

  • Shell integration test (39d47f5): added tests that run the actual observe/enforce command strings under /bin/sh with cwd in outputs/. They cover the spool append + exit 0, empty-stdin deny, the enforce decision round-trip (discovering the shell-generated rid from the spooled envelope, then parking the decision the hook polls for), and fail-closed when no decision arrives (loop count substituted 100→3 to keep it fast, with a guard asserting the real constant is unchanged).
  • cowork_observe → cowork_enabled doc nit (bd9a93c): both comments corrected; only the KONTEXT_COWORK_OBSERVE env var keeps that spelling (deliberately a different name).

tumberger and others added 2 commits June 13, 2026 15:26
…pool modtime

Replaces the per-tick spool-modtime liveness heuristic with the reviewer's
suggested approach: re-check/re-merge existing settings when the mode changes.

The mode is read once from managed.json at daemon start, so it cannot change
without a restart — which means "mode switched observe->enforce while a
session runs" and "daemon started after a session" are the same event: the
daemon coming up while sessions already exist. So a single forced pass at
startup (reinjectExisting) re-merges the configured-mode hook into every
recent session, regardless of the frozen .claude dir modtime, before they
next act. This closes the gap the spool heuristic left — an idle-but-alive
session whose first call after a switch still ran under the old hook — and is
simpler: steady-state inject only has to catch newly-created sessions.

Bounded to sessions touched within the last day so the pass does not write
into abandoned session dirs. inject/reinjectExisting share mergeInto, which
records the session as seen before writing so a failed hook still surfaces in
the heartbeat.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ks work

Claude Code's settings watcher only watches dirs that already had a settings
file when the session started. Cowork does not pre-create one, so writing the
first settings.json into a session whose CLI is already running never takes
effect — yet health marked it hooked. That is the silent miss relocated into
the health log.

Split the signal:

- written  — wrote (or found current) our hook AND have reason to believe it
  loads: a new session we seeded before its CLI started, or a re-merge of a
  hook already present (that dir was watched at session start, so the change
  hot-reloads).
- unverified — wrote a first-time hook onto a possibly-already-running session
  (the startup pass). The watcher may never load it, so it is best-effort and
  must not read as working. The heartbeat warns about these explicitly.
- confirmed — a spool arrived: ground truth the hook fires. A spool promotes a
  session from unverified to written.

reinjectExisting now trusts only sessions that already carried our hook (the
safe mode-switch re-merge); a first-time hook on a pre-existing session is
written best-effort but recorded as unverified. Steady-state inject still
trusts its writes (it wins the race before the CLI starts).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
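
One way to read the signal split above is as a small state model: a per-session hook state plus a separate "confirmed" flag set when a spool arrives. The types and function names below are hypothetical and only restate the rules in the commit message.

```go
package main

import "fmt"

type hookState int

const (
	unverified hookState = iota // first-time hook on a possibly running session
	written                     // hook written and expected to load
)

func (s hookState) String() string {
	if s == written {
		return "written"
	}
	return "unverified"
}

// session tracks what the heartbeat would report for one session dir.
type session struct {
	state     hookState
	confirmed bool // a spool arrived: ground truth that the hook fires
}

// classify applies the trust rules: steady-state injection is trusted, and so
// is a startup-pass re-merge of a hook that was already present. Only a
// first-time hook written during the startup pass is unverified.
func classify(startupPass, hadOurHook bool) hookState {
	if startupPass && !hadOurHook {
		return unverified
	}
	return written
}

// spoolSeen records a spool arrival and promotes unverified to written.
func (s *session) spoolSeen() {
	s.confirmed = true
	if s.state == unverified {
		s.state = written
	}
}

func main() {
	s := session{state: classify(true, false)}
	fmt.Println(s.state, s.confirmed)
	s.spoolSeen()
	fmt.Println(s.state, s.confirmed)
}
```

Keeping "confirmed" separate from the written/unverified state lets the heartbeat distinguish a hook that should load from one that has been observed firing.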

2 participants