feat(check-inbox): add watermark to prevent stale message re-delivery #159
tayhiga-prog wants to merge 3 commits into
…ee#44)

Root cause: send.sh's 4-positional-arg interface (team, from, to, message) caused frequent argument-order mistakes by LLM agents: session IDs passed as team names, agent types passed as from-agent, and unquoted messages truncating the body.

Changes:
- scripts/msg.sh (new): simplified 2-arg sender (to, message) that auto-resolves team and from via identities + actas lock. Supports --from, --type, --channel flags. Validates TO membership and cross-resolves team when FROM spans multiple teams.
- scripts/send.sh: adds argument validation that rejects non-existent teams, path/UUID/agent-type values in from-agent, excess arguments (>4), and unregistered TO in direct mode.
- templates/cmd.*.md + SKILL.md: switched all send instructions from send.sh to msg.sh. Marked direct use of send.sh as error-prone.

Reviewed-by: kindaichi (3 rounds, 6 findings fixed, all resolved)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
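The send.sh checks listed in this commit might look roughly like the sketch below. Only the rules come from the commit message (reject non-existent teams, path/UUID/agent-type values in from-agent, and more than four arguments); the function names, the `$teams_dir/<team>` directory layout and the agent-type list are hypothetical, chosen for illustration rather than taken from send.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of send.sh-style argument checks. The function names,
# the "$teams_dir/<team>" layout and the agent-type list are illustrative
# assumptions, not taken from send.sh.

# Hypothetical agent *types* that agents tend to pass where an agent *name*
# is expected.
KNOWN_AGENT_TYPES=(claude codex gemini)

# Fail if TEAM has no directory under TEAMS_DIR (assumed layout).
validate_team() {
  local teams_dir="$1" team="$2"
  if [[ ! -d "$teams_dir/$team" ]]; then
    echo "error: team does not exist: $team" >&2
    return 1
  fi
}

# Fail if FROM looks like a path, a session UUID, or an agent type.
validate_from_agent() {
  local from="$1" t
  local uuid_re='^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
  if [[ "$from" == */* ]]; then
    echo "error: from-agent looks like a path: $from" >&2
    return 1
  fi
  if [[ "$from" =~ $uuid_re ]]; then
    echo "error: from-agent looks like a session UUID: $from" >&2
    return 1
  fi
  for t in "${KNOWN_AGENT_TYPES[@]}"; do
    if [[ "$from" == "$t" ]]; then
      echo "error: from-agent is an agent type, not an agent name: $from" >&2
      return 1
    fi
  done
}

# Fail on more than 4 positional arguments (team, from, to, message); an
# unquoted message is word-split into extra arguments by the shell.
validate_arg_count() {
  if (( $# > 4 )); then
    echo "error: expected at most 4 arguments, got $#; quote the message" >&2
    return 1
  fi
}
```

The argument-count check is what catches the "unquoted message" mistake: `send.sh dev alice bob hello world` arrives as five arguments, so it is rejected instead of silently sending only `hello`.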
…eplay

check-inbox.sh previously used `read_at IS NULL` as the sole delivery filter, with no lower bound on message id. When an agent running in turn mode (Stop hook / PostToolUse) is offline, unread messages accumulate in the database. On the next hook invocation all of them are delivered at once, regardless of age, causing agents to act on stale instructions from previous sessions.

watch.sh (monitor mode) already avoids this with a per-session watermark. This commit brings equivalent behaviour to check-inbox.sh.

Introduce a per-agent watermark file at `$SKILL_DIR/run/check-inbox.<agent>.watermark`.

Behaviour:
- First run: initialise the watermark to COALESCE(MAX(id), 0) so that only messages arriving after this point are delivered.
- Subsequent runs: read the stored watermark and add `AND id > $WATERMARK` to both the SELECT and the UPDATE.
- Safety valve: if the DB's MAX(id) is lower than the stored watermark (e.g. after the DB is wiped and recreated), reset the watermark to MAX(id) to prevent permanent silence.
- Multi-team correctness: snapshot LOOP_WM=$WATERMARK before the team loop so that every team is queried against the same LOOP_WM. Advance the watermark once, after all teams are processed, to the global max id across teams; this prevents messages with lower ids in a later team from being skipped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
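A minimal sketch of the watermark lifecycle described in this commit (first run, stored value, safety valve), assuming the caller has already obtained `COALESCE(MAX(id), 0)` from the database and passes it in. `resolve_watermark` and `save_watermark` are hypothetical helper names, and treating a non-numeric watermark file like a first run is an assumption that the commit does not state.

```shell
#!/usr/bin/env bash
# Minimal sketch of the per-agent watermark lifecycle. DB_MAX is assumed to be
# the result of SELECT COALESCE(MAX(id), 0) on the messages table; the helper
# names are hypothetical.

# Print the watermark to use for this run.
#   $1: watermark file, e.g. $SKILL_DIR/run/check-inbox.<agent>.watermark
#   $2: current COALESCE(MAX(id), 0) from the database
resolve_watermark() {
  local wm_file="$1" db_max="$2" stored
  if [[ ! -s "$wm_file" ]]; then
    # First run: start at the current max so the offline backlog is skipped.
    echo "$db_max"
    return 0
  fi
  stored=$(<"$wm_file")
  if [[ ! "$stored" =~ ^[0-9]+$ ]]; then
    # Assumption (not in the PR): treat an unreadable file like a first run.
    echo "$db_max"
    return 0
  fi
  # 10# forces base 10 so values with leading zeros are not read as octal.
  if (( 10#$db_max < 10#$stored )); then
    # Safety valve: the DB was recreated and its ids are now below the
    # stored watermark; without a reset the agent would never see new mail.
    echo "$db_max"
  else
    echo "$stored"
  fi
}

# Persist a watermark via temp file + rename so readers never see a
# partially written file.
save_watermark() {
  local wm_file="$1" value="$2" tmp
  mkdir -p "$(dirname "$wm_file")"
  tmp="$wm_file.tmp.$$"
  printf '%s\n' "$value" > "$tmp" && mv -f "$tmp" "$wm_file"
}
```

A caller would run something like `WATERMARK=$(resolve_watermark "$WM_FILE" "$DB_MAX")`, persist it with `save_watermark`, and save the advanced value again after delivery. Writing to a temp file in the same directory and then renaming it means a concurrent reader of the watermark path sees either the old value or the new one, never a truncated file.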
FROM is now validated against the team config in both direct and channel modes, not just direct. Unregistered senders are rejected with a list of team members. TO validation remains direct-mode only (channel mode uses resolve-channel-members.sh).

Reviewed-by: kindaichi (2 rounds)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
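The FROM/TO rule in this commit amounts to two membership checks with different scopes. In this sketch the team's members are passed as arguments, because the PR does not describe how send.sh loads the team config; `is_member` and `validate_sender_and_recipient` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: FROM is checked against the team's member list in
# both direct and channel modes; TO is only checked in direct mode, since
# channel recipients are resolved separately (resolve-channel-members.sh).
# Members are passed in as arguments; the real config lookup is not shown.

# Succeed if NAME appears among the remaining arguments.
is_member() {
  local name="$1" m
  shift
  for m in "$@"; do
    [[ "$m" == "$name" ]] && return 0
  done
  return 1
}

# Usage: validate_sender_and_recipient MODE FROM TO MEMBER...
validate_sender_and_recipient() {
  local mode="$1" from="$2" to="$3"
  shift 3
  if ! is_member "$from" "$@"; then
    echo "error: '$from' is not a member of this team. Members: $*" >&2
    return 1
  fi
  if [[ "$mode" == direct ]] && ! is_member "$to" "$@"; then
    echo "error: '$to' is not a member of this team. Members: $*" >&2
    return 1
  fi
  return 0
}
```

Printing the member list in the error gives an agent that guessed a wrong name enough information to retry with a valid one.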
Thanks, and you've got the design right. Skipping the offline backlog on reconnect is the behavior we want for turn-mode, precisely because monitor mode ( One blocker before we can take it: the description says "No other files are modified," but the diff also renames the skill ( Could you resubmit with just the
Resubmitted as #171 with only the
Summary
Problem
check-inbox.sh filters unread messages with `WHERE read_at IS NULL` and no lower bound on `id`. For agents using turn delivery mode (Antigravity / Gemini PostToolUse, Codex Stop hook), the hook fires only when the agent is active. Any messages sent while the agent was offline accumulate in the database with `read_at IS NULL`. When the agent next becomes active and the hook fires, all accumulated messages are delivered at once, no matter how old they are.
This means an agent can receive and act on instructions that were relevant to a session that ended hours or days ago, leading to unintended behaviour.
watch.sh (used in monitor mode) already avoids this by setting a watermark to MAX(id) at startup and streaming only messages with `id > watermark`. check-inbox.sh had no equivalent protection.
Solution
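Add a per-agent watermark persisted at `$SKILL_DIR/run/check-inbox.<agent>.watermark`. On the first run it is initialised to COALESCE(MAX(id), 0); on later runs both the SELECT and the UPDATE are restricted to `id > $WATERMARK`, with a reset when the database's MAX(id) falls below the stored value and a single advance after all teams are processed.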
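To make the filter concrete, this sketch prints a SELECT and an UPDATE that share one WHERE clause containing the watermark bound. Only `id` and `read_at` are taken from the PR; the table name `messages`, the `team`, `to_agent` and `body` columns, and the use of SQLite's `datetime('now')` for `read_at` are assumptions, and `build_inbox_queries` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: print the SELECT and the UPDATE that share one WHERE clause,
# including the watermark lower bound. Only `id` and `read_at` come from the
# PR; the table name (messages), the team/to_agent/body columns and the
# datetime('now') timestamp are assumptions.

# Usage: build_inbox_queries TEAM AGENT WATERMARK
build_inbox_queries() {
  local team="$1" agent="$2" wm="$3"
  local q="'"
  # The watermark is interpolated into SQL, so accept only a plain integer.
  if [[ ! "$wm" =~ ^[0-9]+$ ]]; then
    echo "error: watermark must be a non-negative integer: $wm" >&2
    return 1
  fi
  # Escape single quotes in names by doubling them (SQL string literal rules).
  local t="${team//$q/$q$q}" a="${agent//$q/$q$q}"
  local where="team = '$t' AND to_agent = '$a' AND read_at IS NULL AND id > $wm"
  printf 'SELECT id, body FROM messages WHERE %s ORDER BY id;\n' "$where"
  printf "UPDATE messages SET read_at = datetime('now') WHERE %s;\n" "$where"
}
```

Building both statements from the same `where` string keeps the SELECT and the UPDATE from drifting apart, which is the property the commit relies on when it adds `AND id > $WATERMARK` to both. Validating the watermark as an integer matters because it is spliced directly into the SQL text.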
No other files are modified. Existing cooldown, actas-lock, and monitor-deferral logic is unchanged.
Testing
Basic replay prevention
Multi-team correctness
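The multi-team rule from the commit message can be illustrated without a database. In this sketch each team's unread ids are passed as a space-separated string (an assumption made to keep it self-contained); `deliver_all_teams` follows the snapshot-then-advance-once rule, and `deliver_advancing_per_team` shows the failure that the snapshot prevents. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the multi-team rule: every team is filtered against one snapshot
# (LOOP_WM) and the watermark advances once, to the highest id delivered
# across all teams. Per-team unread ids are passed as space-separated strings
# instead of being read from SQLite (an assumption to stay self-contained).

# Usage: deliver_all_teams WATERMARK "team-1 ids" "team-2 ids" ...
# Prints each delivered id on its own line, then "WM <new watermark>".
deliver_all_teams() {
  local LOOP_WM="$1" new_wm="$1" team_ids id
  shift
  for team_ids in "$@"; do
    # Unquoted on purpose: split the id string into words.
    for id in $team_ids; do
      if (( id > LOOP_WM )); then
        echo "$id"
        if (( id > new_wm )); then new_wm=$id; fi
      fi
    done
  done
  echo "WM $new_wm"
}

# For contrast: advancing the watermark after each team skips a lower id
# that belongs to a later team.
deliver_advancing_per_team() {
  local wm="$1" team_ids id max
  shift
  for team_ids in "$@"; do
    max=$wm
    for id in $team_ids; do
      if (( id > wm )); then
        echo "$id"
        if (( id > max )); then max=$id; fi
      fi
    done
    wm=$max
  done
  echo "WM $wm"
}
```

With a watermark of 4, team A holding ids 5 and 9, and team B holding id 7, `deliver_all_teams 4 "5 9" "7"` delivers all three ids and advances to 9, while `deliver_advancing_per_team 4 "5 9" "7"` moves to 9 after team A and never delivers id 7.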
Safety valve (DB recreation)