feat(check-inbox): add watermark to prevent stale message re-delivery #159

Closed
tayhiga-prog wants to merge 3 commits into fujibee:main from tayhiga-prog:main

@tayhiga-prog

Summary

  • Introduces a per-agent watermark file to track the highest delivered message id in check-inbox.sh
  • Prevents unread messages that accumulated during agent downtime from being replayed on reconnect
  • Aligns turn-mode delivery (check-inbox.sh) with monitor-mode delivery (watch.sh), which already uses a watermark

Problem

check-inbox.sh filters unread messages with WHERE read_at IS NULL and no lower bound on id. For agents using turn delivery mode (Antigravity / Gemini PostToolUse, Codex Stop hook), the hook fires only when the agent is active. Any messages sent while the agent was offline accumulate in the database with read_at IS NULL. When the agent next becomes active and the hook fires, all accumulated messages are delivered at once, no matter how old they are.

This means an agent can receive and act on instructions that were relevant to a session that ended hours or days ago, leading to unintended behaviour.

watch.sh (used in monitor mode) already avoids this by setting a watermark to MAX(id) at startup and only streaming messages with id > watermark. check-inbox.sh had no equivalent protection.
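
To make the difference concrete, here is a minimal sketch of the two filters. The table name `messages` and the helper `build_inbox_query` are assumptions for illustration; only the `read_at`, `id`, and `to_agent` columns are named in this description, and the actual queries in check-inbox.sh may be shaped differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the unread-message query for one agent.
# "messages" is an assumed table name. The agent name is interpolated
# unescaped, which is acceptable for a sketch but not for real input.
build_inbox_query() {
  local agent="$1" watermark="${2:-}"
  local sql="SELECT id FROM messages WHERE to_agent = '${agent}' AND read_at IS NULL"
  if [ -n "$watermark" ]; then
    # The lower bound is what keeps the offline backlog out of the result.
    sql="${sql} AND id > ${watermark}"
  fi
  printf '%s;\n' "$sql"
}

build_inbox_query alice        # old behaviour: every unread row, however old
build_inbox_query alice 120    # with a watermark: only rows newer than id 120
```

Without the second argument the query matches every unread row for the agent, which is the replay this PR is trying to avoid.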

Solution

Add a per-agent watermark persisted at $SKILL_DIR/run/check-inbox.<agent>.watermark:

  • Initialisation: on first invocation (no watermark file), set watermark to COALESCE(MAX(id), 0). Only messages arriving after this point will ever be delivered.
  • Query filter: append AND id > LOOP_WM to both the SELECT and UPDATE statements, where LOOP_WM is the watermark snapshotted before the team loop begins.
  • Watermark advance: after all teams are processed, query MAX(id) WHERE to_agent = AGENT AND id > LOOP_WM across all teams and persist the result. Advancing once after the loop (rather than per-team) ensures agents belonging to multiple teams never skip messages whose ids fall between two teams' maxima.
  • Safety valve: if the stored watermark exceeds the DB's current MAX(id) (DB wiped and recreated), reset to MAX(id) to prevent permanent message silence.

No other files are modified. Existing cooldown, actas-lock, and monitor-deferral logic is unchanged.
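
The watermark lifecycle described above (initialise, safety valve, advance) can be sketched roughly as follows. This is not the code from the diff: the function names `load_watermark` and `advance_watermark` are hypothetical, and the caller is assumed to have already fetched the database maximum (for example with `SELECT COALESCE(MAX(id), 0)`) and to pass it in as a plain integer.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, with db_max supplied by the caller.

# Print the watermark to use for this run, creating or repairing the file.
load_watermark() {
  local wm_file="$1" db_max="$2" stored
  if [ ! -f "$wm_file" ]; then
    # First invocation: start at the current maximum so the backlog is skipped.
    printf '%s\n' "$db_max" > "$wm_file"
  else
    stored="$(cat "$wm_file")"
    # Safety valve: a stored value above the DB maximum suggests the DB
    # was wiped and recreated, so fall back to the current maximum.
    if [ "$stored" -gt "$db_max" ]; then
      printf '%s\n' "$db_max" > "$wm_file"
    fi
  fi
  cat "$wm_file"
}

# Persist a new watermark only if it moves forward.
advance_watermark() {
  local wm_file="$1" new_max="$2" current
  # An empty value means no row matched id > LOOP_WM; nothing to advance.
  [ -n "$new_max" ] || return 0
  current="$(cat "$wm_file")"
  if [ "$new_max" -gt "$current" ]; then
    printf '%s\n' "$new_max" > "$wm_file"
  fi
}
```

A run would call `load_watermark` once to obtain `LOOP_WM`, process every team against that value, and call `advance_watermark` once at the end.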

Testing

Basic replay prevention

  1. Join an agent to a team with turn delivery mode.
  2. While the agent is not running, send several messages to it via send.sh.
  3. Start the agent and trigger check-inbox.sh manually.
  4. Verify: the watermark file is created and no stale messages are delivered.
  5. Send a new message while the agent is active.
  6. Trigger check-inbox.sh again; verify the new message is delivered and the watermark advances.

Multi-team correctness

  1. Register the same agent in two teams.
  2. Send messages to both teams (both above the current watermark).
  3. Invoke check-inbox.sh; verify both messages are delivered in the same run.
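
The reason this case needs its own test is easier to see with concrete ids. The simulation below is a sketch under stated assumptions: unread ids per team are hard-coded rather than queried, ids are assumed to come from one sequence shared across teams, and `deliver_run` is a hypothetical stand-in for the team loop in check-inbox.sh.

```shell
#!/usr/bin/env bash
# Usage: deliver_run <per-team|snapshot> <watermark> "<team A ids>" "<team B ids>"
deliver_run() {
  local mode="$1" wm="$2"; shift 2
  local loop_wm="$wm" delivered="" seen_max="$wm" bound ids id
  for ids in "$@"; do
    # per-team mode filters against the moving watermark; snapshot mode
    # filters every team against the value captured before the loop.
    if [ "$mode" = "per-team" ]; then bound="$wm"; else bound="$loop_wm"; fi
    for id in $ids; do
      if [ "$id" -gt "$bound" ]; then
        delivered="${delivered:+$delivered }$id"
        if [ "$id" -gt "$seen_max" ]; then seen_max="$id"; fi
      fi
    done
    # Per-team advance moves the watermark before the next team is queried.
    if [ "$mode" = "per-team" ]; then wm="$seen_max"; fi
  done
  printf 'delivered: %s\n' "$delivered"
}

deliver_run per-team 4 "5 9" "6 7"   # prints "delivered: 5 9"
deliver_run snapshot 4 "5 9" "6 7"   # prints "delivered: 5 9 6 7"
```

With a per-team advance, the first team pushes the watermark to 9, so ids 6 and 7 in the second team are never delivered. Filtering both teams against the snapshot delivers all four.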

Safety valve (DB recreation)

  1. Write a watermark value larger than the current DB MAX(id).
  2. Invoke check-inbox.sh; verify the watermark is reset and delivery resumes normally.

比嘉崇哲 and others added 3 commits June 18, 2026 18:34
…ee#44)

Root cause: send.sh's 4-positional-arg interface (team, from, to, message)
caused frequent argument-order mistakes by LLM agents — session IDs as
team names, agent types as from-agent, unquoted messages truncating body.

Changes:
- scripts/msg.sh (new): simplified 2-arg sender (to, message) that
  auto-resolves team and from via identities + actas lock. Supports
  --from, --type, --channel flags. Validates TO membership and
  cross-resolves team when FROM spans multiple teams.
- scripts/send.sh: argument validation — rejects non-existent teams,
  path/UUID/agent-type in from-agent, excess arguments (>4), and
  unregistered TO in direct mode.
- templates/cmd.*.md + SKILL.md: switched all send instructions from
  send.sh to msg.sh. Marked send.sh direct use as error-prone.

Reviewed-by: kindaichi (3 rounds, 6 findings fixed, all resolved)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eplay

check-inbox.sh previously used `read_at IS NULL` as the sole delivery
filter, with no lower-bound on message id. When an agent running in
turn mode (Stop hook / PostToolUse) is offline, unread messages
accumulate in the database. On the next hook invocation all of them
are delivered at once -- regardless of age -- causing agents to act
on stale instructions from previous sessions.

watch.sh (monitor mode) already avoids this with a per-session
watermark. This commit brings equivalent behaviour to check-inbox.sh.

Introduce a per-agent watermark file at:
  $SKILL_DIR/run/check-inbox.<agent>.watermark

Behaviour:
- First run: initialise watermark to COALESCE(MAX(id), 0) so that
  only messages arriving after this point are delivered.
- Subsequent runs: read the stored watermark; filter queries with
  AND id > $WATERMARK in both SELECT and UPDATE.
- Safety valve: if DB MAX(id) < stored watermark (e.g. after a DB
  wipe and recreation), reset watermark to MAX(id) to prevent
  permanent silence.
- Multi-team correctness: snapshot LOOP_WM=$WATERMARK before the
  team loop. All teams query against the same LOOP_WM. Advance the
  watermark once after all teams are processed using the global max
  id across teams, preventing messages with lower ids in a later
  team from being skipped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FROM is now validated against team config in both direct and channel
modes, not just direct. Unregistered senders are rejected with a
member list. TO validation remains direct-mode only (channel mode
uses resolve-channel-members.sh).

Reviewed-by: kindaichi (2 rounds)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@fujibee

fujibee commented Jun 20, 2026

Owner

Thanks — and you've got the design right. Skipping the offline backlog on reconnect is the behavior we want for turn-mode, precisely because monitor mode (watch.sh) already does this (watermark = MAX(id) at startup). Aligning check-inbox.sh with that is the right consistency fix, and the watermark logic itself reads cleanly: per-agent persisted watermark, a single advance after the team loop (so multi-team agents don't skip ids between two maxima), and the DB-recreation safety valve are all sensible. Persisting the watermark to a file also avoids the reset-to-MAX(id)-on-restart pitfall (#107).

One blocker before we can take it: the description says "No other files are modified," but the diff also renames the skill (agmsg → agmsgcrm), drops the Copilot template, adds a new msg.sh, and rewrites SKILL.md / send.sh / the cmd templates. Those look like they came along from a fork and are unrelated to the watermark change, so we can't merge them.

Could you resubmit with just the scripts/check-inbox.sh change (matching your "no other files modified" note)? Once it's down to that, this is good to go. Thanks again.

@tayhiga-prog

Author

Resubmitted as #171 with only the scripts/check-inbox.sh change, as requested. Closing this PR.
