Skip to content

feat(check-inbox): add watermark to prevent stale message re-delivery#171

Open
tayhiga-prog wants to merge 1 commit into
fujibee:mainfrom
tayhiga-prog:fix/check-inbox-watermark
Open

feat(check-inbox): add watermark to prevent stale message re-delivery#171
tayhiga-prog wants to merge 1 commit into
fujibee:mainfrom
tayhiga-prog:fix/check-inbox-watermark

Conversation

@tayhiga-prog

Copy link
Copy Markdown

Summary

  • Introduces a per-agent watermark file to track the highest delivered message id in check-inbox.sh
  • Prevents unread messages that accumulated during agent downtime from being replayed on reconnect
  • Aligns turn-mode delivery (check-inbox.sh) with monitor-mode delivery (watch.sh), which already uses a watermark

Problem

check-inbox.sh filters unread messages with WHERE read_at IS NULL and no lower-bound on id. For agents using turn delivery mode (Antigravity / Gemini PostToolUse, Codex Stop hook), the hook fires only when the agent is active. Any messages sent while the agent was offline accumulate in the database with read_at IS NULL. When the agent next becomes active and the hook fires, all accumulated messages are delivered at once, no matter how old they are.

This means an agent can receive and act on instructions that were relevant to a session that ended hours or days ago, leading to unintended behaviour.

watch.sh (used in monitor mode) already avoids this by setting a watermark to MAX(id) at startup and only streaming messages with id > watermark. check-inbox.sh had no equivalent protection.

Solution

Add a per-agent watermark persisted at SKILL_DIR/run/check-inbox.agent.watermark:

  • Initialisation: on first invocation (no watermark file), set watermark to COALESCE(MAX(id), 0). Only messages arriving after this point will ever be delivered.
  • Query filter: append AND id > LOOP_WM to both the SELECT and UPDATE statements, where LOOP_WM is the watermark snapshotted before the team loop begins.
  • Watermark advance: after all teams are processed, query MAX(id) WHERE to_agent = AGENT AND id > LOOP_WM across all teams and persist the result. Advancing once after the loop (rather than per-team) ensures agents belonging to multiple teams never skip messages whose ids fall between two teams maxima.
  • Safety valve: if the stored watermark exceeds the DBs current MAX(id) (DB wiped and recreated), reset to MAX(id) to prevent permanent message silence.

No other files are modified. Existing cooldown, actas-lock, and monitor-deferral logic is unchanged.

Testing

Basic replay prevention

  1. Join an agent to a team with turn delivery mode.
  2. While the agent is not running, send several messages to it via send.sh.
  3. Start the agent and trigger check-inbox.sh manually.
  4. Verify: the watermark file is created and no stale messages are delivered.
  5. Send a new message while the agent is active.
  6. Trigger check-inbox.sh again; verify the new message is delivered and the watermark advances.

Multi-team correctness

  1. Register the same agent in two teams.
  2. Send messages to both teams (both above the current watermark).
  3. Invoke check-inbox.sh; verify both messages are delivered in the same run.

Safety valve (DB recreation)

  1. Write a watermark value larger than the current DB MAX(id).
  2. Invoke check-inbox.sh; verify the watermark is reset and delivery resumes normally.

@fujibee fujibee left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tayhiga-prog

Nice change — aligning turn-mode check-inbox.sh with the watermark model watch.sh already uses is the right call, and the implementation is careful: integer validation on every read, the DB-shrink clamp, and the single LOOP_WM snapshot reused for the SELECT, the mark-read UPDATE, and the post-loop advance. One thing worth changing before merge, plus two notes.

1. Seed the watermark per-agent, not globally (can drop this agent's mail).
The initializer uses the global max:

WATERMARK="$(agmsg_sqlite "$DB" "SELECT COALESCE(MAX(id), 0) FROM messages;" ...)"

but watch.sh scopes its watermark to the subscription (WHERE $WHERE_PAIRS), and the post-loop advance here already scopes to WHERE to_agent='$AGENT'. With the global MAX(id), if another agent has posted newer messages, this agent's first check seeds its watermark above its own unread, and those messages are then skipped forever (the mark-read UPDATE is also gated on id > LOOP_WM, so they never clear either). Scoping the seed to WHERE to_agent='$AGENT' matches both watch.sh and your own advance step.

2. Windows: the watermark silently no-ops under Git Bash.
agmsg_sqlite (the on-disk store wrapper) doesn't strip the trailing CR that sqlite3.exe emits on Windows, so MAX(id) comes back as "<n>\r", the *[!0-9]* guard rejects it, and the watermark resets to 0 → every check delivers everything. Not a crash, just inert on Windows. A tr -d '\r' on the integer reads (or routing them through a CR-safe read) would let it engage there too. Context: #180 added CR-stripping for the in-memory path but not this store wrapper.

3. Behaviour note (doc-only).
After this lands, the first turn-mode check seeds the watermark and delivers nothing — intended, and consistent with monitor mode, but it's a change from "first check drains the backlog." Worth a line in the PR body / docs so a user who sends a message and then starts an agent isn't surprised by the empty first turn.

Happy to re-review once (1) is in. Thanks for tightening this up! 🙏

@fujibee

fujibee commented Jun 21, 2026

Copy link
Copy Markdown
Owner

@tayhiga-prog — CI's in now, and it surfaces exactly the seeding concern from point (1): the required bats (ubuntu/macos) legs fail on three check-inbox tests —

  • check-inbox (copilot): emits decision=block JSON when new messages arrive
  • check-inbox: still delivers when the lock is owned by this session
  • storage: stop-hook delivery works when the default db dir is absent but the override is populated

All three send a message and then expect the next check-inbox to deliver it. With the watermark seeded at MAX(id) on the first run, that just-sent message is id <= watermark and gets skipped — so the deliver-on-check contract those tests (and users) rely on is broken on the very first check.

This is the crux: watch.sh can seed at MAX because monitor mode is real-time and only cares about messages after it attached, but turn mode's first check legitimately needs to drain the currently-pending unread. A couple of directions:

  • Seed from already-read state rather than all messages — e.g. MAX(id) of rows where read_at IS NOT NULL for this agent (deliver current unread once, then watermark forward), or simply seed at 0 and let the first check drain + advance.
  • If the goal is specifically to drop stale mail, an age-based cutoff (created_at older than N) expresses that intent more directly than an id watermark.

Either way, scoping to to_agent='$AGENT' (point 1) still applies. Once the seeding is reworked the three tests should go green again. Glad to look at whichever direction you pick.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants