Kiloclaw scheduled action notifications #3038

Open
St0rmz1 wants to merge 14 commits into main from kiloclaw-scheduled-action-notifications

Contributor

@St0rmz1 St0rmz1 commented May 4, 2026

Summary

Add a notification framework for the scheduled action feature so admins can let users know ahead of time when an instance is going to restart or change version. Three channels:

  • email through the existing kilocode email scaffold
  • mobile push through the notifications service binding
  • workspace banner driven off getStatus.scheduledAction

Plumbing

A new kiloclaw_scheduled_action_notifications table holds one row per target, kind, and channel. kind is either notice (the heads-up that fires before the action) or cancelled (the follow-up after an admin cancels an action whose notice already went out). status moves from pending to sending, then to sent or failed.
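
As a rough illustration of the row shape and status walk described above, a minimal TypeScript sketch follows. The kind and status values come from this description; the field names, the channel identifiers other than webapp, and the helpers canTransition and rowKey are assumptions made for the example, not the actual schema.

```typescript
// kind and status values come from the PR description; channel names and
// field names are assumptions made for this sketch.
type NoticeKind = "notice" | "cancelled";
type NoticeStatus = "pending" | "sending" | "sent" | "failed";
type NoticeChannel = "email" | "push" | "webapp";

interface NotificationRow {
  targetId: string;
  kind: NoticeKind;
  channel: NoticeChannel;
  status: NoticeStatus;
}

// Forward transitions only. Recovery of stuck "sending" rows is left out
// because the description does not say which status recovered rows take.
const NEXT_STATUSES: Record<NoticeStatus, readonly NoticeStatus[]> = {
  pending: ["sending", "failed"], // "failed" directly when the action is cancelled first
  sending: ["sent", "failed"],
  sent: [],
  failed: [],
};

function canTransition(from: NoticeStatus, to: NoticeStatus): boolean {
  return NEXT_STATUSES[from].includes(to);
}

// One row per (target, kind, channel), mirroring the unique index.
function rowKey(row: Pick<NotificationRow, "targetId" | "kind" | "channel">): string {
  return `${row.targetId}:${row.kind}:${row.channel}`;
}
```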

A cron in the kiloclaw worker fires every minute. It recovers any rows left in sending from a crashed prior tick, selects rows whose lead window has opened (and whose parent target is still pending for notice rows), claims each row atomically with a CAS that revalidates parent state, dispatches the channel side effect, and marks the row sent or failed. A timeout bounds each email post so a slow upstream cannot stall the workers in a batch.
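
The dispatch half of a tick (bounded concurrency, a per-row timeout, then sent or failed) could look roughly like the sketch below. The names withTimeout and dispatchBatch, and the Outcome shape, are hypothetical; the real sweep also recovers, selects, and claims rows before this step.

```typescript
type Outcome = { id: string; status: "sent" | "failed"; reason?: string };

// Rejects if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Runs `dispatch` for every id with at most `concurrency` calls in flight.
// A slow or failing row is marked failed and does not block the other rows.
async function dispatchBatch(
  ids: string[],
  dispatch: (id: string) => Promise<void>,
  opts: { concurrency: number; timeoutMs: number },
): Promise<Outcome[]> {
  const outcomes: Outcome[] = [];
  let next = 0;

  async function worker(): Promise<void> {
    while (next < ids.length) {
      const id = ids[next++];
      try {
        await withTimeout(dispatch(id), opts.timeoutMs);
        outcomes.push({ id, status: "sent" });
      } catch (err) {
        const reason = err instanceof Error ? err.message : String(err);
        outcomes.push({ id, status: "failed", reason });
      }
    }
  }

  const workerCount = Math.min(opts.concurrency, ids.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return outcomes;
}
```

Note that Promise.race only stops waiting; it does not cancel the underlying request. An implementation that needs to abort the HTTP call itself would typically pass an AbortSignal to fetch instead.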

Email dispatch goes through a new internal route at /api/internal/kiloclaw/scheduled-action-side-effects so the email rendering and Mailgun send stay where they already live. The authentication header and the expected secret are each run through an HMAC, and the resulting digests are compared with timingSafeEqual, so the comparison always runs on fixed-length buffers regardless of input length.
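
A minimal sketch of the HMAC-then-timingSafeEqual comparison, assuming the route runs on a Node.js runtime with node:crypto available. The function name secretsMatch and the per-process random key are assumptions for illustration; the route's actual helper may differ.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Per-process random key (an assumption of this sketch). The HMAC is used only
// to normalise both inputs to the same length, not as a shared secret.
const compareKey = randomBytes(32);

function digest(value: string): Buffer {
  return createHmac("sha256", compareKey).update(value, "utf8").digest();
}

function secretsMatch(provided: string | null | undefined, expected: string): boolean {
  // An unset expected secret must never authenticate anything.
  if (expected.length === 0) return false;
  // Both digests are 32 bytes, so timingSafeEqual cannot throw on a length
  // mismatch and the compare time does not depend on the header length.
  return timingSafeEqual(digest(provided ?? ""), digest(expected));
}
```

Calling timingSafeEqual directly on the raw header and secret would throw whenever their byte lengths differ, which is why both sides are hashed first.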

Mobile push goes directly through the NOTIFICATIONS service binding. The mobile app notification parser was extended to recognize the new payload so the app suppresses the alert when the user is already in that chat, and routes to the chat when the user taps the notification.

The webapp banner is a no-op dispatch: the row exists, and the workspace renders the banner from getStatus.scheduledAction. The banner query joins the webapp notice row so the banner stays hidden when an admin opted out of notify or dropped webapp from the channel set, and the banner waits until the lead window opens.
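
The banner rule above reduces to a small predicate. In this sketch the WebappNoticeRow shape, the leadOpensAt field, and the shouldShowBanner name are all hypothetical; a null row stands for the join finding no webapp notice row.

```typescript
// Hypothetical shape of the joined webapp notice row.
interface WebappNoticeRow {
  leadOpensAt: Date; // when the lead window opens for this target
}

function shouldShowBanner(webappNotice: WebappNoticeRow | null, now: Date): boolean {
  // No row: notify was turned off or webapp was dropped from the channel set.
  if (webappNotice === null) return false;
  // Row exists: show the banner only once the lead window has opened.
  return now.getTime() >= webappNotice.leadOpensAt.getTime();
}
```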

Cancellation

When an admin cancels a scheduled action (whole or per instance), pending notice rows transition to failed with the reason "action cancelled before notice was dispatched", so the sweep will not deliver a stale notice. For each channel whose notice was already sent, a matching cancelled row is queued so the user sees the follow-up.
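
The cancel path can be sketched as a pure function over the notice rows of one target. The NoticeRow shape and the planCancellation name are assumptions for illustration; rows in sending or failed are left untouched here because the description does not say how the cancel path treats them.

```typescript
type Status = "pending" | "sending" | "sent" | "failed";

interface NoticeRow {
  targetId: string;
  kind: "notice" | "cancelled";
  channel: string;
  status: Status;
  failureReason?: string;
}

const CANCELLED_REASON = "action cancelled before notice was dispatched";

// Returns the rows to void and the follow-up rows to queue for one target.
function planCancellation(rows: NoticeRow[]): { voided: NoticeRow[]; queued: NoticeRow[] } {
  const voided: NoticeRow[] = [];
  const queued: NoticeRow[] = [];
  for (const row of rows) {
    if (row.kind !== "notice") continue;
    if (row.status === "pending") {
      // Never dispatched: fail it so the sweep cannot deliver a stale notice.
      voided.push({ ...row, status: "failed", failureReason: CANCELLED_REASON });
    } else if (row.status === "sent") {
      // Already delivered: queue a cancellation on the same channel.
      queued.push({ targetId: row.targetId, kind: "cancelled", channel: row.channel, status: "pending" });
    }
  }
  return { voided, queued };
}
```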

Admin UI

The Change Version dialog on the instance detail page and the Scheduler tab forms (Schedule Restart and Schedule Version Change) expose notify (on by default), lead hours, subject and body, and channel selection, so an admin can narrow or silence the notification per schedule. The Scheduler tab also adds a Run Notice Sweep Now button that drives the sweep on demand for verifying notice copy.

Storage

One migration adds the notifications table, a unique index on (target, kind, channel), and a partial index on target_id WHERE status = 'pending' for the sweep lookup.

Verification

  • Schedule a restart with notify on and a short lead window. Confirm the email arrives, the banner appears once the lead window opens, and the mobile push is recognized.
  • Schedule with notify off. Confirm no email, no banner, no push.
  • Schedule, then cancel before the lead window opens. Confirm pending notice rows go to failed and no notice is delivered.
  • Schedule, let the notice fire, then cancel. Confirm a cancelled row is queued for the channels that already received the notice and the user sees the cancellation.
  • Click Run Notice Sweep Now and confirm the Last Run line shows processed, sent, failed, and recovered counts.

Visual Changes

N/A

Reviewer Notes

  • The claim CAS revalidates parent target, action, and stage state in a single atomic UPDATE. This closes the window between selectDue and dispatch where the apply path could race ahead and a notice would otherwise fire for an action that already ran. cancelled rows skip the parent gate since they announce the cancellation regardless of where the parent ended up.
  • selectDue uses a left join on kilocode_users rather than an inner join, and dispatchOne short-circuits with a clear error when the joined user record is missing. An inner join would silently drop the row and leave it pending forever. Today the GDPR flow only anonymizes the row, so this is a defense against a future path that hard-deletes a user.
  • The recovery threshold is five minutes. With a concurrency of 10 and a dispatch timeout of 10 seconds per row, the worst-case tick runs roughly 100 seconds, comfortably below the threshold.
  • Tests cover the sweep orchestrator (claim wins, claim races the apply path, cancelled still fires when parent gate would block, recovery counts, mark failures), the push helper, the email side effects route, and the cancel paths that void pending notices.
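
To make the claim semantics in the first note concrete, here is an in-memory sketch of the compare-and-set. In the real code this is a single atomic UPDATE; the Row shape, the parentStillPending flag (standing in for the target, action, and stage checks), and the function names are assumptions for illustration.

```typescript
type Kind = "notice" | "cancelled";
type Status = "pending" | "sending" | "sent" | "failed";

interface Row {
  id: string;
  kind: Kind;
  status: Status;
  claimedAt?: number; // epoch ms, set when the claim succeeds
}

const RECOVERY_THRESHOLD_MS = 5 * 60 * 1000; // five minutes, per the notes above

// Check and write happen in one synchronous step, mimicking one atomic UPDATE:
// the row moves to "sending" only if it is still pending and, for notice rows,
// the parent has not already run or been cancelled.
function tryClaim(row: Row, parentStillPending: boolean, now: number): boolean {
  if (row.status !== "pending") return false; // another tick already claimed it
  if (row.kind === "notice" && !parentStillPending) return false; // apply path won the race
  row.status = "sending";
  row.claimedAt = now;
  return true;
}

// A row is a recovery candidate if it has sat in "sending" past the threshold.
function isStuck(row: Row, now: number): boolean {
  return (
    row.status === "sending" &&
    row.claimedAt !== undefined &&
    now - row.claimedAt >= RECOVERY_THRESHOLD_MS
  );
}
```

The cancelled kind deliberately bypasses the parent check, matching the note that cancellation rows fire regardless of where the parent ended up.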

Comment thread apps/web/src/routers/kiloclaw-router.ts
Comment thread services/kiloclaw/src/scheduled/scheduled-action-notices.ts
Comment thread services/notifications/src/lib/scheduled-action-push.ts
Contributor

kilo-code-bot Bot commented May 4, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Files Reviewed (2 files)
  • services/kiloclaw/src/scheduled/scheduled-action-notices.ts
  • services/kiloclaw/src/scheduled/scheduled-action-notices.test.ts

Reviewed by gpt-5.5-2026-04-23 · 1,513,245 tokens
